How to update a dataframe in Pandas Python


I have the following two DataFrames in pandas:

df1:

    authorid1  authorid2  co-authored
    a1         a2         0
    a1         a3         0
    a1         a4         0
    a2         a3         0

df2:

    authorid1  authorid2  co-authored
    a1         a2         5
    a2         a3         6
    a6         a7         9

Without looping and comparing, I would like to find the authorid1, authorid2 pairs in df2 that also exist in df1 and update the co-authored values in df1 accordingly. For the two tables above, the result should be the following:

Resulting updated df1:

    authorid1  authorid2  co-authored
    a1         a2         5
    a1         a3         0
    a1         a4         0
    a2         a3         6

Is there a fast way to do this? I have 7 million rows in df1, and looping and comparing would take forever.

Update: note that the last pair in df2 (a6, a7) should not be part of the update to df1, since it doesn't exist in df1.

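You can use update, which overwrites values in df1 with the non-NA values from df2, aligning the two frames on their index: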

    df1.update(df2)
    print(df1)
      authorid1 authorid2  co-authored
    0        a1        a2          5.0
    1        a2        a3          6.0
    2        a1        a4          0.0
    3        a2        a3          0.0

Sample:

    import pandas as pd

    df1 = pd.DataFrame({'authorid1': {0: 'a1', 1: 'a1', 2: 'a1', 3: 'a2'},
                        'authorid2': {0: 'a2', 1: 'a3', 2: 'a4', 3: 'a3'},
                        'co-authored': {0: 0, 1: 0, 2: 0, 3: 0},
                        'new': {0: 7, 1: 8, 2: 1, 3: 3}})

    df2 = pd.DataFrame({'authorid1': {0: 'a1', 1: 'a2'},
                        'authorid2': {0: 'a2', 1: 'a3'},
                        'co-authored': {0: 5, 1: 6}})

    print(df1)
      authorid1 authorid2  co-authored  new
    0        a1        a2            0    7
    1        a1        a3            0    8
    2        a1        a4            0    1
    3        a2        a3            0    3

    print(df2)
      authorid1 authorid2  co-authored
    0        a1        a2            5
    1        a2        a3            6

    # Overwrite df1 in place with df2's values at matching index labels
    df1.update(df2)
    print(df1)
      authorid1 authorid2  co-authored  new
    0        a1        a2          5.0    7
    1        a2        a3          6.0    8
    2        a1        a4          0.0    1
    3        a2        a3          0.0    3
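In the output above, the row at index 1 changes from (a1, a3) to (a2, a3) because update aligns on index labels rather than on the values in the key columns. A minimal sketch of that behavior, using hypothetical frames named left and right that are not part of the original answer:

```python
import pandas as pd

# Hypothetical frames, for illustration only
left = pd.DataFrame({"key": ["x", "y", "z"], "val": [0, 0, 0]})
right = pd.DataFrame({"key": ["z"], "val": [9]})  # its only row has index label 0

# update() matches rows by index label, so right's row lands on left's row 0,
# even though its key "z" belongs to left's row 2
left.update(right)

print(left)
```

Row 0 of left is overwritten in both columns, while the row whose key is actually "z" keeps its old value.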

Edit by comment:

I think you first need to filter df2 by df1 with isin:

    # Keep only the df2 rows that isin matches against df1
    df2 = df2[df2[['authorid1','authorid2']].isin(df1[['authorid1','authorid2']]).any(axis=1)]
    print(df2)
      authorid1 authorid2  co-authored
    0        a1        a2            5
    1        a2        a3            6

    df1.update(df2)
    print(df1)
      authorid1 authorid2  co-authored
    0        a1        a2          5.0
    1        a2        a3          6.0
    2        a1        a4          0.0
    3        a2        a3          0.0
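To get the result the question asks for, where rows are matched on the (authorid1, authorid2) pair rather than on row position, one option is to move both key columns into the index before calling update. This is a sketch rather than part of the original answer: the names keys, left, right and result are illustrative, and it assumes each pair appears at most once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "authorid1": ["a1", "a1", "a1", "a2"],
    "authorid2": ["a2", "a3", "a4", "a3"],
    "co-authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "authorid1": ["a1", "a2", "a6"],
    "authorid2": ["a2", "a3", "a7"],
    "co-authored": [5, 6, 9],
})

keys = ["authorid1", "authorid2"]

# Index both frames by the key pair so update() aligns on (authorid1, authorid2)
left = df1.set_index(keys)
right = df2.set_index(keys)  # assumption: pairs in df2 are unique

# update() only touches labels already present in left, so (a6, a7) is ignored
left.update(right)

# update() may upcast the column to float; cast back, since no NaN was introduced
result = left.reset_index().astype({"co-authored": "int64"})
print(result)
```

Here the co-authored column of result is 5, 0, 0, 6, and the key columns of df1 are left unchanged. Because the matching is done through index alignment, no Python-level loop over the rows is needed.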
