How to update a DataFrame in pandas (Python)
I have the following two DataFrames in pandas:

df1:

    authorid1  authorid2  co-authored
    a1         a2         0
    a1         a3         0
    a1         a4         0
    a2         a3         0

df2:

    authorid1  authorid2  co-authored
    a1         a2         5
    a2         a3         6
    a6         a7         9
I would like to find (without looping and comparing row by row) the authorid1, authorid2 pairings in df2 that also exist in df1, and update the co-authored column values accordingly. For the two tables above, the result would be the following.

Resulting updated df1:

    authorid1  authorid2  co-authored
    a1         a2         5
    a1         a3         0
    a1         a4         0
    a2         a3         6
Is there a fast way to do this? I have 7 million rows in df1, and looping and comparing would take forever.

Update: note that the last 2 in df2 should not be part of the update in df1, since they don't exist in df1.
You can use update:
    df1.update(df2)
    print (df1)
      authorid1 authorid2  co-authored
    0        a1        a2          5.0
    1        a2        a3          6.0
    2        a1        a4          0.0
    3        a2        a3          0.0
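One detail worth keeping in mind: update matches rows by index label (and columns by name), not by the values stored in authorid1 and authorid2. The following is a minimal sketch of that behaviour on two small hypothetical frames (left, right and the value column are made up for illustration and are not from the question).

```python
import pandas as pd

# Hypothetical frames, used only to illustrate how update() aligns rows.
left = pd.DataFrame({"value": [0, 0, 0]}, index=[0, 1, 2])
right = pd.DataFrame({"value": [7, 9]}, index=[2, 5])

# update() works in place and aligns on index labels:
# label 2 exists in both frames, so its value is overwritten;
# label 5 does not exist in left, so it is ignored.
left.update(right)
print(left)
```

This is why, in the output above, row 1 of df1 takes both the author IDs and the count from row 1 of df2.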
Sample:
    df1 = pd.DataFrame({'new': {0: 7, 1: 8, 2: 1, 3: 3},
                        'authorid2': {0: 'a2', 1: 'a3', 2: 'a4', 3: 'a3'},
                        'authorid1': {0: 'a1', 1: 'a1', 2: 'a1', 3: 'a2'},
                        'co-authored': {0: 0, 1: 0, 2: 0, 3: 0}})
    df2 = pd.DataFrame({'authorid2': {0: 'a2', 1: 'a3'},
                        'authorid1': {0: 'a1', 1: 'a2'},
                        'co-authored': {0: 5, 1: 6}})

    print (df1)
      authorid1 authorid2  co-authored  new
    0        a1        a2            0    7
    1        a1        a3            0    8
    2        a1        a4            0    1
    3        a2        a3            0    3

    print (df2)
      authorid1 authorid2  co-authored
    0        a1        a2            5
    1        a2        a3            6

    df1.update(df2)
    print (df1)
      authorid1 authorid2  co-authored  new
    0        a1        a2          5.0    7
    1        a2        a3          6.0    8
    2        a1        a4          0.0    1
    3        a2        a3          0.0    3
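If the goal is to match on the (authorid1, authorid2) pair rather than on row position, one possible variation is to move both key columns into the index before calling update. This is a sketch, not the method from the answer above, and it assumes each pair appears at most once in each frame. The names keys, indexed and result are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "authorid1": ["a1", "a1", "a1", "a2"],
    "authorid2": ["a2", "a3", "a4", "a3"],
    "co-authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "authorid1": ["a1", "a2", "a6"],
    "authorid2": ["a2", "a3", "a7"],
    "co-authored": [5, 6, 9],
})

# Assumption: each (authorid1, authorid2) pair is unique within each frame.
keys = ["authorid1", "authorid2"]

# Index both frames by the author pair so update() aligns rows on the pair.
indexed = df1.set_index(keys)
indexed.update(df2.set_index(keys))

# The (a6, a7) pair exists only in df2, so it is ignored.
# reset_index() turns the pair back into ordinary columns.
result = indexed.reset_index()
print(result)
```

With this data, the co-authored counts for (a1, a2) and (a2, a3) become 5 and 6, and the other two rows keep 0. Depending on the pandas version, the updated column may come back as float rather than int.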
Edit by comment:

I think you need to filter df2 by df1 first with isin:
    df2 = df2[df2[['authorid1','authorid2']].isin(df1[['authorid1','authorid2']]).any(axis=1)]
    print (df2)
      authorid1 authorid2  co-authored
    0        a1        a2            5
    1        a2        a3            6

    df1.update(df2)
    print (df1)
      authorid1 authorid2  co-authored
    0        a1        a2          5.0
    1        a2        a3          6.0
    2        a1        a4          0.0
    3        a2        a3          0.0
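When isin is given a DataFrame, it compares element by element at matching index and column labels, so the filter above checks row i of df2 against row i of df1. As an alternative sketch for keeping the df2 rows whose author pair occurs anywhere in df1, an inner merge on the two key columns can be used. The names keys and matched are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "authorid1": ["a1", "a1", "a1", "a2"],
    "authorid2": ["a2", "a3", "a4", "a3"],
    "co-authored": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "authorid1": ["a1", "a2", "a6"],
    "authorid2": ["a2", "a3", "a7"],
    "co-authored": [5, 6, 9],
})

keys = ["authorid1", "authorid2"]

# Inner join against the distinct pairs of df1: only df2 rows whose
# (authorid1, authorid2) pair also occurs in df1 survive the merge.
matched = df2.merge(df1[keys].drop_duplicates(), on=keys, how="inner")
print(matched)
```

Here matched keeps the (a1, a2) and (a2, a3) rows and drops (a6, a7). Note that the merge returns a fresh 0..n-1 index, so matched would still need to be aligned on the pair, not on row position, before being used to update df1.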