python - Remove cancelling rows from Pandas Dataframe


I have a list of invoices sent out to customers. However, sometimes a bad invoice is sent, which is later cancelled. My pandas DataFrame looks like this, except much larger (~3 million rows):

index | customer | invoice_nr | amount | date
---------------------------------------------------
0     | 1        | 1          | 10     | 01-01-2016
1     | 1        | 1          | -10    | 01-01-2016
2     | 1        | 1          | 11     | 01-01-2016
3     | 1        | 2          | 10     | 02-01-2016
4     | 2        | 3          | 7      | 01-01-2016
5     | 2        | 4          | 12     | 02-01-2016
6     | 2        | 4          | 8      | 02-01-2016
7     | 2        | 4          | -12    | 02-01-2016
8     | 2        | 4          | 4      | 02-01-2016
...   | ...      | ...        | ...    | ...
...   | ...      | ...        | ...    | ...

Now I want to drop all rows for which customer, invoice_nr, and date are identical but the amount has opposite values.
Corrections of invoices take place on the same day, with an identical invoice number. The invoice number is uniquely bound to the customer and corresponds to one transaction (which can consist of multiple components, for example customer = 2, invoice_nr = 4). Corrections of invoices occur either to change the amount charged or to split the amount into smaller components. Hence, the cancelled value is not repeated on the same invoice_nr.

Any help on how to program this would be appreciated.

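def remove_cancelled_transactions(df):
    # Flag the negative amounts (the cancellations) within one group.
    trans_neg = df.amount < 0
    # Drop each cancellation together with the row directly before it.
    return df.loc[~(trans_neg | trans_neg.shift(-1, fill_value=False))]

# Rows sharing customer, invoice number, date, and absolute amount form a group.
groups = [df.customer, df.invoice_nr, df.date, df.amount.abs()]
df.groupby(groups, as_index=False, group_keys=False) \
  .apply(remove_cancelled_transactions)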
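With ~3 million rows, calling a Python function once per group can be slow. A possible alternative is sketched below. It is not part of the answer above. It pairs the k-th positive amount with the k-th negative amount of the same absolute value inside each (customer, invoice_nr, date) group, using only built-in groupby operations. The function name `drop_cancelling_pairs` and the helper columns `abs_amount`, `is_neg`, and `rank` are made up for illustration, and the sketch assumes there are no missing values in the key columns or in `amount`.

```python
import pandas as pd

def drop_cancelling_pairs(df):
    """Drop rows that are cancelled by an opposite amount on the same invoice."""
    keys = ["customer", "invoice_nr", "date"]
    # Helper columns (hypothetical names): absolute amount and sign flag.
    work = df.assign(abs_amount=df["amount"].abs(), is_neg=df["amount"] < 0)
    # Number the rows that share keys, absolute amount, and sign: 0, 1, 2, ...
    work["rank"] = work.groupby(keys + ["abs_amount", "is_neg"]).cumcount()
    # The k-th positive and the k-th negative row land in the same bucket;
    # a bucket holding both signs is a cancelling pair.
    n_signs = work.groupby(keys + ["abs_amount", "rank"])["is_neg"].transform("nunique")
    return df[(n_signs < 2).to_numpy()]

# The sample data from the question.
invoices = pd.DataFrame({
    "customer":   [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "invoice_nr": [1, 1, 1, 2, 3, 4, 4, 4, 4],
    "amount":     [10, -10, 11, 10, 7, 12, 8, -12, 4],
    "date":       ["01-01-2016"] * 3 + ["02-01-2016", "01-01-2016"] + ["02-01-2016"] * 4,
})

result = drop_cancelling_pairs(invoices)
print(result)  # rows 2, 3, 4, 6 and 8 remain
```

Unlike the `shift`-based version, this does not rely on the cancellation sitting directly after the row it cancels, and an unmatched duplicate (say 10, 10, -10) keeps one of the positive rows.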
