python - Remove cancelling rows from Pandas DataFrame
I have a list of invoices sent out to customers. However, sometimes a bad invoice is sent, which is later cancelled. My pandas DataFrame looks like this, except much larger (~3 million rows):
index | customer | invoice_nr | amount | date
---------------------------------------------------
0     | 1        | 1          | 10     | 01-01-2016
1     | 1        | 1          | -10    | 01-01-2016
2     | 1        | 1          | 11     | 01-01-2016
3     | 1        | 2          | 10     | 02-01-2016
4     | 2        | 3          | 7      | 01-01-2016
5     | 2        | 4          | 12     | 02-01-2016
6     | 2        | 4          | 8      | 02-01-2016
7     | 2        | 4          | -12    | 02-01-2016
8     | 2        | 4          | 4      | 02-01-2016
...   | ...      | ...        | ...    | ...
...   | ...      | ...        | ...    | ...
Now, I want to drop all rows for which customer, invoice_nr, and date are identical but amount has opposite values.
Corrections of invoices take place on the same day, with an identical invoice number. The invoice number is uniquely bound to the customer and corresponds to one transaction (which can consist of multiple components, for example customer = 2, invoice_nr = 4). Corrections of invoices occur either to change the amount charged or to split the amount into smaller components. Hence, the cancelled value is not repeated on the same invoice_nr.
Any help on how to program this would be appreciated.
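One way the rule described above might be expressed without a Python-level loop is sketched below. This is only an illustration, not a tested solution for the full 3 million rows: the helper name `drop_cancelling_pairs` and the temporary columns `_abs` and `_occ` are made up for this example, and it assumes the column names from the table, a unique index, and no missing values in the key columns. The idea is to number repeated identical amounts within an invoice so that each positive amount can pair with at most one matching negative amount, and then drop every such pair.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame(
    {
        "customer": [1, 1, 1, 1, 2, 2, 2, 2, 2],
        "invoice_nr": [1, 1, 1, 2, 3, 4, 4, 4, 4],
        "amount": [10, -10, 11, 10, 7, 12, 8, -12, 4],
        "date": [
            "01-01-2016", "01-01-2016", "01-01-2016", "02-01-2016", "01-01-2016",
            "02-01-2016", "02-01-2016", "02-01-2016", "02-01-2016",
        ],
    }
)


def drop_cancelling_pairs(frame):
    """Hypothetical helper: drop each +x / -x pair on the same invoice and day."""
    base = ["customer", "invoice_nr", "date"]
    tmp = frame.assign(
        # Pairing key: the magnitude of the amount.
        _abs=frame["amount"].abs(),
        # Number repeated identical amounts 0, 1, 2, ... so that the k-th +x
        # can only pair with the k-th -x.
        _occ=frame.groupby(base + ["amount"]).cumcount(),
    )
    # A group of size 2 holds exactly one +x and one -x; size 1 means unpaired.
    pair_size = tmp.groupby(base + ["_abs", "_occ"])["amount"].transform("size")
    return frame[pair_size.eq(1)]


result = drop_cancelling_pairs(df)
print(result)
```

On the sample rows this keeps the rows with index 2, 3, 4, 6 and 8. Unlike a check on neighbouring rows, the pairing does not depend on the cancelling row directly following the row it cancels, and an unmatched negative amount is left in place.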
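    def remove_cancelled_transactions(df):
        # Flag negative amounts and the row directly before each of them.
        trans_neg = df.amount < 0
        # fill_value=False keeps the shifted mask boolean (no NaN in the last row).
        return df.loc[~(trans_neg | trans_neg.shift(-1, fill_value=False))]

    # Group rows that share customer, invoice number, date and absolute amount.
    groups = [df.customer, df.invoice_nr, df.date, df.amount.abs()]
    df.groupby(groups, as_index=False, group_keys=False) \
      .apply(remove_cancelled_transactions)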
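To see what the mask inside `remove_cancelled_transactions` does, it may help to trace it by hand on a single group. The snippet below is a small illustration rather than part of the answer: the two-row `group` frame mimics rows 5 and 7 of the sample table, the `lone` series is a hypothetical unmatched credit, and `fill_value=False` is passed to `shift` here as an assumption so that the mask stays boolean.

```python
import pandas as pd

# One group as the groupby would pass it in: same customer, invoice, date and
# absolute amount (rows 5 and 7 of the sample table).
group = pd.DataFrame({"amount": [12, -12]}, index=[5, 7])

trans_neg = group["amount"] < 0                           # [False, True]
followed_by_neg = trans_neg.shift(-1, fill_value=False)   # [True, False]
drop = trans_neg | followed_by_neg                        # [True, True]
kept = group.loc[~drop]                                   # empty: both rows cancel

# A lone negative amount (hypothetical credit with no matching positive row)
# is flagged as well, so this mask would remove it too.
lone = pd.Series([-5], name="amount")
lone_drop = (lone < 0) | (lone < 0).shift(-1, fill_value=False)  # [True]
```

The trace suggests the approach relies on every negative amount having a matching positive row directly before it within its group, which fits the invoice corrections described in the question.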