python - Pandas dataframe: generate a column from other rows' info, without the apply function
Maybe the question title is not accurate (sorry, I couldn't find accurate words to describe the question...), so let me give an example.
The following dataframe holds income by "week_id" and "user":
    week_id  user  income
    1        1     100
    1        2     50
    2        1     200
    2        2     30
    2        3     150
    3        1     100
    3        2     150
    ....
I want to add a new column that contains the "income" of the previous week, so it looks like:
    week_id  user  income  previous_week_income
    1        1     100     0
    1        2     50      0
    2        1     200     100
    2        2     30      50
    2        3     150     0
    3        1     100     200
    3        2     150     30
    ....
It looks like generating a new column from information in other rows, rather than from the current row.
I know a solution using the apply function, but it works row by row and seems slow in my case (the original dataframe may have tens of millions of rows). I wonder whether there is another, faster solution that gives the same result?
The background is generating a factor for predictive analysis: I want to use the previous week's income as one variable when predicting the current week's income.
Thanks in advance :)
I think you need DataFrameGroupBy.shift with fillna, if each week_id has unique users:
    df['previous_week_income'] = df.groupby('user')['income'].shift().fillna(0)
    print (df)
       week_id  user  income  previous_week_income
    0        1     1     100                   0.0
    1        1     2      50                   0.0
    2        2     1     200                 100.0
    3        2     2      30                  50.0
    4        2     3     150                   0.0
    5        3     1     100                 200.0
    6        3     2     150                  30.0
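Note that shift takes the previous row of each user, which is the previous week only if the rows are sorted by week_id and no user skips a week. If that cannot be guaranteed, one possible alternative is a self-merge on week_id + 1. The following is only a sketch under some assumptions: the sample data is rebuilt from the question, each (week_id, user) pair appears at most once, and the names prev and out are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "week_id": [1, 1, 2, 2, 2, 3, 3],
    "user":    [1, 2, 1, 2, 3, 1, 2],
    "income":  [100, 50, 200, 30, 150, 100, 150],
})

# Copy of the data with week_id moved forward by one, so that a row for
# week w carries the income that belongs to week w as "previous week" of w + 1.
prev = (
    df[["week_id", "user", "income"]]
    .assign(week_id=lambda d: d["week_id"] + 1)
    .rename(columns={"income": "previous_week_income"})
)

# The left merge keeps every original row; a user with no row in the
# previous week gets NaN, which is then replaced by 0.
out = df.merge(prev, on=["week_id", "user"], how="left")
out["previous_week_income"] = out["previous_week_income"].fillna(0)
print(out)
```

On the sample data this gives the same previous_week_income values as the shift version. The two only differ when a user is missing from some week: shift would then return the income of the last week the user appeared in, while the merge returns 0.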