
Create Lag Features Based On Multiple Columns

I have a time series dataset and need to extract lag features. I am using the code below, but it returns only NaNs:

df.groupby(['week','id1','id2','id3'],as_index=False)['value'].shift(1)

Solution 1:

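You want each row to pick up the value from the previous week, so remove 'week' from the grouping. With 'week' in the key, every group contains a single row and shift has nothing to shift from: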

df['t-1'] = df.groupby(['id1','id2','id3'],as_index=False)['value'].shift()
#    week  id1  id2  id3  value   t-1
#0     1  101  123    1     45   NaN
#1     1  102  231    4     89   NaN
#2     1  203  435   99     65   NaN
#3     2  101  123    1     48  45.0
#4     2  102  231    4     75  89.0
#5     2  203  435   99     90  65.0
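
To see why the original grouping returned only NaN, here is a minimal sketch that rebuilds the sample data from the output above and compares the two groupings. The names with_week and without_week are only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the output shown above.
df = pd.DataFrame({
    'week':  [1, 1, 1, 2, 2, 2],
    'id1':   [101, 102, 203, 101, 102, 203],
    'id2':   [123, 231, 435, 123, 231, 435],
    'id3':   [1, 4, 99, 1, 4, 99],
    'value': [45, 89, 65, 48, 75, 90],
})

# With 'week' in the key, every group holds exactly one row,
# so shift(1) has no earlier row to take a value from.
with_week = df.groupby(['week', 'id1', 'id2', 'id3'])['value'].shift(1)

# Without 'week', each group spans all weeks of one id combination.
without_week = df.groupby(['id1', 'id2', 'id3'])['value'].shift(1)

print(with_week.isna().all())   # every entry is NaN
print(without_week.tolist())    # NaN for week 1, previous values for week 2
```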

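This approach breaks when weeks are missing, because shift takes the previous row in the group rather than the previous week (it also assumes the rows are sorted by week). Instead, we can add 1 to the week and merge, which guarantees the lagged value comes from the prior week and leaves NaN when that week is absent.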

df2 = df.assign(week=df.week+1).rename(columns={'value': 't-1'})
df = df.merge(df2, on=['week', 'id1', 'id2', 'id3'], how='left')
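
The difference only shows up when a week is actually missing. Below is a small sketch on made-up data (week 3 is absent, and the values are hypothetical) that compares the shift and merge approaches side by side.

```python
import pandas as pd

# Hypothetical data: week 3 is missing for this id combination.
df = pd.DataFrame({
    'week':  [1, 2, 4],
    'id1':   [101, 101, 101],
    'id2':   [123, 123, 123],
    'id3':   [1, 1, 1],
    'value': [45, 48, 50],
})
keys = ['id1', 'id2', 'id3']

# shift() takes the previous row in the group, so week 4
# silently picks up the value from week 2.
shifted = df.groupby(keys)['value'].shift()

# The merge matches on the week number itself, so week 4 looks
# for week 3, finds nothing, and stays NaN.
df2 = df.assign(week=df.week + 1).rename(columns={'value': 't-1'})
merged = df.merge(df2, on=['week'] + keys, how='left')

print(shifted.tolist())        # week 4 gets 48.0 (two weeks old)
print(merged['t-1'].tolist())  # week 4 gets NaN
```

Which behaviour is right depends on the model: if "previous observation" is good enough, shift is fine; if the feature must be exactly one week old, the merge is the safer choice.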

Another way to bring in and rename many columns at once is to use the suffixes argument of merge. It renames all overlapping non-key columns coming from the right DataFrame.

df.merge(df.assign(week=df.week+1),         # Manually lag
         on=['week', 'id1', 'id2', 'id3'],
         how='left',
         suffixes=['', '_lagged']           # Right df columns -> _lagged
         )
#   week  id1  id2  id3  value  value_lagged
#0     1  101  123    1     45           NaN
#1     1  102  231    4     89           NaN
#2     1  203  435   99     65           NaN
#3     2  101  123    1     48          45.0
#4     2  102  231    4     75          89.0
#5     2  203  435   99     90          65.0
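
The suffixes approach pays off once there is more than one column to lag. A minimal sketch, assuming a second feature column called price (hypothetical, not in the original data) and a single id column to keep it short:

```python
import pandas as pd

df = pd.DataFrame({
    'week':  [1, 1, 2, 2],
    'id1':   [101, 102, 101, 102],
    'value': [45, 89, 48, 75],
    'price': [1.5, 2.0, 1.75, 2.25],   # hypothetical second feature
})

# Every non-key column of the right frame overlaps with the left one,
# so each of them receives the '_lagged' suffix in a single merge.
lagged = df.merge(df.assign(week=df.week + 1),
                  on=['week', 'id1'],
                  how='left',
                  suffixes=('', '_lagged'))

print(lagged.columns.tolist())
```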
