Create Lag Features Based On Multiple Columns
I have a time series dataset and need to extract lag features. I am using the code below, but I get all NaNs:
df.groupby(['week','id1','id2','id3'],as_index=False)['value'].shift(1)
Solution 1:
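You want to shift to the next week, so remove 'week' from the grouping. With 'week' in the keys, each group contains only one row (one per week and id combination), so shift has no earlier row to pull from and returns NaN: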
df['t-1'] = df.groupby(['id1','id2','id3'],as_index=False)['value'].shift()
#    week  id1  id2  id3  value   t-1
# 0     1  101  123    1     45   NaN
# 1     1  102  231    4     89   NaN
# 2     1  203  435   99     65   NaN
# 3     2  101  123    1     48  45.0
# 4     2  102  231    4     75  89.0
# 5     2  203  435   99     90  65.0
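To see the difference between the two groupings side by side, here is a minimal sketch. The DataFrame is rebuilt from the output shown above; the variable names `with_week` and `without_week` are only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the output table above.
df = pd.DataFrame({
    'week':  [1, 1, 1, 2, 2, 2],
    'id1':   [101, 102, 203, 101, 102, 203],
    'id2':   [123, 231, 435, 123, 231, 435],
    'id3':   [1, 4, 99, 1, 4, 99],
    'value': [45, 89, 65, 48, 75, 90],
})

# Grouping by week as well: every group holds a single row,
# so there is never a previous row to shift from.
with_week = df.groupby(['week', 'id1', 'id2', 'id3'])['value'].shift(1)

# Grouping by the ids only: each group spans both weeks.
without_week = df.groupby(['id1', 'id2', 'id3'])['value'].shift(1)

print(with_week.isna().all())   # True
print(without_week.tolist())    # [nan, nan, nan, 45.0, 89.0, 65.0]
```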
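This is error-prone when weeks are missing: shift takes the previous row within the group, whichever week that row belongs to. In that case we can instead merge after incrementing the week, which ensures the lagged value comes from the prior week regardless of missing weeks.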
df2 = df.assign(week=df.week+1).rename(columns={'value': 't-1'})
df = df.merge(df2, on=['week', 'id1', 'id2', 'id3'], how='left')
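A small sketch of how the two approaches differ when a week is missing. The data here is made up for illustration (one id combination with week 2 absent), and `by_shift`, `by_merge`, and `prior` are hypothetical names.

```python
import pandas as pd

# Hypothetical data: a single id combination with week 2 missing.
df = pd.DataFrame({
    'week':  [1, 3, 4],
    'id1':   [101, 101, 101],
    'id2':   [123, 123, 123],
    'id3':   [1, 1, 1],
    'value': [45, 48, 50],
})
keys = ['id1', 'id2', 'id3']

# shift() uses the previous row, so week 3 picks up the week 1 value.
by_shift = df.groupby(keys)['value'].shift()

# Merge on week + 1: week 3 looks for week 2, finds nothing, and stays NaN.
prior = df.assign(week=df['week'] + 1).rename(columns={'value': 't-1'})
by_merge = df.merge(prior, on=['week'] + keys, how='left')['t-1']

print(by_shift.tolist())  # [nan, 45.0, 48.0]
print(by_merge.tolist())  # [nan, nan, 48.0]
```

Which behaviour you want depends on whether "lag" means the previous observation or strictly the previous week.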
Another way to bring in and rename many columns at once is to use the suffixes argument of merge. It renames all overlapping columns (those that are not keys); here the left suffix is empty, so only the columns from the right DataFrame are renamed.
df.merge(df.assign(week=df.week+1),  # Manually lag
         on=['week', 'id1', 'id2', 'id3'],
         how='left',
         suffixes=('', '_lagged')  # Right df columns -> _lagged
         )
#    week  id1  id2  id3  value  value_lagged
# 0     1  101  123    1     45           NaN
# 1     1  102  231    4     89           NaN
# 2     1  203  435   99     65           NaN
# 3     2  101  123    1     48          45.0
# 4     2  102  231    4     75          89.0
# 5     2  203  435   99     90          65.0
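The suffixes approach pays off when there is more than one column to lag. A minimal sketch, assuming an extra `price` column that is not in the original question and is added here purely for illustration:

```python
import pandas as pd

# Hypothetical data: 'price' is an extra column added for illustration.
df = pd.DataFrame({
    'week':  [1, 2],
    'id1':   [101, 101],
    'id2':   [123, 123],
    'id3':   [1, 1],
    'value': [45, 48],
    'price': [9.5, 10.0],
})

# Every non-key column from the right side gets the '_lagged' suffix.
lagged = df.merge(
    df.assign(week=df['week'] + 1),
    on=['week', 'id1', 'id2', 'id3'],
    how='left',
    suffixes=('', '_lagged'),
)

print(lagged.columns.tolist())
# ['week', 'id1', 'id2', 'id3', 'value', 'price', 'value_lagged', 'price_lagged']
```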