Pandas: Multiple Rolling Periods
Solution 1:
I would suggest creating a DataFrame with a MultiIndex as its columns. There's no way around using a loop here to iterate over your windows. The resulting form will be something that's easy to index and easy to read with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values.
import numpy as np
import pandas as pd
np.random.seed(123)
df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
# Fill the block of columns belonging to each window in turn.
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values
Now you have a result df2 that has the same index as your original object. It has 3 column levels: the first is the window, the second is the columns from your original frame, and the third is the statistic.
print(df2.shape)
(100, 24)
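For comparison, here is a minimal sketch that builds the same column layout without preallocating: pd.concat with keys puts the window size in the outermost column level. It is an alternative to the loop-and-assign approach above, not part of the original answer, and the helper name rolling_stats is hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_stats(df, windows, stats):
    """Return rolling aggregates with (window, feature, metric) column levels."""
    # One aggregated frame per window; its columns are (feature, metric) pairs.
    pieces = [df.rolling(window=w).agg(stats) for w in windows]
    # keys= adds the window size as the outermost column level.
    out = pd.concat(pieces, axis=1, keys=windows)
    out.columns.names = ['window', 'feature', 'metric']
    return out

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
out = rolling_stats(df, [5, 15, 30, 45], ['mean', 'std'])
print(out.shape)
```

The result can be indexed the same way as df2, for example out[5] for the 5-period window.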
This makes it easy to check values for a specific rolling window:
print(df2[5]) # Rolling window = 5
feature     col0              col1              col2
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240
print(df2[5]['col0']) # Rolling window = 5, stats of col0 only
metric     mean      std
0           NaN      NaN
1           NaN      NaN
2           NaN      NaN
3           NaN      NaN
4      -0.87879  1.45348
..          ...      ...
95     -0.44231  1.02552
96     -0.58638  1.10246
97     -0.70564  0.85711
98     -0.95702  1.01302
99     -0.57026  1.10978
print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5, means of each column
window         5
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
And lastly, to make a single-indexed DataFrame, here's some kludgy use of itertools.
df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')
import itertools
# Build flat names in the same order as the concatenated values below:
# window first, then column, then statistic (e.g. 'col0_mean_5').
names = ['{}_{}_{}'.format(col, stat, win)
         for win, col, stat in itertools.product(windows, df.columns, stats)]
# One block of aggregated values per window, stacked side by side.
df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=names,
                   index=df.index)
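If a frame with (window, feature, metric) column levels like the one built earlier already exists, another route to flat names is to join each column tuple into a string. This is a sketch under that assumption; the helper name flatten_columns and the feature_metric_window naming scheme are illustrative choices, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

def flatten_columns(frame):
    """Return a copy whose (window, feature, metric) columns become 'feature_metric_window'."""
    flat = frame.copy()
    # Iterating over a MultiIndex yields one tuple per column.
    flat.columns = ['{}_{}_{}'.format(feature, metric, window)
                    for window, feature, metric in frame.columns]
    return flat

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']
# Stand-in for the MultiIndex frame: window is the outermost column level.
multi = pd.concat([df.rolling(window=w).agg(stats) for w in windows],
                  axis=1, keys=windows)
flat = flatten_columns(multi)
print(flat.columns[:4].tolist())
```

Because the names are derived from the column tuples themselves, they cannot drift out of step with the values they label.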
Solution 2:
You can concatenate output of multiple rolling aggregations:
windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                   # 1. Create window
                 .agg(['mean', 'std'])                         # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns})      # 2. Rename columns
               for i in windows)                               # For each window
pd.concat((df, *rolling_dfs), axis=1)                          # 3. Concatenate dataframes
This is not pretty, but it should do what you're looking for, from what I understand.
What it does:
- creates a generator rolling_dfs with the aggregated dataframes for each rolling window size.
- renames all columns so you can know which rolling window size it refers to.
- concatenates the original df with the rolling windows.
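The same three steps can also be sketched with flat column names, which avoids mixing the original single-level columns with the two-level columns that agg returns. This is a variation under stated assumptions rather than the answer's exact code: the helper name add_rolling_features and the col_stat_window naming are hypothetical.

```python
import numpy as np
import pandas as pd

def add_rolling_features(df, windows, stats=('mean', 'std')):
    """Append rolling aggregates to df, one flat-named column per (col, stat, window)."""
    pieces = [df]
    for w in windows:
        agg = df.rolling(w).agg(list(stats))   # columns are (col, stat) pairs
        # Flatten to names such as 'col0_mean_5' so the window size stays visible.
        agg.columns = ['{}_{}_{}'.format(col, stat, w) for col, stat in agg.columns]
        pieces.append(agg)
    # Original columns first, then one block of columns per window.
    return pd.concat(pieces, axis=1)

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
result = add_rolling_features(df, (5, 15, 30, 45))
print(result.shape)
```

A single column can then be read back by name, for example result['col1_std_15'].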