Pandas: Multiple Rolling Periods

February 17, 2024

I would like to compute rolling means and standard deviations over several window lengths for multiple columns at once. This is the code I am using for rolling(5): def add_mean_std_cols(df): res = df.ro

Solution 1:

I would suggest creating a DataFrame with a MultiIndex as its columns. A loop over your windows is unavoidable here, because each rolling call takes a single window size. The resulting layout is easy to index and easy to read back with pd.read_csv. Initialize an empty DataFrame of the appropriate shape with np.empty, then use .loc to fill in its values.

import numpy as np
import pandas as pd
np.random.seed(123)

df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats], 
                                  names=['window', 'feature', 'metric'])

df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)

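# Fill one window's block of (feature, metric) columns per iteration; .values
# drops the labels, so this relies on agg returning columns in the same order as cols
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values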

Now you have a result df2 that has the same index as your original object. It has 3 column levels: the first is the window, the second is the columns from your original frame, and the third is the statistic.
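Because each of the three levels has a name, you can also select along a single level without spelling out the others, for instance with DataFrame.xs or pd.IndexSlice. A minimal sketch that rebuilds df2 exactly as above so it runs on its own; the names all_std, means_5 and idx are just illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values

# Every standard deviation, for all windows and features;
# the 'metric' level is dropped from the result
all_std = df2.xs('std', axis=1, level='metric')
print(all_std.shape)  # (100, 12)

# Window 5, every feature, means only: same as (5, slice(None), 'mean')
idx = pd.IndexSlice
means_5 = df2.loc[:, idx[5, :, 'mean']]
print(means_5.shape)  # (100, 3)
```

pd.IndexSlice is only a more readable way of writing the slice(None) tuple; both select the same columns.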

print(df2.shape)
(100, 24)

This makes it easy to check values for a specific rolling window:

print(df2[5])  # Rolling window = 5
feature     col0              col1              col2         
metric      mean      std     mean      std     mean      std
0            NaN      NaN      NaN      NaN      NaN      NaN
1            NaN      NaN      NaN      NaN      NaN      NaN
2            NaN      NaN      NaN      NaN      NaN      NaN
3            NaN      NaN      NaN      NaN      NaN      NaN
4       -0.87879  1.45348 -0.26559  0.71236  0.53233  0.89430
..           ...      ...      ...      ...      ...      ...
95      -0.44231  1.02552 -1.22138  0.45140 -0.36440  0.95324
96      -0.58638  1.10246 -0.90165  0.79723 -0.44543  1.00166
97      -0.70564  0.85711 -0.42644  1.07174 -0.44766  1.00284
98      -0.95702  1.01302 -0.03705  1.05066  0.16437  1.32341
99      -0.57026  1.10978  0.08730  1.02438  0.39930  1.31240

print(df2[5]['col0'])  # Rolling window = 5, stats of col0 only
metric     mean      std
0           NaN      NaN
1           NaN      NaN
2           NaN      NaN
3           NaN      NaN
4      -0.87879  1.45348
..          ...      ...
95     -0.44231  1.02552
96     -0.58638  1.10246
97     -0.70564  0.85711
98     -0.95702  1.01302
99     -0.57026  1.10978

print(df2.loc[:, (5, slice(None), 'mean')])  # Rolling window = 5, means of each column
window         5                  
feature     col0     col1     col2
metric      mean     mean     mean
0            NaN      NaN      NaN
1            NaN      NaN      NaN
2            NaN      NaN      NaN
3            NaN      NaN      NaN
4       -0.87879 -0.26559  0.53233
..           ...      ...      ...
95      -0.44231 -1.22138 -0.36440
96      -0.58638 -0.90165 -0.44543
97      -0.70564 -0.42644 -0.44766
98      -0.95702 -0.03705  0.16437
99      -0.57026  0.08730  0.39930
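The first four rows (0 to 3) of every window-5 column above are NaN because, for an integer window, rolling's min_periods defaults to the window size, so no value is produced until the window is full. Passing min_periods=1 changes that. The rolling std is also the sample standard deviation (ddof=1), the same as Series.std. A small check on made-up toy data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # toy data, for illustration only

# Default: min_periods == window, so the first (window - 1) results are NaN
full = s.rolling(window=3).mean()
print(full.tolist())  # [nan, nan, 2.0, 3.0, 4.0, 5.0]

# min_periods=1: emit a value as soon as one observation is in the window
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]

# The rolling std at row 2 equals the sample std (ddof=1) of rows 0-2
print(np.isclose(s.rolling(window=3).std().iloc[2], s.iloc[0:3].std()))  # True
```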

Lastly, to make a DataFrame with single-level (flat) column names, you can generate the names with itertools. The names must come out in the same order as the concatenated columns: window first, then feature, then statistic.

df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')

import itertools

# One flat name per column, e.g. 'col0_mean_5', ordered window -> feature -> stat
# to match the column order of the concatenated rolling results below
names = ['{}_{}_{}'.format(col, stat, win)
         for win, col, stat in itertools.product(windows, df.columns, stats)]

df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=names,
                   index=df.index)
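If you already have the three-level df2 from the first part of this answer, a simpler alternative (not part of the original answer) is to flatten its column labels directly by joining each (window, feature, metric) tuple. A minimal sketch that rebuilds that df2 so it runs on its own; the feature_metric_window name order is chosen to match the naming used above:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')

windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Rebuild the three-level df2 from the first part of the answer
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values

# Flatten: (5, 'col0', 'mean') -> 'col0_mean_5'
df2.columns = ['{}_{}_{}'.format(feature, metric, window)
               for window, feature, metric in df2.columns]
print(df2.columns[:3].tolist())  # ['col0_mean_5', 'col0_std_5', 'col1_mean_5']
```

Because the flat labels are derived from df2.columns itself, they cannot get out of step with the data.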

Solution 2:

You can concatenate the outputs of multiple rolling aggregations:

windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                    # 1. Create window
                 .agg(['mean', 'std'])                          # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns})       # 2. Rename columns
               for i in windows)                                # For each window

pd.concat((df, *rolling_dfs), axis=1)                           # 3. Concatenate dataframes

This isn't pretty, but from what I understand it should do what you're looking for.

What it does:

1. Creates a generator, rolling_dfs, that yields the aggregated DataFrame for each rolling window size.
2. Renames the columns so you can tell which window size each one refers to.
3. Concatenates the original df with the rolling-window results.
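A variation on the same idea (a suggestion, not the original answer's code): pd.concat also accepts a dict of DataFrames and uses the dict keys as an extra outer column level, so the window size ends up in the column labels without any renaming. A minimal sketch, assuming the same kind of random df as above; the level names are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = (5, 15, 30, 45)

# The dict keys (window sizes) become the outermost column level
out = pd.concat({w: df.rolling(w).agg(['mean', 'std']) for w in windows}, axis=1)
out.columns.names = ['window', 'feature', 'metric']

print(out.shape)  # (100, 24)
print(out[15]['col2'].columns.tolist())  # ['mean', 'std']
```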
