Pandas DataFrame Slicing With Multiple Date Ranges
I have a datetime-indexed DataFrame with 100,000+ rows. I was wondering if there is a convenient way in pandas to get the subset of this DataFrame that falls within multiple date ranges.
Solution 1:
There are two main ways to slice a DataFrame with a DatetimeIndex by date:
- by slices: df.loc[start:end]. If there are multiple date ranges, the individual slices may be concatenated with pd.concat.
- by boolean selection mask: df.loc[mask]
Using pd.concat and slices:
import numpy as np
import pandas as pd

np.random.seed(2016)
N = 10**2
# 100 rows of random digits on a DatetimeIndex with 45-minute steps
df = pd.DataFrame(np.random.randint(10, size=(N, 2)),
                  index=pd.date_range('2016-6-27', periods=N, freq='45min'))
# Slice each date range separately, then stack the pieces in order
result = pd.concat([df.loc['2016-6-27':'2016-6-27 5:00'],
                    df.loc['2016-6-27 15:00':'2016-6-27 23:59:59']])
yields
                     0  1
2016-06-27 00:00:00  0  2
2016-06-27 00:45:00  5  5
2016-06-27 01:30:00  9  6
2016-06-27 02:15:00  8  4
2016-06-27 03:00:00  5  0
2016-06-27 03:45:00  4  8
2016-06-27 04:30:00  7  0
2016-06-27 15:00:00  2  5
2016-06-27 15:45:00  6  7
2016-06-27 16:30:00  6  8
2016-06-27 17:15:00  5  1
2016-06-27 18:00:00  2  9
2016-06-27 18:45:00  9  1
2016-06-27 19:30:00  9  7
2016-06-27 20:15:00  3  6
2016-06-27 21:00:00  3  5
2016-06-27 21:45:00  0  8
2016-06-27 22:30:00  5  6
2016-06-27 23:15:00  0  8
Note that unlike most slicing syntaxes used in Python,
df.loc['2016-6-27':'2016-6-27 5:00']
is inclusive on both ends: the slice defines a closed interval, not a half-open interval.
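The closed-interval behaviour can be checked on a small frame. This is only a sketch: the frame `toy` and its `value` column are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame: five hourly timestamps with values 0..4.
idx = pd.date_range('2016-06-27', periods=5, freq='60min')
toy = pd.DataFrame({'value': range(5)}, index=idx)

# Label-based slice on the DatetimeIndex: both endpoints are included.
by_label = toy.loc['2016-06-27 01:00':'2016-06-27 03:00']

# Positional slice over the same starting row: the stop position is excluded.
by_position = toy.iloc[1:3]

print(len(by_label))     # 3 rows: 01:00, 02:00, 03:00
print(len(by_position))  # 2 rows: 01:00, 02:00
```

If a half-open interval is what you want, a boolean mask with `<` on the upper bound makes the exclusion explicit.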
Using a boolean selection mask:
mask = (((df.index >= '2016-6-27') & (df.index <= '2016-6-27 5:00'))
        | ((df.index >= '2016-6-27 15:00') & (df.index < '2016-6-28')))
result2 = df.loc[mask]
assert result.equals(result2)
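With more than a couple of ranges, writing the mask by hand gets unwieldy. One possible generalisation is to build the mask in a loop. The helper name `select_ranges` and the demo frame below are assumptions made for illustration, not something pandas provides.

```python
import numpy as np
import pandas as pd

def select_ranges(frame, ranges):
    """Return the rows of `frame` whose index lies in any closed [start, end] range.

    `select_ranges` is a hypothetical helper; `ranges` is an iterable of
    (start, end) pairs that can be compared against the DatetimeIndex.
    """
    mask = np.zeros(len(frame), dtype=bool)
    for start, end in ranges:
        # OR each range's membership test into the running mask.
        mask |= (frame.index >= start) & (frame.index <= end)
    return frame.loc[mask]

# Demo frame with the same 45-minute index as above (values are arbitrary).
demo = pd.DataFrame({'value': np.arange(100)},
                    index=pd.date_range('2016-06-27', periods=100, freq='45min'))
ranges = [('2016-06-27', '2016-06-27 05:00'),
          ('2016-06-27 15:00', '2016-06-27 23:59:59')]
subset = select_ranges(demo, ranges)
print(len(subset))  # 7 rows from the first range plus 12 from the second
```

Because the ranges are combined with a logical OR, overlapping ranges select each row only once and the original row order is kept, whereas concatenating overlapping slices with pd.concat would repeat the shared rows.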
Solution 2:
I feel the best option is to use direct comparisons on the index rather than the loc function with slices:
df = df[((df.index >= '2016-6-27') & (df.index <= '2016-6-27 5:00'))
        | ((df.index >= '2016-6-27 15:00') & (df.index < '2016-6-28'))]
It works for me.
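A major issue with using loc with a slice is that, when the index is not sorted, the limits must be present among the actual index values; if they are not, the slice raises a KeyError. On a sorted DatetimeIndex (as in Solution 1, where '2016-6-27 5:00' is not an index value) the limits do not need to be present.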
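A quick way to see when the KeyError arises is to slice the same data on a sorted and on a shuffled index. This is a sketch on a made-up frame, and it assumes a recent pandas: the behaviour on an unsorted index has changed between releases, and older versions may warn or return rows instead of raising.

```python
import pandas as pd

# Hypothetical frame: six timestamps, 45 minutes apart, values 0..5.
idx = pd.date_range('2016-06-27', periods=6, freq='45min')
sorted_df = pd.DataFrame({'value': range(6)}, index=idx)

# Neither 01:00 nor 02:30 is an index value, yet the slice works
# because the index is sorted.
on_sorted = sorted_df.loc['2016-06-27 01:00':'2016-06-27 02:30']
print(on_sorted)

# Reorder the rows so the index is no longer monotonic.
shuffled = sorted_df.iloc[[3, 0, 5, 1, 4, 2]]
try:
    shuffled.loc['2016-06-27 01:00':'2016-06-27 02:30']
except KeyError as exc:
    # Expected on recent pandas; older releases may not raise here.
    print('KeyError on unsorted index:', exc)

# Sorting the index first makes the slice usable again.
restored = shuffled.sort_index().loc['2016-06-27 01:00':'2016-06-27 02:30']
print(restored)
```

The boolean mask approach does not depend on the index being sorted, which is one reason to prefer it when the ordering of the data is not guaranteed.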