Fastest Way To Eliminate Specific Dates From Pandas Dataframe

I'm working with a large DataFrame and I'm struggling to find an efficient way to eliminate specific dates. Note that I'm trying to eliminate any measurements from a specific date.

Solution 1:

You can create a boolean mask using a list comprehension.

>>> df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
                        values
2016-04-21 15:03:49  28.059520
2016-04-23 08:13:42 -22.376577
2016-04-23 11:23:41  40.350252
2016-04-23 14:08:41  14.557856
2016-04-25 06:48:33  -0.271976
2016-04-25 21:48:31  20.156240
2016-04-26 13:58:28  -3.225795
2016-04-27 01:58:26  51.991293
2016-04-27 02:53:26  -0.867753
2016-04-27 15:33:23  31.585201
2016-04-27 18:08:23  11.639641
2016-04-27 20:48:22  42.968156
2016-04-27 21:18:22  27.335995
2016-04-27 23:13:22  13.120088
2016-04-28 12:08:20  53.730511

Solution 2:

Same idea as @Alexander's answer (Solution 1), but using the date property of the DatetimeIndex and numpy.in1d:

mask = ~np.in1d(df.index.date, pd.to_datetime(removelist).date)
df = df.loc[mask, :]
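
Since neither df nor removelist is defined in the snippets above, here is a minimal self-contained sketch of the same masking idea. The sample data and the dates in removelist are made up for illustration. It also uses np.isin rather than np.in1d, on the assumption that you are running a recent NumPy, where np.in1d is deprecated.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one measurement every 6 hours over 4 days.
index = pd.date_range("2016-04-21", periods=16, freq="6h")
df = pd.DataFrame({"values": np.arange(16, dtype=float)}, index=index)

# Hypothetical list of dates whose measurements should be dropped.
removelist = ["2016-04-22", "2016-04-24"]

# Calendar date of every row, and of every entry to remove
# (both are arrays of datetime.date objects).
row_dates = df.index.date
remove_dates = pd.to_datetime(removelist).date

# True for rows whose date is NOT in the removal list.
mask = ~np.isin(row_dates, remove_dates)
filtered = df.loc[mask, :]

print(filtered)
```

Comparing on df.index.date is what makes every timestamp on a listed day match, regardless of the time of day.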

Timings:

%timeit df.loc[~np.in1d(df.index.date, pd.to_datetime(removelist).date), :]
1000 loops, best of 3: 1.42 ms per loop

%timeit df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
100 loops, best of 3: 3.25 ms per loop
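
If you want to reproduce a comparison like this outside IPython, a rough sketch with the standard timeit module could look like the following. The benchmark data and function names are hypothetical, and the list-comprehension variant here checks membership in a plain Python set of dates, which is a slight variation on Solution 1 rather than the exact expression timed above. Absolute numbers will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows at 5-minute spacing.
index = pd.date_range("2016-04-21", periods=10_000, freq="5min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"values": rng.normal(size=len(index))}, index=index)
removelist = ["2016-04-22", "2016-04-24"]


def with_isin():
    # Vectorised membership test on the index's calendar dates.
    mask = ~np.isin(df.index.date, pd.to_datetime(removelist).date)
    return df.loc[mask, :]


def with_list_comprehension():
    # Row-by-row membership test against a set of datetime.date objects.
    remove_dates = set(pd.to_datetime(removelist).date)
    return df[[d.date() not in remove_dates for d in df.index]]


# Both approaches should select exactly the same rows.
assert with_isin().equals(with_list_comprehension())

for func in (with_isin, with_list_comprehension):
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```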
