Conversion Of A Timedelta To Int Very Slow In Python
I have a dataframe with two columns, each one formed by a set of dates. I want to compute the difference between the dates and return the number of days. However, the process (desc
Solution 1:
You may find a massive speed-up by dropping down to NumPy, bypassing the overhead associated with pd.Series objects.
See also "pd.Timestamp versus np.datetime64: are they interchangeable for selected uses?"
# Python 3.6.0, Pandas 0.19.2, NumPy 1.11.3
import numpy as np
import pandas as pd

# dfx is the dataframe from the question, with date columns 'x' and 'y'
def days_lambda(dfx):
    return (dfx['y']-dfx['x']).apply(lambda x: x.days)

def days_pd(dfx):
    return (dfx['y']-dfx['x']).dt.days

def days_np(dfx):
    return (dfx['y'].values-dfx['x'].values) / np.timedelta64(1, 'D')

# check results are identical
assert (days_lambda(dfx).values == days_pd(dfx).values).all()
assert (days_lambda(dfx).values == days_np(dfx)).all()

dfx = pd.concat([dfx]*100000)

%timeit days_lambda(dfx)  # 5.02 s per loop
%timeit days_pd(dfx)      # 5.6 s per loop
%timeit days_np(dfx)      # 4.72 ms per loop
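Note that days_np returns floats, while the question asks for an int. The following is a minimal sketch of one way to get integers from the NumPy approach. The sample dates and the helper name days_np_int are made up for illustration, and the sketch assumes neither column contains missing dates (NaT), since those would become NaN after the division and cannot be cast to int meaningfully.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names 'x' and 'y' follow the answer above.
dfx = pd.DataFrame({
    "x": pd.to_datetime(["2017-01-01", "2017-03-15", "2018-06-30"]),
    "y": pd.to_datetime(["2017-01-10", "2017-03-15", "2018-07-02"]),
})

def days_np_int(df):
    # Work on the underlying datetime64 arrays instead of pd.Series objects.
    delta = df["y"].values - df["x"].values
    # Dividing by a one-day timedelta64 gives a float64 array of days.
    days = delta / np.timedelta64(1, "D")
    # Cast to int64; this truncates toward zero and assumes no NaT values.
    return days.astype(np.int64)

result = days_np_int(dfx)
print(result)
```

For dates without a time component this should agree with .dt.days. If the columns carry times of day, the two can differ: .dt.days takes the floor of the day count, whereas the cast above truncates toward zero, so negative differences of less than a whole day would not match.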