Conversion Of A Timedelta To Int Very Slow In Python

I have a dataframe with two columns, each one formed by a set of dates. I want to compute the difference between the dates and return the number of days. However, the process (desc

Solution 1:

You may find a massive speed-up by dropping down to NumPy, which bypasses the overhead associated with pd.Series objects.

See also pd.Timestamp versus np.datetime64: are they interchangeable for selected uses?.

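# Python 3.6.0, Pandas 0.19.2, NumPy 1.11.3
import numpy as np
import pandas as pd

def days_lambda(dfx):
    # Python-level loop: reads .days from each Timedelta in turn
    return (dfx['y']-dfx['x']).apply(lambda x: x.days)

def days_pd(dfx):
    # pandas datetime accessor on the timedelta Series
    return (dfx['y']-dfx['x']).dt.days

def days_np(dfx):
    # raw NumPy arrays; dividing by one day gives float day counts
    return (dfx['y'].values-dfx['x'].values) / np.timedelta64(1, 'D')

# check results are identical
assert (days_lambda(dfx).values == days_pd(dfx).values).all()
assert (days_lambda(dfx).values == days_np(dfx)).all()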

dfx = pd.concat([dfx]*100000)

%timeit days_lambda(dfx)  # 5.02 s per loop
%timeit days_pd(dfx)      # 5.6 s per loop
%timeit days_np(dfx)      # 4.72 ms per loop
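
Since the question asks for an int, note that the NumPy division above returns floats. The following is a minimal, self-contained sketch of one way to get integer days from it. The sample dates are made up for illustration, and flooring before the cast is an assumption chosen to mirror how .dt.days treats partial days; only the column names 'x' and 'y' come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    "x": pd.to_datetime(["2018-01-01", "2018-01-15", "2018-03-01"]),
    "y": pd.to_datetime(["2018-01-10", "2018-02-01", "2018-03-02"]),
})

# Subtracting the underlying datetime64 arrays gives a timedelta64 array.
delta = df["y"].values - df["x"].values

# Dividing by one day yields float64 day counts.
days_float = delta / np.timedelta64(1, "D")

# Floor, then cast, to obtain integer days.
days_int = np.floor(days_float).astype(np.int64)

print(days_int)
```

If the columns hold pure dates, as in the question, the differences are whole days and the floor changes nothing; it only matters when the timestamps carry a time-of-day component.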
