Pandas Date_range - Subtracting Numpy Timedelta Gives Odd Result, Time Becomes Not 0:00:00
Solution 1:
I've had similar issues with timedelta, and the solution I ended up using was relativedelta from dateutil, which is built specifically for this kind of application (it takes into account all the calendar weirdness like leap years, weekdays, etc.). For example, given:
>>> from dateutil.relativedelta import relativedelta
>>> date = dates[0]  # dates is assumed to be the DatetimeIndex from the question
>>> date
Timestamp('2013-01-01 00:00:00', freq='10MS')
>>> deltaGap = relativedelta(months=1)
>>> date - deltaGap
Timestamp('2012-12-01 00:00:00', freq='10MS')
>>> deltaGap = relativedelta(years=2, months=1)
>>> date - deltaGap
Timestamp('2010-12-01 00:00:00', freq='10MS')
Check out the dateutil documentation for more info on relativedelta.
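To make the "calendar weirdness" point concrete, here is a minimal sketch using plain datetime.date values rather than pandas Timestamps. The dates are my own illustrative picks, not taken from the question; they show how relativedelta falls back to the last valid day of the target month.

```python
import datetime as dt

from dateutil.relativedelta import relativedelta

# Illustrative dates (not from the question).
march_end = dt.date(2013, 3, 31)
leap_day = dt.date(2012, 2, 29)

# 31 February does not exist, so the result falls back to the last day of February.
one_month_back = march_end - relativedelta(months=1)

# 29 February 2013 does not exist either, so this also lands on 28 February.
one_year_on = leap_day + relativedelta(years=1)

print(one_month_back)  # 2013-02-28
print(one_year_on)     # 2013-02-28
```

Note that the time of day is never touched: only the calendar fields named in the relativedelta change.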
The issues with numpy.timedelta64
I think the problem with np.timedelta64 is revealed in these two parts of the docs:
There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.
and
The length of the span is the range of a 64-bit integer times the length of the date or unit. For example, the time span for ‘W’ (week) is exactly 7 times longer than the time span for ‘D’ (day), and the time span for ‘D’ (day) is exactly 24 times longer than the time span for ‘h’ (hour).
So the timedeltas are fine for hours, days and weeks, because these are fixed-length timespans. However, months and years are variable in length (think leap years), and so to take this into account, some sort of "average" length is used (I guess). One numpy "year" seems to be 365 days, 5 hours, 49 minutes and 12 seconds, while one numpy "month" seems to be 30 days, 10 hours, 29 minutes and 6 seconds.
import numpy as np
# Adding one numpy month adds 30 days + 10:29:06:
deltaGap = np.timedelta64(1, 'M')
date + deltaGap
# Timestamp('2013-01-31 10:29:06', freq='10MS')
# Adding one numpy year adds 1 year + 05:49:12:
deltaGap = np.timedelta64(1, 'Y')
date + deltaGap
# Timestamp('2014-01-01 05:49:12', freq='10MS')
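The two odd-looking offsets are consistent with an average Gregorian year of 365.2425 days, with a month taken as one twelfth of that. That is my assumption about where the numbers come from, not something the quoted docs state. A quick standard-library check of the arithmetic:

```python
from datetime import timedelta

SECONDS_PER_DAY = 86_400

# Assumption: one "year" is the mean Gregorian year of 365.2425 days.
# Integer arithmetic avoids floating-point rounding.
year_seconds = 3_652_425 * SECONDS_PER_DAY // 10_000

# Assumption: one "month" is exactly one twelfth of that year.
month_seconds = year_seconds // 12

year = timedelta(seconds=year_seconds)
month = timedelta(seconds=month_seconds)

print(year)   # 365 days, 5:49:12
print(month)  # 30 days, 10:29:06
```

Both results match the extra 05:49:12 and 10:29:06 seen in the timestamps above, which is why the time of day stops being 00:00:00.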
This is not so easy to work with, which is why I would just go with relativedelta, which is much more intuitive (to me).
Solution 2:
You can try pd.DateOffset, which is mainly used for applying calendar-aware offset logic (months, years, hours) to dates.
import pandas as pd
# build an hourly range of example dates
# (inclusive= replaces the older closed= argument, which pandas 2.0 removed)
dates = pd.date_range(start='1/1/2013', freq='H', periods=100, inclusive='left', normalize=True)
# take the first date as an example
date = dates[0]
# subtract a month
date - pd.DateOffset(months=1)
# Timestamp('2012-12-01 00:00:00')
# to apply this to all dates
new_dates = list(map(lambda x: x - pd.DateOffset(months=1), dates))
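As an alternative to mapping over every element, the offset can also be subtracted from the whole DatetimeIndex at once. The sketch below uses a small daily range of my own choosing (not the dates from the question) so that the month-end behaviour is visible.

```python
import pandas as pd

# Illustrative range straddling a month end: 29, 30, 31 March and 1 April 2013.
dates = pd.date_range(start="2013-03-29", periods=4, freq="D")
offset = pd.DateOffset(months=1)

# Subtract the offset from the whole index in one operation.
shifted = dates - offset

# Element-by-element version, for comparison with the map-based approach.
shifted_one_by_one = [ts - offset for ts in dates]

print(list(shifted))
```

The first three dates all land on 2013-02-28, since February 2013 has no 29th, 30th or 31st, and the time of day stays at 00:00:00 throughout.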