
Pandas Date_range - Subtracting Numpy Timedelta Gives Odd Result, Time Becomes Not 0:00:00

I am trying to generate a set of dates with pandas date_range functionality. Then I want to iterate over this range and subtract several months from each of the dates (exact number

Solution 1:

I've had similar issues with timedelta, and the solution I ended up with was relativedelta from dateutil, which is built specifically for this kind of application (it takes calendar quirks such as leap years and weekdays into account). For example, given:

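import pandas as pd
from dateutil.relativedelta import relativedelta

# assumed input: a month-start range, consistent with freq='10MS' in the output below
dates = pd.date_range(start='2013-01-01', periods=10, freq='10MS')
date = dates[0]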

>>> date
Timestamp('2013-01-01 00:00:00', freq='10MS')

deltaGap = relativedelta(months=1)

>>> date-deltaGap
Timestamp('2012-12-01 00:00:00', freq='10MS')

deltaGap = relativedelta(years=2, months=1)

>>> date-deltaGap
Timestamp('2010-12-01 00:00:00', freq='10MS')

Check out the dateutil documentation for more information on relativedelta.
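As a minimal sketch of the question's use case (subtracting several months from every date in a range), the snippet below uses plain datetime objects as stand-ins for the date_range values. The three dates and the three-month gap are hypothetical, chosen only for illustration.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

# Hypothetical stand-ins for a '10MS' date_range: month starts 10 months apart
dates = [datetime(2013, 1, 1), datetime(2013, 11, 1), datetime(2014, 9, 1)]

# Subtract three calendar months from each date; the time stays at 00:00:00
shifted = [d - relativedelta(months=3) for d in dates]

# relativedelta clips to the last valid day when the target month is shorter
end_of_march = datetime(2012, 3, 31) - relativedelta(months=1)

print(shifted)
print(end_of_march)  # 2012-02-29 00:00:00 (2012 is a leap year)
```

The clipping behaviour is the main reason to prefer a calendar-aware offset here: "one month before 31 March" has no exact answer, and relativedelta resolves it to the last day of February instead of spilling into March.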

The issues with numpy.timedelta64

I think the problem with np.timedelta64 is explained by these two passages from the NumPy docs:

There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.

and

The length of the span is the range of a 64-bit integer times the length of the date or unit. For example, the time span for ‘W’ (week) is exactly 7 times longer than the time span for ‘D’ (day), and the time span for ‘D’ (day) is exactly 24 times longer than the time span for ‘h’ (hour).

So timedeltas are fine for hours, days and weeks, because these are fixed-length timespans. Months and years, however, vary in length (think of leap years, or months of 28 to 31 days), so to convert them NumPy uses an average. One numpy "year" works out to 365 days, 5 hours, 49 minutes and 12 seconds, and one numpy "month" to 30 days, 10 hours, 29 minutes and 6 seconds. These match an average Gregorian year of 365.2425 days and a month of one twelfth of that year.
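The odd offsets can be reproduced with plain arithmetic. This sketch uses only the standard library (it does not call NumPy or pandas) and assumes an average Gregorian year of 365.2425 days, i.e. 97 leap days every 400 years, with an average month being one twelfth of that year.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400

# Average Gregorian year: 365 days plus 97 leap days spread over 400 years
avg_year = timedelta(seconds=365 * SECONDS_PER_DAY + 97 * SECONDS_PER_DAY // 400)
# Average month: one twelfth of the average year
avg_month = avg_year / 12

start = datetime(2013, 1, 1)
print(avg_year, "->", start + avg_year)    # 365 days, 5:49:12 -> 2014-01-01 05:49:12
print(avg_month, "->", start + avg_month)  # 30 days, 10:29:06 -> 2013-01-31 10:29:06
```

Both results line up with the Timestamps shown for np.timedelta64(1,'Y') and np.timedelta64(1,'M'), which is why the time of day stops being 0:00:00.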

import numpy as np

# Adding one numpy month adds 30 days + 10:29:06:
deltaGap = np.timedelta64(1,'M')
date+deltaGap
# Timestamp('2013-01-31 10:29:06', freq='10MS')

# Adding one numpy year adds 1 year + 05:49:12:
deltaGap = np.timedelta64(1,'Y')
date+deltaGap
# Timestamp('2014-01-01 05:49:12', freq='10MS')

This is awkward to work with, which is why I would just use relativedelta, which is much more intuitive (to me).


Solution 2:

You can try pd.DateOffset, which is designed for applying calendar offset logic (months, years, hours) to dates.

import pandas as pd

# build an hourly range of dates starting at midnight on 1 Jan 2013
# (the original call also passed closed='left'; pandas 2.0 removed that argument)
dates = pd.date_range(start='1/1/2013', freq='h', periods=100, normalize=True)

# take the first date as an example
date = dates[0]

# subtract a month
date - pd.DateOffset(months=1)
# Timestamp('2012-12-01 00:00:00')

# to apply this to all dates
new_dates = list(map(lambda x: x - pd.DateOffset(months=1), dates))
