Weekday As Dummy / Factor Variable In A Linear Regression Model Using Statsmodels
The question: How can I add a dummy / factor variable to a model using sm.OLS()?
The details: Data sample structure:
Date        A      B      weekday
2013-05-04  25.03  88.51  Saturday
20
Solution 1:
You can build the dummy variables yourself with a pandas categorical or, more simply, use the formula interface, where patsy converts every non-numeric column into dummy variables (or another factor encoding).
Using the formula interface in this case (sm.OLS.from_formula, which is the same as the lowercase ols in statsmodels.formula.api) gives the result below.
Patsy sorts the levels of the categorical variable alphabetically. 'Friday' is missing from the list of variables because it has been selected as the reference category.
>>> import statsmodels.api as sm
>>> res = sm.OLS.from_formula('A ~ B + weekday', df).fit()
>>> print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      A   R-squared:                       0.301
Model:                            OLS   Adj. R-squared:                  0.029
Method:                 Least Squares   F-statistic:                     1.105
Date:                Thu, 03 May 2018   Prob (F-statistic):              0.401
Time:                        15:26:02   Log-Likelihood:                -97.898
No. Observations:                  26   AIC:                             211.8
Df Residuals:                      18   BIC:                             221.9
Df Model:                           7
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept               -1.4717     19.343     -0.076      0.940     -42.110      39.167
weekday[T.Monday]        2.5837      9.857      0.262      0.796     -18.124      23.291
weekday[T.Saturday]     -6.5889      9.599     -0.686      0.501     -26.755      13.577
weekday[T.Sunday]        9.2287      9.616      0.960      0.350     -10.975      29.432
weekday[T.Thursday]     -1.7610     10.321     -0.171      0.866     -23.445      19.923
weekday[T.Tuesday]       2.6507      9.664      0.274      0.787     -17.652      22.953
weekday[T.Wendesday]    -6.9320      9.911     -0.699      0.493     -27.754      13.890
B                        0.4047      0.258      1.566      0.135      -0.138       0.948
==============================================================================
Omnibus:                        1.039   Durbin-Watson:                   2.313
Prob(Omnibus):                  0.595   Jarque-Bera (JB):                0.532
Skew:                          -0.350   Prob(JB):                        0.766
Kurtosis:                       3.007   Cond. No.                         638.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
See the patsy documentation for the available categorical encodings: http://patsy.readthedocs.io/en/latest/categorical-coding.html
For example, the reference category can be specified explicitly, as in this formula:
"A ~ B + C(weekday, Treatment('Sunday'))"
The Treatment coding is described at http://patsy.readthedocs.io/en/latest/API-reference.html#patsy.Treatment
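As a rough illustration of the explicit reference coding, the sketch below fits the formula with 'Sunday' as the reference level and lists the resulting parameter names. The data frame is again synthetic (invented values, with the question's column names A, B and weekday), so only the structure of the output is meaningful, not the estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the question's data; the values are invented for illustration.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-05-04", periods=28, freq="D")
df = pd.DataFrame({
    "A": rng.normal(25, 10, size=len(dates)),
    "B": rng.normal(90, 10, size=len(dates)),
    "weekday": dates.day_name(),
})

# Treatment('Sunday') makes Sunday the reference category instead of the
# alphabetically first level ('Friday').
res = smf.ols("A ~ B + C(weekday, Treatment('Sunday'))", data=df).fit()

# Sunday no longer appears among the dummy terms; Friday now does.
names = res.params.index.tolist()
print(names)
```

Each weekday coefficient is then read as the difference from Sunday, holding B fixed.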