ValueError: Unknown label type: while implementing MLPClassifier

I have a dataframe with columns Year, month, day, hour, minute, second, Daily_KWH. I need to predict Daily KWH using a neural network. Please let me know how to go about it. Daily_

Solution 1:

First of all, this is a regression problem and not a classification problem, as the values in the Daily_KWH_System column do not form a set of labels. Instead, they seem to be (at least based on the provided example) real numbers.

If you want to approach it as a classification problem regardless, then according to the sklearn documentation:

When doing classification in scikit-learn, y is a vector of integers or strings.

In your case, y is a vector of floats, and therefore you get the error. Thus, instead of the line

y = df['Daily_KWH_System']

write the line

y = np.asarray(df['Daily_KWH_System'], dtype="|S6")

and this will resolve the issue. (You can read more about this approach here: Python RandomForest - Unknown label Error)
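To see why the float column is rejected, you can ask scikit-learn how it categorizes a target vector. This is a minimal sketch, assuming the five sample values below stand in for the real column. It uses a plain str conversion rather than the "|S6" byte-string dtype, purely for readability.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Sample values standing in for df['Daily_KWH_System'] (illustrative only).
y_float = np.array([4136.900384, 3061.657187, 4099.614033,
                    3922.490275, 3957.128982])

# Non-integer floats are categorized as a regression-style target.
float_kind = type_of_target(y_float)

# After conversion to strings, every distinct value becomes its own label.
y_str = y_float.astype(str)
str_kind = type_of_target(y_str)

print(float_kind, str_kind)
```

Keep in mind that after such a conversion each distinct reading is its own class, so a classifier trained this way can only ever predict values it has already seen.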

Yet, since regression is more appropriate in this case, instead of making the above change, replace the lines

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

with

from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor(hidden_layer_sizes=(30,30,30))

The code will run without throwing an error (but there certainly isn't enough data to check whether the model that we get performs well).
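The difference between the two estimators can be checked on stand-in data. The sketch below assumes random features and a random float target (not the asker's data). The max_iter and random_state values are arbitrary choices for the illustration, and fit quality is not the point here.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Short training runs may not converge; that is irrelevant for this check.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in data: six time-style features and a float target.
rng = np.random.default_rng(0)
X = rng.random((40, 6))
y = 3000.0 + 1000.0 * rng.random(40)  # real-valued, like Daily_KWH_System

# The classifier refuses a continuous target.
classifier_error = None
try:
    MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=50).fit(X, y)
except ValueError as exc:
    classifier_error = str(exc)

# The regressor accepts the same arrays.
mlp = MLPRegressor(hidden_layer_sizes=(30, 30, 30), max_iter=200,
                   random_state=0)
mlp.fit(X, y)
predictions = mlp.predict(X[:5])

print(classifier_error)
print(predictions)
```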

With that being said, I don't think that this is the right approach for choosing features for this problem.

In this problem we deal with a sequence of real numbers that form a time series. One reasonable feature that we could choose is the number of seconds (or minutes, hours, days, etc.) that passed since the starting point. Since this particular data contains only days, months and years (other values are always 0), we could choose as a feature the number of days that passed since the beginning. Then your data frame will look like:

   Daily_KWH_System  days_passed
0       4136.900384            0
1       3061.657187            1
2       4099.614033            2
3       3922.490275            3
4       3957.128982            4

You could take the values in the column days_passed as features and the values in Daily_KWH_System as targets. You may also add some indicator features. For example, if you think that the end of the year may affect the target, you can add an indicator feature that indicates whether the month is December or not.
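One way to build these two features with pandas is sketched below. The calendar dates are hypothetical (the original dates are not shown in the question), and the lowercase column names year, month and day are an assumption made so that pd.to_datetime can assemble them directly.

```python
import pandas as pd

# Hypothetical rows in the question's layout; the dates are made up.
df = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017, 2017],
    "month": [12, 12, 12, 1, 1],
    "day": [29, 30, 31, 1, 2],
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033,
                         3922.490275, 3957.128982],
})

# Assemble a datetime column from the year/month/day parts.
dates = pd.to_datetime(df[["year", "month", "day"]])

# Days elapsed since the first observation.
df["days_passed"] = (dates - dates.min()).dt.days

# Indicator feature: 1 if the row falls in December, else 0.
df["is_december"] = (dates.dt.month == 12).astype(int)

X = df[["days_passed", "is_december"]]
y = df["Daily_KWH_System"]
print(df)
```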

If the data is indeed daily (at least in this example you have one data point per day) and you want to tackle this problem with neural networks, then another reasonable approach would be to handle it as a time series and try to fit a recurrent neural network. Here are a couple of great blog posts that describe this approach:

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
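Before a recurrent network can be fitted, the series usually has to be framed as supervised samples, where a window of past values predicts the next one. The sketch below shows that framing with NumPy only. The helper make_windows and the window length of 2 are hypothetical choices for illustration, not taken from the posts above.

```python
import numpy as np


def make_windows(series, n_lags):
    """Turn a 1D series into (X, y): n_lags past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags]
                  for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


daily_kwh = [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982]
X, y = make_windows(daily_kwh, n_lags=2)

# Recurrent layers such as a Keras LSTM expect 3D input:
# (samples, timesteps, features). Here there is one feature per timestep.
X_rnn = X.reshape((X.shape[0], X.shape[1], 1))

print(X.shape, y.shape, X_rnn.shape)
```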

Solution 2:

The fit() function expects y to be a 1D array-like. Selecting columns from a pandas DataFrame can give you a 2D object (for example, df[['Daily_KWH_System']] returns a one-column DataFrame). If that is your case, you need to convert the 2D object into an actual 1D list, as expected by the fit function:

y = list(df['Daily_KWH_System'])
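The dimensionality of what you pass as y can be checked directly. A small sketch, assuming a three-row frame with the same column name:

```python
import pandas as pd

df = pd.DataFrame({"Daily_KWH_System": [4136.900384, 3061.657187,
                                        4099.614033]})

single = df["Daily_KWH_System"]      # single brackets: 1D Series
double = df[["Daily_KWH_System"]]    # double brackets: 2D one-column DataFrame

y_list = list(single)                # plain Python list of floats
y_flat = double.to_numpy().ravel()   # flatten the 2D case to 1D

print(single.shape, double.shape, y_flat.shape)
```

Note that this only addresses the shape of y. The values are still floats, so a classifier will still treat them as a continuous target.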

Solution 3:

Use a regressor instead. This will solve the issue with float 2D data.

from sklearn.neural_network import MLPRegressor
model = MLPRegressor(solver='lbfgs',alpha=0.001,hidden_layer_sizes=(10,10))

model.fit(x_train,y_train)

y_pred = model.predict(x_test)

Solution 4:

Instead of mlp.fit(X_train, y_train), use mlp.fit(X_train, y_train.values).
