Selecting Kernel and Hyperparameters for Kernel PCA Reduction
Solution 1:
GridSearchCV is capable of doing cross-validation of unsupervised learning (without a y), as can be seen in its documentation:

fit(X, y=None, groups=None, **fit_params)
... y : array-like, shape = [n_samples] or [n_samples, n_output], optional. Target relative to X for classification or regression; None for unsupervised learning ...

So the only thing that needs to be handled is how the scoring will be done.
The following will happen in GridSearchCV:

1. The data X will be divided into train-test splits based on the folds defined in the cv param.
2. For each combination of parameters that you specified in param_grid, the model will be trained on the train part from the step above, and then scoring will be used on the test part.
3. The scores for each parameter combination will be combined over all the folds and averaged. The highest-performing parameter combination will be selected.
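To make those three steps concrete, here is a rough manual equivalent of that loop (an illustrative sketch, not GridSearchCV's actual code); manual_grid_search is a hypothetical helper, and it assumes a scorer with the (estimator, X, y) signature introduced below:

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, ParameterGrid

def manual_grid_search(estimator, param_grid, X, scorer, n_splits=3):
    results = {}
    for params in ParameterGrid(param_grid):                       # every parameter combination
        fold_scores = []
        for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
            model = clone(estimator).set_params(**params)
            model.fit(X[train_idx])                                # train on the train part
            fold_scores.append(scorer(model, X[test_idx], None))   # score on the test part
        results[tuple(sorted(params.items()))] = np.mean(fold_scores)  # average over the folds
    best = max(results, key=results.get)                           # highest average score wins
    return best, results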
Now the tricky part is step 2. By default, if you provide a string for scoring, it will be converted to a make_scorer object internally. For 'mean_squared_error' the relevant code is here:

....
neg_mean_squared_error_scorer = make_scorer(mean_squared_error,
                                            greater_is_better=False)
....

which is not what you want, because that scorer requires y_true and y_pred.
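To see the mismatch concretely (an illustrative check, not from the original answer), calling that kind of scorer on a fitted KernelPCA fails, because it expects estimator.predict(X) plus a ground-truth y, neither of which exists here:

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics import get_scorer

X = np.random.RandomState(0).rand(50, 5)           # toy data, only for the demonstration
kpca = KernelPCA(n_components=2).fit(X)
scorer = get_scorer("neg_mean_squared_error")
try:
    scorer(kpca, X, None)                          # KernelPCA has no predict and there is no true y
except Exception as exc:
    print(type(exc).__name__)                      # raises instead of returning a score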
The other option is to make your own custom scorer, as discussed here, with the signature (estimator, X, y). Something like the below for your case:
from sklearn.metrics import mean_squared_error

def my_scorer(estimator, X, y=None):
    # Score a fitted KernelPCA by how well the inverse transform reconstructs X
    X_reduced = estimator.transform(X)
    X_preimage = estimator.inverse_transform(X_reduced)
    # Negate the reconstruction error so that higher is better for GridSearchCV
    return -1 * mean_squared_error(X, X_preimage)
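Before wiring it into the grid search, you can sanity-check the scorer on a single fitted KernelPCA (the toy data and parameter values below are only illustrative):

import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.RandomState(0).rand(100, 5)          # toy data, only for the check
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04,
                 fit_inverse_transform=True).fit(X)
print(my_scorer(kpca, X))                          # negated reconstruction MSE; closer to 0 is better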
Then use it in GridSearchCV like this:
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV

param_grid = [{
    "gamma": np.linspace(0.03, 0.05, 10),
    "kernel": ["rbf", "sigmoid", "linear", "poly"]
}]

kpca = KernelPCA(fit_inverse_transform=True, n_jobs=-1)
grid_search = GridSearchCV(kpca, param_grid, cv=3, scoring=my_scorer)
grid_search.fit(X)
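After the fit, the usual GridSearchCV attributes are available; for example (an illustrative follow-up, not part of the original answer):

print(grid_search.best_params_)                        # the winning kernel/gamma combination
print(grid_search.best_score_)                         # its average negated reconstruction MSE across folds
X_reduced = grid_search.best_estimator_.transform(X)   # project X with the refit best KernelPCA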