88 lines
3.0 KiB
Plaintext
88 lines
3.0 KiB
Plaintext
This rule raises an issue when an invalid nested estimator parameter is set on a Pipeline.
|
|
|
|
== Why is this an issue?
|
|
|
|
In the sklearn library, when using the `Pipeline` class, it is possible to modify the parameters of the nested estimators.
|
|
This modification can be done by using the `Pipeline` method `set_params` and specifying the name of the estimator
|
|
and the parameter to update separated by a double underscore ``++__++``.
|
|
|
|
[source,python]
|
|
----
|
|
from sklearn.pipeline import Pipeline
|
|
from sklearn.svm import SVC
|
|
|
|
pipe = Pipeline(steps=[("clf", SVC())])
|
|
pipe.set_params(clf__C=10)
|
|
----
|
|
|
|
In the example above, the regularization parameter `C` is set to the value `10`
|
|
for the classifier called `clf`. Setting such parameters can be done as well
|
|
with the help of the `param_grid` parameter for example when using `GridSearchCV`.
|
|
|
|
Providing invalid parameters that do not exist on the estimator can lead to unexpected behavior or runtime errors.
|
|
|
|
This rule checks that the parameters provided to the `set_params` method of a
|
|
Pipeline instance or through the `param_grid` parameters of a `GridSearchCV`
|
|
are valid for the nested estimators.
|
|
|
|
|
|
== How to fix it
|
|
|
|
To fix this issue provide valid parameters to the nested estimators.
|
|
|
|
=== Code examples
|
|
|
|
==== Noncompliant code example
|
|
|
|
[source,python,diff-id=1,diff-type=noncompliant]
|
|
----
|
|
from sklearn.pipeline import Pipeline
|
|
from sklearn.decomposition import PCA
|
|
|
|
pipe = Pipeline(steps=[('reduce_dim', PCA())])
|
|
pipe.set_params(reduce_dim__C=2) # Noncompliant: the parameter C does not exists for the PCA estimator
|
|
----
|
|
|
|
==== Compliant solution
|
|
|
|
[source,python,diff-id=1,diff-type=compliant]
|
|
----
|
|
from sklearn.pipeline import Pipeline
|
|
from sklearn.decomposition import PCA
|
|
|
|
pipe = Pipeline(steps=[('reduce_dim', PCA())])
|
|
pipe.set_params(reduce_dim__n_components=2) # Compliant
|
|
----
|
|
|
|
== Resources
|
|
=== Documentation
|
|
|
|
* Scikit-Learn documentation - https://scikit-learn.org/stable/modules/compose.html#access-to-nested-parameters[Access to nested parameters]
|
|
* Scikit-Learn documentation - https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn-model-selection-gridsearchcv[GridSearchCV reference]
|
|
|
|
ifdef::env-github,rspecator-view[]
|
|
|
|
(visible only on this page)
|
|
|
|
== Implementation specification
|
|
|
|
Verify that the keys of the dict passed to the set_params method of the Pipeline or the keys of
|
|
the dict passed to the params_grid of a GridSearchCV, contain the name of the
|
|
estimator and a parameter that exists on the estimator. We should refer to the stubs, to verify if a parameter is valid.
|
|
The name of the estimator is set in the Pipeline constructor in both cases. The key should have
|
|
the following shape <name_of_the_estimator>__<name_of_the_parameter>.
|
|
|
|
=== Message
|
|
|
|
Provide valid parameters to the nested estimators.
|
|
|
|
Secondary location message: The invalid parameter is set here.
|
|
|
|
=== Issue location
|
|
|
|
The parameter passed as param_grid or the parameter passed to set_params
|
|
|
|
Secondary location: The invalid parameter name (the key of the dictionary)
|
|
|
|
endif::env-github,rspecator-view[]
|