rspec/rules/S6972/python/rule.adoc

88 lines
3.0 KiB
Plaintext

This rule raises an issue when an invalid nested estimator parameter is set on a Pipeline.
== Why is this an issue?
In the sklearn library, when using the `Pipeline` class, it is possible to modify the parameters of the nested estimators.
This modification can be done by using the `Pipeline` method `set_params` and specifying the name of the estimator
and the parameter to update separated by a double underscore ``++__++``.
[source,python]
----
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
pipe = Pipeline(steps=[("clf", SVC())])
pipe.set_params(clf__C=10)
----
In the example above, the regularization parameter `C` is set to the value `10`
for the classifier called `clf`. Setting such parameters can be done as well
with the help of the `param_grid` parameter for example when using `GridSearchCV`.
Providing invalid parameters that do not exist on the estimator can lead to unexpected behavior or runtime errors.
This rule checks that the parameters provided to the `set_params` method of a
Pipeline instance or through the `param_grid` parameters of a `GridSearchCV`
are valid for the nested estimators.
== How to fix it
To fix this issue provide valid parameters to the nested estimators.
=== Code examples
==== Noncompliant code example
[source,python,diff-id=1,diff-type=noncompliant]
----
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
pipe = Pipeline(steps=[('reduce_dim', PCA())])
pipe.set_params(reduce_dim__C=2) # Noncompliant: the parameter C does not exists for the PCA estimator
----
==== Compliant solution
[source,python,diff-id=1,diff-type=compliant]
----
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
pipe = Pipeline(steps=[('reduce_dim', PCA())])
pipe.set_params(reduce_dim__n_components=2) # Compliant
----
== Resources
=== Documentation
* Scikit-Learn documentation - https://scikit-learn.org/stable/modules/compose.html#access-to-nested-parameters[Access to nested parameters]
* Scikit-Learn documentation - https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn-model-selection-gridsearchcv[GridSearchCV reference]
ifdef::env-github,rspecator-view[]
(visible only on this page)
== Implementation specification
Verify that the keys of the dict passed to the set_params method of the Pipeline or the keys of
the dict passed to the params_grid of a GridSearchCV, contain the name of the
estimator and a parameter that exists on the estimator. We should refer to the stubs, to verify if a parameter is valid.
The name of the estimator is set in the Pipeline constructor in both cases. The key should have
the following shape <name_of_the_estimator>__<name_of_the_parameter>.
=== Message
Provide valid parameters to the nested estimators.
Secondary location message: The invalid parameter is set here.
=== Issue location
The parameter passed as param_grid or the parameter passed to set_params
Secondary location: The invalid parameter name (the key of the dictionary)
endif::env-github,rspecator-view[]