Compare commits: master...rule/add-R (9 commits)

SHA1 |
---|
5df20e6b86 |
b60539a611 |
138fac482c |
fb901b8bbd |
1a774b466f |
1da08c1787 |
a55d815cbf |
249eea5fd5 |
ce572252ce |
2
rules/S6970/metadata.json
Normal file
@@ -0,0 +1,2 @@
{
}
23
rules/S6970/python/metadata.json
Normal file
@@ -0,0 +1,23 @@
{
  "title": "The Scikit-learn \"fit\" method should be called before methods yielding results",
  "type": "CODE_SMELL",
  "status": "ready",
  "remediation": {
    "func": "Constant\/Issue",
    "constantCost": "5min"
  },
  "tags": [
  ],
  "defaultSeverity": "Major",
  "ruleSpecification": "RSPEC-6970",
  "sqKey": "S6970",
  "scope": "All",
  "defaultQualityProfiles": ["Sonar way"],
  "quickfix": "infeasible",
  "code": {
    "impacts": {
      "RELIABILITY": "HIGH"
    },
    "attribute": "CONVENTIONAL"
  }
}
112
rules/S6970/python/rule.adoc
Normal file
@@ -0,0 +1,112 @@
This rule raises an issue when a Scikit-learn estimator's `fit` or `partial_fit` method is not called before a method yielding results.

== Why is this an issue?

When using the Scikit-learn library, it is crucial to train the estimator or transformer before
attempting to get results. Failing to do so can lead to incorrect results or runtime errors.
Training is done with the `fit` or `partial_fit` methods, and results can be retrieved with, for example, the `predict` method.

If the `predict` method is called without a prior call to the `fit` method, a `NotFittedError` exception is raised.
In this case the error is unambiguous, but in other cases the error raised can be less explicit.

[source,python]
----
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(1)
knn.n_samples_fit_ # raises an AttributeError
----

In the example above, the model is never trained on the iris dataset with the
`fit` method. This results in a more cryptic error stating that ``++n_samples_fit_++`` is not an
attribute of `KNeighborsClassifier`, because this attribute is only set once the `fit` method
has been called.
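
The mechanism behind this `AttributeError` can be illustrated with a minimal sketch. `ToyEstimator` is a hypothetical stand-in written for this illustration, not Scikit-learn code; it only mimics the convention of creating trailing-underscore attributes inside `fit`.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not Scikit-learn code."""

    def __init__(self, n_neighbors=1):
        # Hyperparameters are stored when the object is constructed.
        self.n_neighbors = n_neighbors

    def fit(self, X):
        # Learned state is only created here, so it does not exist before fit.
        self.n_samples_fit_ = len(X)
        return self


toy = ToyEstimator()

# Before fit: the attribute was never assigned, so plain attribute lookup fails.
try:
    toy.n_samples_fit_
except AttributeError as exc:
    error_before_fit = str(exc)

# After fit: the attribute exists and holds the learned value.
toy.fit([[0.0], [1.0], [2.0]])
samples_after_fit = toy.n_samples_fit_
```

Because the attribute is created by an ordinary assignment in `fit`, Python reports a generic missing-attribute error rather than one that mentions fitting.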

This rule raises an issue when any of the following methods is called without a prior call to `fit` or `partial_fit`:

* `predict`
* `predict_proba`
* `predict_log_proba`
* `score`
* `score_samples`
* `decision_function`
* `transform`
* `inverse_transform`

It also raises an issue when accessing any estimator attribute ending with a single underscore, which denotes that the attribute is set during `fit` or `partial_fit`.
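
The trailing-underscore naming convention can be expressed as a small predicate. The helper name `looks_like_fitted_attribute` is hypothetical, and the exclusion of names starting with an underscore is an assumption of this sketch rather than something the rule text states.

```python
def looks_like_fitted_attribute(name: str) -> bool:
    """Heuristic for the 'single trailing underscore' naming convention."""
    return (
        name.endswith("_")
        and not name.endswith("__")   # excludes dunder names such as __class__
        and not name.startswith("_")  # assumption: skip private names such as _cache_
    )


# A mix of learned attributes, a constructor parameter, and special names.
candidates = ["coef_", "n_samples_fit_", "n_clusters", "__class__", "_cache_"]
flagged = [name for name in candidates if looks_like_fitted_attribute(name)]
```

Only `coef_` and `n_samples_fit_` pass the check; `n_clusters` is a constructor parameter and has no trailing underscore.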

== How to fix it

To fix the issue, call the `fit` or `partial_fit` method before retrieving results.

=== Code examples

==== Noncompliant code example

[source,python,diff-id=1,diff-type=noncompliant]
----
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
X = iris.data

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.predict(X) # Noncompliant: raises a NotFittedError
----

==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
X = iris.data

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
kmeans.predict(X) # Compliant
----

== Resources
=== Documentation

* Scikit-learn Documentation - https://scikit-learn.org/stable/glossary.html#term-fit[Glossary fit reference]
* Scikit-learn Documentation - https://scikit-learn.org/stable/modules/generated/sklearn.exceptions.NotFittedError.html#sklearn.exceptions.NotFittedError[NotFittedError reference]

ifdef::env-github,rspecator-view[]

Implementation details:

* `predict`
* `predict_proba`
* `predict_log_proba`
* `score`
* `score_samples`
* `decision_function`
* `transform`
* `inverse_transform`

If any of the methods listed above is called, we should check that `fit` or `partial_fit` was called on the same object beforehand.
Likewise, if an attribute of an estimator is accessed and its name ends with an underscore, we should check for a prior `fit` or `partial_fit` call.

An estimator can be detected if the object inherits from `BaseEstimator`.

Issue location: the name of the method or attribute (from the list above)

Message: Call the fit method on this estimator before retrieving the results.

Quickfix: Not applicable (could be too tricky, as the parameters of `fit` and `predict` could be different)
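
The check described above could be sketched roughly as follows with Python's standard `ast` module. This is an illustration only, not the analyzer's implementation. The function `find_unfitted_uses` and its `estimator_names` parameter are hypothetical: the caller supplies the variable names known to hold estimators instead of resolving `BaseEstimator` inheritance, and textual order is used as an approximation of execution order.

```python
import ast

RESULT_METHODS = {
    "predict", "predict_proba", "predict_log_proba", "score",
    "score_samples", "decision_function", "transform", "inverse_transform",
}
FIT_METHODS = {"fit", "partial_fit"}
MESSAGE = "Call the fit method on this estimator before retrieving the results."


def find_unfitted_uses(source, estimator_names):
    """Return (line, member) pairs for result accesses that precede any fit.

    Assumption: `estimator_names` lists the variables holding estimators.
    Limitation: helpers such as `fit_transform` are not treated as fitting.
    """
    # Keep only `name.member` accesses on the known estimator variables.
    accesses = [
        node for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id in estimator_names
    ]
    # Process accesses in textual order, keyed on where the member name ends.
    accesses.sort(key=lambda n: (n.end_lineno, n.end_col_offset))

    fitted, issues = set(), []
    for node in accesses:
        name, member = node.value.id, node.attr
        if member in FIT_METHODS:
            fitted.add(name)
            continue
        learned = member.endswith("_") and not member.endswith("__")
        if name not in fitted and (member in RESULT_METHODS or learned):
            issues.append((node.end_lineno, member))
    return issues


example = (
    "kmeans = KMeans(n_clusters=3)\n"
    "kmeans.predict(X)\n"
    "kmeans.fit(X)\n"
    "kmeans.predict(X)\n"
)
issues = find_unfitted_uses(example, {"kmeans"})
```

Here only the `predict` call on line 2 of `example` is reported, since the call on line 4 follows `fit`. A real implementation would also need to handle branches, reassignment, and estimators passed between functions.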

endif::env-github,rspecator-view[]