9 Commits

Author SHA1 Message Date
David Kunzmann
5df20e6b86 Improved implementation details 2024-04-16 09:52:40 +02:00
David Kunzmann
b60539a611 Fix quickfix metadata 2024-04-16 09:41:34 +02:00
David Kunzmann
138fac482c Fix after review 2024-04-15 15:46:20 +02:00
David Kunzmann
fb901b8bbd Added implementation details in the comment section 2024-04-15 15:18:53 +02:00
David Kunzmann
1a774b466f Fix description and list indent 2024-04-15 13:58:36 +02:00
David Kunzmann
1da08c1787 Fix documentation formatting 2024-04-15 13:17:05 +02:00
David Kunzmann
a55d815cbf Fixed asciidoc error 2024-04-15 11:40:49 +02:00
David Kunzmann
249eea5fd5 Create rule S6970: The Scikit-learn "fit" method should be called before methods yielding results 2024-04-15 11:12:20 +02:00
joke1196
ce572252ce Create rule S6970 2024-04-15 11:12:20 +02:00
3 changed files with 137 additions and 0 deletions

@ -0,0 +1,2 @@
{
}

@ -0,0 +1,23 @@
{
"title": "The Scikit-learn \"fit\" method should be called before methods yielding results",
"type": "CODE_SMELL",
"status": "ready",
"remediation": {
"func": "Constant\/Issue",
"constantCost": "5min"
},
"tags": [
],
"defaultSeverity": "Major",
"ruleSpecification": "RSPEC-6970",
"sqKey": "S6970",
"scope": "All",
"defaultQualityProfiles": ["Sonar way"],
"quickfix": "infeasible",
"code": {
"impacts": {
"RELIABILITY": "HIGH"
},
"attribute": "CONVENTIONAL"
}
}

@ -0,0 +1,112 @@
This rule raises an issue when a method yielding results is called on a Scikit-learn estimator before its `fit` or `partial_fit` method.
== Why is this an issue?
When using the Scikit-learn library, it is crucial to train the estimator or transformer before
attempting to get results. Failing to do so can lead to incorrect results or runtime errors.
Training is done with the `fit` or `partial_fit` methods, and results can then be retrieved with, for example, the `predict` method.
If the `predict` method is called without a prior call to the `fit` method, a `NotFittedError` exception is raised.
In this case the error is unambiguous, but in other cases the error raised can be less explicit.
[source,python]
----
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
knn = KNeighborsClassifier(1)
knn.n_samples_fit_ # raises an AttributeError
----
In the example above, the model is never trained on the iris dataset with the
`fit` method. The resulting error is more cryptic: it reports that ``++n_samples_fit_++`` is not an
attribute of `KNeighborsClassifier`, because this attribute is only set once the `fit` method
is called.
This rule will raise an issue when the following methods are called without a prior call to `fit` or `partial_fit`:
* `predict`
* `predict_proba`
* `predict_log_proba`
* `score`
* `score_samples`
* `decision_function`
* `transform`
* `inverse_transform`
The rule also raises an issue when an attribute of an estimator whose name ends with a single underscore is accessed, since such attributes are only set during `fit` or `partial_fit`.
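
The trailing-underscore convention can be illustrated without Scikit-learn itself. The following is a minimal sketch, assuming a hypothetical `MeanCenterer` class and a hypothetical `is_fitted` helper, neither of which is part of the library: learned state such as `mean_` only comes into existence inside `fit`, so reading it earlier fails with a plain `AttributeError`.

[source,python]
----
class MeanCenterer:
    """Hypothetical toy transformer, not part of Scikit-learn."""

    def fit(self, values):
        # Learned state is stored in an attribute ending with an underscore.
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values):
        # Raises AttributeError if fit was never called,
        # because mean_ does not exist yet.
        return [v - self.mean_ for v in values]


def is_fitted(estimator):
    # Simplified check (an assumption of this sketch): the instance has at
    # least one public attribute whose name ends with an underscore.
    return any(
        name.endswith("_") and not name.startswith("_")
        for name in vars(estimator)
    )


centerer = MeanCenterer()
fitted_before = is_fitted(centerer)  # False: nothing has been learned yet
centerer.fit([1.0, 2.0, 3.0])
fitted_after = is_fitted(centerer)   # True: mean_ now exists
centered = centerer.transform([1.0, 2.0, 3.0])  # [-1.0, 0.0, 1.0]
----
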
== How to fix it
To fix the issue, call the `fit` or `partial_fit` method before retrieving results.
=== Code examples
==== Noncompliant code example
[source,python,diff-id=1,diff-type=noncompliant]
----
from sklearn import datasets
from sklearn.cluster import KMeans
iris = datasets.load_iris()
X = iris.data
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.predict(X) # Noncompliant: raises a NotFittedError
----
==== Compliant solution
[source,python,diff-id=1,diff-type=compliant]
----
from sklearn import datasets
from sklearn.cluster import KMeans
iris = datasets.load_iris()
X = iris.data
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
kmeans.predict(X) # Compliant
----
== Resources
=== Documentation
* Scikit-learn Documentation - https://scikit-learn.org/stable/glossary.html#term-fit[Glossary fit reference]
* Scikit-learn Documentation - https://scikit-learn.org/stable/modules/generated/sklearn.exceptions.NotFittedError.html#sklearn.exceptions.NotFittedError[NotFittedError reference]
ifdef::env-github,rspecator-view[]
Implementation details:
* predict
* predict_proba
* predict_log_proba
* score
* score_samples
* decision_function
* transform
* inverse_transform
If any of the methods listed above is called, we should check that `fit` or `partial_fit` was called on the same object beforehand.
Likewise, if an attribute of an estimator is accessed and its name ends with an underscore, we should check for a prior `fit` or `partial_fit` call.
An estimator can be detected by checking whether the object inherits from `BaseEstimator`.
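
The check described above could be prototyped roughly as follows. This is only a sketch built on Python's standard `ast` module, not the analyzer's actual implementation: the function name `find_unfitted_uses` and the helper `is_fitted_attribute` are hypothetical, only plain variable names are tracked, statements are assumed to run in source order, and the `BaseEstimator` inheritance check is left out entirely.

[source,python]
----
import ast

RESULT_METHODS = {
    "predict", "predict_proba", "predict_log_proba", "score",
    "score_samples", "decision_function", "transform", "inverse_transform",
}
FIT_METHODS = {"fit", "partial_fit"}


def is_fitted_attribute(attr):
    # Public name ending with exactly one underscore, e.g. "cluster_centers_".
    return attr.endswith("_") and not attr.endswith("__") and not attr.startswith("_")


def find_unfitted_uses(source):
    """Return (line, variable, member) for uses that precede any fit call."""
    accesses = []
    for node in ast.walk(ast.parse(source)):
        # Only "name.member" is handled; chained calls are ignored.
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            accesses.append((node.lineno, node.col_offset, node.value.id, node.attr))

    fitted = set()
    issues = []
    for line, _, name, attr in sorted(accesses):  # approximate source order
        if attr in FIT_METHODS:
            fitted.add(name)
        elif name not in fitted and (attr in RESULT_METHODS or is_fitted_attribute(attr)):
            issues.append((line, name, attr))
    return issues


SAMPLE = """\
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.predict(X)
kmeans.fit(X)
kmeans.predict(X)
kmeans.cluster_centers_
"""
issues = find_unfitted_uses(SAMPLE)  # [(2, "kmeans", "predict")]
----

Because the sketch never verifies that a variable is an estimator, it would also flag unrelated objects that happen to expose a `transform` method or a trailing-underscore attribute, which is where the `BaseEstimator` check comes in.
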
Issue location: the name of the method or attribute (from the list above)
Message: Call the fit method on this estimator before retrieving the results.
Quickfix: Not applicable (likely too tricky, since the parameters of `fit` and `predict` can differ)
endif::env-github,rspecator-view[]