Improved implementation details

Fix quickfix metadata
Fix after review
2024-04-16 09:52:40 +02:00 · 2024-04-16 09:41:34 +02:00 · 2024-04-15 15:46:20 +02:00 · 2024-04-15 15:18:53 +02:00 · 2024-04-15 13:58:36 +02:00 · 2024-04-15 13:17:05 +02:00
3 changed files with 137 additions and 0 deletions
--- a/rules/S6970/metadata.json
+++ b/rules/S6970/metadata.json
@@ -0,0 +1,2 @@
+{
+}
--- a/rules/S6970/python/metadata.json
+++ b/rules/S6970/python/metadata.json
@@ -0,0 +1,23 @@
+{
+  "title": "The Scikit-learn \"fit\" method should be called before methods yielding results",
+  "type": "CODE_SMELL",
+  "status": "ready",
+  "remediation": {
+    "func": "Constant\/Issue",
+    "constantCost": "5min"
+  },
+  "tags": [
+  ],
+  "defaultSeverity": "Major",
+  "ruleSpecification": "RSPEC-6970",
+  "sqKey": "S6970",
+  "scope": "All",
+  "defaultQualityProfiles": ["Sonar way"],
+  "quickfix": "infeasible",
+  "code": {
+    "impacts": {
+      "RELIABILITY": "HIGH"
+    },
+    "attribute": "CONVENTIONAL"
+  }
+}
--- a/rules/S6970/python/rule.adoc
+++ b/rules/S6970/python/rule.adoc
@@ -0,0 +1,112 @@
+This rule raises an issue when a method yielding results is called on a Scikit-learn estimator before its `fit` or `partial_fit` method has been called.
+
+== Why is this an issue?
+
+When using the Scikit-learn library, it is crucial to train the estimator or transformer before
+attempting to get results. Failing to do so can lead to incorrect results or runtime errors.
+Training is done with the `fit` or `partial_fit` methods, while results are retrieved with methods such as `predict`.
+
+If the `predict` method is called without a prior call to the `fit` method, a `NotFittedError` exception is raised.
+In this case the error is unambiguous, but in other cases the error raised can be less explicit.
+
+[source,python]
+----
+from sklearn.datasets import load_iris
+from sklearn.neighbors import KNeighborsClassifier
+
+iris = load_iris()
+knn = KNeighborsClassifier(n_neighbors=1)
+knn.n_samples_fit_ # raises an AttributeError
+----
+
+In the example above, the model was never trained on the iris dataset with the `fit` method,
+so accessing ``++n_samples_fit_++`` fails with a more cryptic error stating that it is not an
+attribute of `KNeighborsClassifier`. This happens because the attribute is only set once the `fit` method
+has been called.
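
For comparison, here is a minimal sketch of the same snippet with the missing training step added. It assumes the features and labels are loaded as `X` and `y` via `load_iris(return_X_y=True)`, which the original example does not do. Once `fit` has run, `n_samples_fit_` is set.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Features and labels of the iris dataset (150 samples).
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)  # sets the fitted attributes, including n_samples_fit_
print(knn.n_samples_fit_)  # 150: the number of samples seen during fit
```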
+
+This rule will raise an issue when the following methods are called without a prior call to `fit` or `partial_fit`:
+
+* `predict`
+* `predict_proba`
+* `predict_log_proba`
+* `score`
+* `score_samples`
+* `decision_function`
+* `transform`
+* `inverse_transform`
+
+as well as when any attribute of an estimator whose name ends with a single underscore is accessed (by convention, such attributes are set during `fit` or `partial_fit`).
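
You can observe this convention directly. The sketch below assumes that inspecting an estimator's instance `vars()` is representative. It uses a hypothetical helper, `fitted_attributes`, to list the attributes of a `KMeans` instance that end with a single underscore, both before and after `fit`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data


def fitted_attributes(estimator):
    # Hypothetical helper: instance attributes ending with one underscore,
    # excluding dunder names such as __class__.
    return sorted(
        name for name in vars(estimator)
        if name.endswith("_") and not name.endswith("__")
    )


kmeans = KMeans(n_clusters=3, random_state=42)
print(fitted_attributes(kmeans))  # no fitted attributes yet
kmeans.fit(X)
print(fitted_attributes(kmeans))  # now includes cluster_centers_ and labels_
```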
+
+== How to fix it
+
+To fix the issue, call the `fit` or `partial_fit` method before retrieving results.
+
+=== Code examples
+
+==== Noncompliant code example
+
+[source,python,diff-id=1,diff-type=noncompliant]
+----
+from sklearn import datasets
+from sklearn.cluster import KMeans
+
+iris = datasets.load_iris()
+X = iris.data
+
+kmeans = KMeans(n_clusters=3, random_state=42)
+kmeans.predict(X) # Noncompliant: raises a NotFittedError
+----
+
+==== Compliant solution
+
+[source,python,diff-id=1,diff-type=compliant]
+----
+from sklearn import datasets
+from sklearn.cluster import KMeans
+
+iris = datasets.load_iris()
+X = iris.data
+
+kmeans = KMeans(n_clusters=3, random_state=42)
+kmeans.fit(X)
+kmeans.predict(X) # Compliant
+----
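
The compliant solution above trains with `fit`. Estimators that support incremental learning can instead be trained with `partial_fit`, which also satisfies this rule. The sketch below uses `SGDClassifier` and an arbitrary split into three mini-batches; neither comes from the original rule. Classifiers need the full list of `classes` on the first `partial_fit` call.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)
# Shuffle so that every mini-batch mixes classes (iris is sorted by class).
order = np.random.default_rng(0).permutation(len(X))
X, y = X[order], y[order]

clf = SGDClassifier(random_state=42)
classes = np.unique(y)  # required on the first partial_fit call
for X_batch, y_batch in zip(np.array_split(X, 3), np.array_split(y, 3)):
    clf.partial_fit(X_batch, y_batch, classes=classes)

predictions = clf.predict(X)  # Compliant: the estimator was trained with partial_fit
```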
+
+== Resources
+=== Documentation
+
+* Scikit-learn Documentation - https://scikit-learn.org/stable/glossary.html#term-fit[Glossary fit reference]
+* Scikit-learn Documentation - https://scikit-learn.org/stable/modules/generated/sklearn.exceptions.NotFittedError.html#sklearn.exceptions.NotFittedError[NotFittedError reference]
+
+ifdef::env-github,rspecator-view[]
+
+Implementation details:
+
+* `predict`
+* `predict_proba`
+* `predict_log_proba`
+* `score`
+* `score_samples`
+* `decision_function`
+* `transform`
+* `inverse_transform`
+
+If any of the methods above is called, we should check that the `fit` or `partial_fit` method was previously called on the same object.
+Likewise, if an attribute of an estimator is accessed and its name ends with a single underscore, we should check for a prior call to `fit` or `partial_fit`.
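
The check described above could be sketched with Python's standard `ast` module. This is an illustration only, not the analyzer's actual implementation. It assumes straight-line code ordered by source position. The hypothetical `KNOWN_ESTIMATORS` set stands in for the `BaseEstimator` inheritance check.

```python
import ast

# Methods listed in the implementation details.
RESULT_METHODS = {
    "predict", "predict_proba", "predict_log_proba", "score",
    "score_samples", "decision_function", "transform", "inverse_transform",
}
FIT_METHODS = {"fit", "partial_fit"}

# Hypothetical stand-in for "inherits from BaseEstimator"; a real checker
# would resolve this through type inference instead of a fixed name list.
KNOWN_ESTIMATORS = {"KMeans", "KNeighborsClassifier", "StandardScaler"}


def _is_fitted_attribute(name: str) -> bool:
    # A single trailing underscore marks an attribute set during fit;
    # dunder names such as __class__ are excluded.
    return name.endswith("_") and not name.endswith("__")


def find_unfitted_usages(source: str) -> list[tuple[int, str]]:
    """Return (line, member) pairs where a result is requested before fit.

    Simplification: events are ordered by source position only; branches,
    loops and function boundaries are ignored.
    """
    events = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id in KNOWN_ESTIMATORS
        ):
            # `name = Estimator(...)` creates a new, unfitted estimator.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    events.append((node.lineno, node.col_offset, "new", target.id, ""))
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            # `name.member`, covering both method calls and attribute reads.
            events.append((node.lineno, node.col_offset, "use", node.value.id, node.attr))

    fitted: dict[str, bool] = {}
    issues = []
    for line, _col, kind, name, member in sorted(events):
        if kind == "new":
            fitted[name] = False
        elif name not in fitted:
            continue  # not a tracked estimator
        elif member in FIT_METHODS:
            fitted[name] = True
        elif not fitted[name] and (member in RESULT_METHODS or _is_fitted_attribute(member)):
            issues.append((line, member))
    return issues


if __name__ == "__main__":
    example = (
        "kmeans = KMeans(n_clusters=3)\n"
        "kmeans.predict(X)\n"
        "kmeans.fit(X)\n"
        "kmeans.predict(X)\n"
    )
    print(find_unfitted_usages(example))  # [(2, 'predict')]
```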
+
+An estimator can be detected if the object inherits from `BaseEstimator`.
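
At runtime, this inheritance criterion corresponds to `issubclass(cls, BaseEstimator)`; a static analyzer would rely on type inference instead. The sketch below includes a hypothetical user-defined `MeanRegressor`. It shows that custom estimators built on `BaseEstimator` would be covered alongside built-in ones such as `KMeans` and `StandardScaler`.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


class MeanRegressor(BaseEstimator):
    """Hypothetical custom estimator that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))  # fitted attribute: trailing underscore
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Prints True for the three estimators and False for dict.
for cls in (KMeans, StandardScaler, MeanRegressor, dict):
    print(cls.__name__, issubclass(cls, BaseEstimator))
```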
+
+Issue location: the name of the method or attribute (from the list above) 
+
+Message: Call the fit method on this estimator before retrieving the results. 
+
+Quickfix: Not applicable (it would be too tricky, as the arguments passed to `fit` and `predict` can differ)
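
The sketch below shows why these arguments can differ, assuming a typical train/test split (`train_test_split` and the variable names are not part of the original rule). `fit` receives training features and labels, while `predict` receives only held-out features. A quick fix therefore could not infer what to pass to `fit` from the `predict` call.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
# fit needs the training features and their labels...
knn.fit(X_train, y_train)
# ...while predict only receives unseen features, so the arguments of a
# missing fit call cannot be derived from the predict call.
y_pred = knn.predict(X_test)
```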
+
+endif::env-github,rspecator-view[]
Author	SHA1	Message	Date
David Kunzmann	5df20e6b86	Improved implementation details	2024-04-16 09:52:40 +02:00
David Kunzmann	b60539a611	Fix quickfix metadata	2024-04-16 09:41:34 +02:00
David Kunzmann	138fac482c	Fix after review	2024-04-15 15:46:20 +02:00
David Kunzmann	fb901b8bbd	Added implementation details in the comment section	2024-04-15 15:18:53 +02:00
David Kunzmann	1a774b466f	Fix description and list indent	2024-04-15 13:58:36 +02:00
David Kunzmann	1da08c1787	Fix documentation formatting	2024-04-15 13:17:05 +02:00
David Kunzmann	a55d815cbf	Fixed asciidoc error	2024-04-15 11:40:49 +02:00
David Kunzmann	249eea5fd5	Create rule S6970: The Scikit-learn \"fit\" method should be called before methods yielding results	2024-04-15 11:12:20 +02:00
joke1196	ce572252ce	Create rule S6970	2024-04-15 11:12:20 +02:00