Improved implementation details

Fix quickfix metadata
Fix after review
2024-04-16 09:52:40 +02:00 · 2024-04-16 09:41:34 +02:00 · 2024-04-15 15:46:20 +02:00 · 2024-04-15 15:18:53 +02:00 · 2024-04-15 13:58:36 +02:00 · 2024-04-15 13:17:05 +02:00
3 changed files with 137 additions and 0 deletions
--- a/rules/S6970/metadata.json
+++ b/rules/S6970/metadata.json
@@ -0,0 +1,2 @@
 {
 }
--- a/rules/S6970/python/metadata.json
+++ b/rules/S6970/python/metadata.json
@@ -0,0 +1,23 @@
 {
  "title": "The Scikit-learn \"fit\" method should be called before methods yielding results",
  "type": "CODE_SMELL",
  "status": "ready",
  "remediation": {
    "func": "Constant\/Issue",
    "constantCost": "5min"
  },
  "tags": [
  ],
  "defaultSeverity": "Major",
  "ruleSpecification": "RSPEC-6970",
  "sqKey": "S6970",
  "scope": "All",
  "defaultQualityProfiles": ["Sonar way"],
  "quickfix": "infeasible",
  "code": {
    "impacts": {
      "RELIABILITY": "HIGH"
    },
    "attribute": "CONVENTIONAL"
  }
 }
--- a/rules/S6970/python/rule.adoc
+++ b/rules/S6970/python/rule.adoc
@@ -0,0 +1,112 @@
 This rule raises an issue if the Scikit-learn `fit` or `partial_fit` methods are not called prior to a method yielding results.
 == Why is this an issue?
 When using the Scikit-learn library, it is crucial to train the estimator or transformer before
 attempting to get results. Failing to do so can lead to incorrect results or runtime errors.
 Training is done with the `fit` or `partial_fit` methods, and results are retrieved with methods such as `predict`.
 If the `predict` method is called without a prior call to the `fit` method, a `NotFittedError` exception is raised.
 In this case the error is unambiguous, but in other cases the error raised can be less explicit.
 [source,python]
 ----
 from sklearn.datasets import load_iris
 from sklearn.neighbors import KNeighborsClassifier
 iris = load_iris()
 knn = KNeighborsClassifier(n_neighbors=1)
 knn.n_samples_fit_ # raises an AttributeError
 ----
 In the example above, failing to train the model on the iris dataset with the
 `fit` method results in a more cryptic error: an `AttributeError` stating that ``++n_samples_fit_++`` is not an
 attribute of `KNeighborsClassifier`. This is because the attribute is only set once the `fit` method
 has been called.
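 The two failure modes described above can be told apart by exception type. The following is a minimal sketch, assuming a Scikit-learn version in which `NotFittedError` is importable from `sklearn.exceptions`; the variable names and the sample feature values are illustrative only.

```python
from sklearn.exceptions import NotFittedError
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

# Calling a result method on an unfitted estimator raises the explicit error.
try:
    knn.predict([[5.1, 3.5, 1.4, 0.2]])
except NotFittedError as error:
    predict_error = error

# Reading a fitted attribute too early raises a plain AttributeError instead.
try:
    knn.n_samples_fit_
except AttributeError as error:
    attribute_error = error

print(type(predict_error).__name__)
print(type(attribute_error).__name__)
```

 Since `NotFittedError` itself derives from `AttributeError`, a handler written as `except AttributeError` catches both cases, which is one more reason to detect the missing `fit` call statically.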
 This rule will raise an issue when the following methods are called without a prior call to `fit` or `partial_fit`:
 * `predict`
 * `predict_proba`
 * `predict_log_proba`
 * `score`
 * `score_samples`
 * `decision_function`
 * `transform`
 * `inverse_transform`
 This rule also raises an issue when any attribute of an estimator ending with a single underscore is accessed without such a call (the trailing underscore denotes that the attribute is set during `fit` or `partial_fit`).
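 The trailing-underscore convention can be observed directly on an estimator instance. The sketch below is illustrative: `fitted_attributes` is a hypothetical helper, not a Scikit-learn function, and it assumes that fitted attributes are stored as ordinary instance attributes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris


def fitted_attributes(estimator):
    """Hypothetical helper: list public instance attributes ending with '_'."""
    return sorted(
        name
        for name in vars(estimator)
        if name.endswith("_") and not name.startswith("_")
    )


kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
before_fit = fitted_attributes(kmeans)  # only constructor parameters exist so far

kmeans.fit(load_iris().data)
after_fit = fitted_attributes(kmeans)  # attributes created by fit

print(before_fit)
print(after_fit)
```

 Before `fit`, the instance only holds its constructor parameters, so no attribute name ends with an underscore. After `fit`, names such as `cluster_centers_` and `labels_` appear.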
 == How to fix it
 To fix the issue, call the `fit` or `partial_fit` method before retrieving results.
 === Code examples
 ==== Noncompliant code example
 [source,python,diff-id=1,diff-type=noncompliant]
 ----
 from sklearn import datasets
 from sklearn.cluster import KMeans
 iris = datasets.load_iris()
 X = iris.data
 kmeans = KMeans(n_clusters=3, random_state=42)
 kmeans.predict(X) # Noncompliant: raises a NotFittedError
 ----
 ==== Compliant solution
 [source,python,diff-id=1,diff-type=compliant]
 ----
 from sklearn import datasets
 from sklearn.cluster import KMeans
 iris = datasets.load_iris()
 X = iris.data
 kmeans = KMeans(n_clusters=3, random_state=42)
 kmeans.fit(X)
 kmeans.predict(X) # Compliant
 ----
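 The rule equally accepts `partial_fit` as the training call. As a minimal sketch, assuming incremental training with `MiniBatchKMeans` (an estimator chosen here for illustration because it exposes `partial_fit`), the batch count and the shuffling seed are arbitrary:

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import MiniBatchKMeans

iris = datasets.load_iris()
# Shuffle the rows so that each batch mixes the three species.
X = np.random.default_rng(42).permutation(iris.data)

kmeans = MiniBatchKMeans(n_clusters=3, random_state=42)
for batch in np.array_split(X, 5):
    kmeans.partial_fit(batch)  # incremental training, one batch at a time

labels = kmeans.predict(X)  # Compliant: partial_fit was called beforehand
```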
 == Resources
 === Documentation
 * Scikit-learn Documentation - https://scikit-learn.org/stable/glossary.html#term-fit[Glossary fit reference]
 * Scikit-learn Documentation - https://scikit-learn.org/stable/modules/generated/sklearn.exceptions.NotFittedError.html#sklearn.exceptions.NotFittedError[NotFittedError reference]
 ifdef::env-github,rspecator-view[]
 Implementation details: 
 * predict
 * predict_proba
 * predict_log_proba
 * score
 * score_samples
 * decision_function
 * transform
 * inverse_transform
 If any of the methods above is called, we should check that the `fit` or `partial_fit` method has been called on the same object beforehand.
 Likewise, if an attribute of an estimator is accessed and its name ends with an underscore, we should check for a prior call to `fit` or `partial_fit`.
 An object can be identified as an estimator if its class inherits from `BaseEstimator`.
 Issue location: the name of the method or attribute (from the list above) 
 Message: Call the fit method on this estimator before retrieving the results. 
 Quickfix: Not applicable (too tricky to generate, as the parameters of `fit` and `predict` can differ)
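 The detection logic described above could be prototyped with Python's standard `ast` module. This is a rough sketch under strong assumptions, not the analyzer's actual implementation: `UnfittedUseFinder` and `find_unfitted_uses` are hypothetical names, estimator classes are passed in by name instead of being resolved through `BaseEstimator` inheritance, and only straight-line code is handled (no branches, loops, or function boundaries).

```python
import ast

FIT_METHODS = {"fit", "partial_fit"}
RESULT_METHODS = {
    "predict", "predict_proba", "predict_log_proba", "score",
    "score_samples", "decision_function", "transform", "inverse_transform",
}


class UnfittedUseFinder(ast.NodeVisitor):
    """Hypothetical checker: flags results requested before fit/partial_fit."""

    def __init__(self, estimator_classes):
        # Assumption: the caller lists the estimator class names up front.
        self.estimator_classes = set(estimator_classes)
        self.unfitted = set()  # variables bound to a not-yet-fitted estimator
        self.issues = []  # (line number, member name) pairs

    def visit_Assign(self, node):
        self.generic_visit(node)  # visit child nodes before updating state
        value = node.value
        creates_estimator = (
            isinstance(value, ast.Call)
            and isinstance(value.func, ast.Name)
            and value.func.id in self.estimator_classes
        )
        for target in node.targets:
            if isinstance(target, ast.Name):
                if creates_estimator:
                    self.unfitted.add(target.id)
                else:
                    self.unfitted.discard(target.id)  # rebound to something else

    def visit_Attribute(self, node):
        self.generic_visit(node)
        owner = node.value
        if not (isinstance(owner, ast.Name) and owner.id in self.unfitted):
            return
        if not isinstance(node.ctx, ast.Load):
            return  # only reads are of interest
        name = node.attr
        if name in FIT_METHODS:
            # Simplification: any reference to fit/partial_fit counts as a call.
            self.unfitted.discard(owner.id)
        elif name in RESULT_METHODS or (
            name.endswith("_") and not name.endswith("__")
        ):
            self.issues.append((node.lineno, name))


def find_unfitted_uses(source, estimator_classes):
    """Return (line, member) pairs for members used before fit/partial_fit."""
    finder = UnfittedUseFinder(estimator_classes)
    finder.visit(ast.parse(source))
    return finder.issues


SAMPLE = """\
kmeans = KMeans(n_clusters=3)
kmeans.predict(X)
kmeans.fit(X)
kmeans.predict(X)
"""
issues = find_unfitted_uses(SAMPLE, {"KMeans"})
print(issues)  # [(2, 'predict')]
```

 The reported position is the line of the offending method or attribute name, in line with the issue location given above. The second `predict` call is not reported because `fit` precedes it.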
 endif::env-github,rspecator-view[]
Author	SHA1	Message	Date
David Kunzmann	5df20e6b86	Improved implementation details	2024-04-16 09:52:40 +02:00
David Kunzmann	b60539a611	Fix quickfix metadata	2024-04-16 09:41:34 +02:00
David Kunzmann	138fac482c	Fix after review	2024-04-15 15:46:20 +02:00
David Kunzmann	fb901b8bbd	Added implementation details in the comment section	2024-04-15 15:18:53 +02:00
David Kunzmann	1a774b466f	Fix description and list indent	2024-04-15 13:58:36 +02:00
David Kunzmann	1da08c1787	Fix documentation formatting	2024-04-15 13:17:05 +02:00
David Kunzmann	a55d815cbf	Fixed asciidoc error	2024-04-15 11:40:49 +02:00
David Kunzmann	249eea5fd5	Create rule S6970: The Scikit-learn \"fit\" method should be called before methods yielding results	2024-04-15 11:12:20 +02:00
joke1196	ce572252ce	Create rule S6970	2024-04-15 11:12:20 +02:00