Modify rule S6709: Add how to fix it for Scikit-learn (#3883)

2024-05-07 14:21:34 +02:00
commit 86d6b7c75b
parent 621b7ce90e
4 changed files with 64 additions and 23 deletions
--- a/docs/header_names/allowed_framework_names.adoc
+++ b/docs/header_names/allowed_framework_names.adoc
@@ -91,6 +91,7 @@
 * Jinja
 * lxml
 * MySQL Connector/Python
+* Numpy
 * Paramiko
 * pyca
 * PyCrypto
@@ -105,6 +106,7 @@
 * PyYAML
 * Requests
 * Scrypt
+* Scikit-Learn
 * SignXML
 * SQLAlchemy
 * ssl
--- a/rules/S6709/python/how-to-fix-it/numpy.adoc
+++ b/rules/S6709/python/how-to-fix-it/numpy.adoc
@@ -0,0 +1,27 @@
+== How to fix it in Numpy
+To fix this issue, provide a predictable seed to the random number generator.
+=== Code examples
+==== Noncompliant code example
+[source,python,diff-id=1,diff-type=noncompliant]
+----
+import numpy as np
+def foo():
+    generator = np.random.default_rng()  # Noncompliant: no seed parameter is provided
+    x = generator.uniform()
+----
+==== Compliant solution
+[source,python,diff-id=1,diff-type=compliant]
+----
+import numpy as np
+def foo():
+    generator = np.random.default_rng(42)  # Compliant
+    x = generator.uniform()
+----
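
Reviewer sketch (not part of the commit): the snippet below illustrates what the compliant NumPy example is meant to guarantee. Two generators built from the same seed produce the same draws, whereas an unseeded generator pulls fresh entropy from the operating system on every run. The helper name `draw` and the seed value 42 are illustrative assumptions.

```python
import numpy as np


def draw(seed=None, n=3):
    """Return n uniform samples from a freshly created generator.

    `draw` is a hypothetical helper used only for this illustration.
    """
    generator = np.random.default_rng(seed)
    return generator.uniform(size=n)


seeded_a = draw(42)
seeded_b = draw(42)

# Same seed, same sequence: the two arrays are element-wise identical.
same_sequence = np.array_equal(seeded_a, seeded_b)

# No seed: the sequence is not reproducible from one call to the next.
unseeded = draw()
```

Passing the seed in as a parameter, rather than hard-coding it inside the function, keeps the reproducible path available for tests while leaving callers free to choose another seed.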
--- a/rules/S6709/python/how-to-fix-it/sklearn.adoc
+++ b/rules/S6709/python/how-to-fix-it/sklearn.adoc
@@ -0,0 +1,29 @@
+== How to fix it in Scikit-Learn
+To fix this issue, provide a predictable seed to the estimator or the utility function.
+=== Code examples
+==== Noncompliant code example
+[source,python,diff-id=2,diff-type=noncompliant]
+----
+from sklearn.model_selection import train_test_split
+from sklearn.datasets import load_iris
+X, y = load_iris(return_X_y=True)
+X_train, _, y_train, _ = train_test_split(X, y)  # Noncompliant: no seed parameter is provided
+----
+==== Compliant solution
+[source,python,diff-id=2,diff-type=compliant]
+----
+from sklearn.model_selection import train_test_split
+from sklearn.datasets import load_iris
+import numpy as np
+rng = np.random.default_rng(42)
+X, y = load_iris(return_X_y=True)
+X_train, _, y_train, _ = train_test_split(X, y, random_state=rng.integers(1))  # Compliant
+----
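
Reviewer sketch (not part of the commit): the snippet below shows the effect the compliant Scikit-Learn example relies on, namely that a fixed `random_state` makes `train_test_split` return the same partition on every call. It also checks one detail of the compliant example: `rng.integers(1)` samples from the half-open range [0, 1), so it always evaluates to 0. Passing a plain integer, as done here, is an assumption about the intent and not the form the rule prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Fixed integer seed: repeated calls yield the same split.
X_train_a, _, y_train_a, _ = train_test_split(X, y, random_state=42)
X_train_b, _, y_train_b, _ = train_test_split(X, y, random_state=42)
same_split = np.array_equal(X_train_a, X_train_b) and np.array_equal(
    y_train_a, y_train_b
)

# rng.integers(1) draws from [0, 1), so the derived seed is always 0,
# whatever seed the generator itself was created with.
rng = np.random.default_rng(42)
derived_seed = rng.integers(1)
```

If the goal is to derive several distinct seeds from one generator, a wider upper bound (for example `rng.integers(2**32)`) would be needed; with an upper bound of 1 every derived seed is 0.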
--- a/rules/S6709/python/rule.adoc
+++ b/rules/S6709/python/rule.adoc
@@ -26,38 +26,20 @@ Note that a global seed for `RandomState` can be set using `numpy.random.seed` o
 In contexts that are not related to data science and machine learning, having a predictable seed may not be the desired behavior. Therefore, this rule will only raise issues if machine learning and data science libraries are being used.
 == How to fix it
-To fix this issue, provide a predictable seed to the random number generator.
+// How to fix it section
-=== Code examples
+include::how-to-fix-it/numpy.adoc[]
-==== Noncompliant code example
+include::how-to-fix-it/sklearn.adoc[]
-[source,python,diff-id=1,diff-type=noncompliant]
-----
-import numpy as np
-def foo():
-    generator = np.random.default_rng()  # Noncompliant: no seed parameter is provided
-    x = generator.uniform()
-----
-==== Compliant solution
-[source,python,diff-id=1,diff-type=compliant]
-----
-import numpy as np
-def foo():
-    generator = np.random.default_rng(42)  # Compliant
-    x = generator.uniform()
-----
 == Resources
 === Documentation
 * NumPy documentation - https://numpy.org/neps/nep-0019-rng-policy.html[NEP 19 RNG Policy]
 * Scikit-learn documentation - https://scikit-learn.org/stable/glossary.html#term-random_state[Glossary random_state]
 * Scikit-learn documentation - https://scikit-learn.org/stable/common_pitfalls.html#controlling-randomness[Controlling randomness]
 === Standards
@@ -66,3 +48,4 @@ def foo():
 === Related rules
 * S6711 - `numpy.random.Generator` should be preferred to `numpy.random.RandomState`