Modify rule S6709: Add how to fix it for Scikit-learn (#3883)

2024-05-07 14:21:34 +02:00
commit 86d6b7c75b
parent 621b7ce90e
4 changed files with 64 additions and 23 deletions
--- a/docs/header_names/allowed_framework_names.adoc
+++ b/docs/header_names/allowed_framework_names.adoc
@@ -91,6 +91,7 @@
 * Jinja
 * lxml
 * MySQL Connector/Python
+* Numpy
 * Paramiko
 * pyca
 * PyCrypto
@@ -105,6 +106,7 @@
 * PyYAML
 * Requests
 * Scrypt
+* Scikit-Learn
 * SignXML
 * SQLAlchemy
 * ssl
--- a/rules/S6709/python/how-to-fix-it/numpy.adoc
+++ b/rules/S6709/python/how-to-fix-it/numpy.adoc
@@ -0,0 +1,27 @@
+== How to fix it in Numpy
+
+To fix this issue, provide a predictable seed to the random number generator.
+
+=== Code examples
+
+==== Noncompliant code example
+
+[source,python,diff-id=1,diff-type=noncompliant]
+----
+import numpy as np
+
+def foo():
+    generator = np.random.default_rng()  # Noncompliant: no seed parameter is provided
+    x = generator.uniform()
+----
+
+==== Compliant solution
+
+[source,python,diff-id=1,diff-type=compliant]
+----
+import numpy as np
+
+def foo():
+    generator = np.random.default_rng(42)  # Compliant
+    x = generator.uniform()
+----
--- a/rules/S6709/python/how-to-fix-it/sklearn.adoc
+++ b/rules/S6709/python/how-to-fix-it/sklearn.adoc
@@ -0,0 +1,29 @@
+== How to fix it in Scikit-Learn
+
+To fix this issue, provide a predictable seed to the estimator or the utility function.
+
+=== Code examples
+
+==== Noncompliant code example
+
+[source,python,diff-id=2,diff-type=noncompliant]
+----
+from sklearn.model_selection import train_test_split
+from sklearn.datasets import load_iris 
+
+X, y = load_iris(return_X_y=True)
+X_train, _, y_train, _ = train_test_split(X, y) # Noncompliant: no seed parameter is provided
+----
+
+==== Compliant solution
+
+[source,python,diff-id=2,diff-type=compliant]
+----
+from sklearn.model_selection import train_test_split
+from sklearn.datasets import load_iris 
+import numpy as np
+
+rng = np.random.default_rng(42)
+X, y = load_iris(return_X_y=True)
+X_train, _, y_train, _ = train_test_split(X, y, random_state=rng.integers(1)) # Compliant
+----
--- a/rules/S6709/python/rule.adoc
+++ b/rules/S6709/python/rule.adoc
@@ -26,38 +26,20 @@ Note that a global seed for `RandomState` can be set using `numpy.random.seed` o

 In contexts that are not related to data science and machine learning, having a predictable seed may not be the desired behavior. Therefore, this rule will only raise issues if machine learning and data science libraries are being used.

-== How to fix it

-To fix this issue, provide a predictable seed to the random number generator.
+// How to fix it section

-=== Code examples
+include::how-to-fix-it/numpy.adoc[]

-==== Noncompliant code example
+include::how-to-fix-it/sklearn.adoc[]

-[source,python,diff-id=1,diff-type=noncompliant]
-----
-import numpy as np
-
-def foo():
-    generator = np.random.default_rng()  # Noncompliant: no seed parameter is provided
-    x = generator.uniform()
-----
-
-==== Compliant solution
-
-[source,python,diff-id=1,diff-type=compliant]
-----
-import numpy as np
-
-def foo():
-    generator = np.random.default_rng(42)  # Compliant
-    x = generator.uniform()
-----

 == Resources
 === Documentation

 * NumPy documentation - https://numpy.org/neps/nep-0019-rng-policy.html[NEP 19 RNG Policy]
+* Scikit-learn documentation - https://scikit-learn.org/stable/glossary.html#term-random_state[Glossary random_state]
+* Scikit-learn documentation - https://scikit-learn.org/stable/common_pitfalls.html#controlling-randomness[Controlling randomness]

 === Standards

@@ -66,3 +48,4 @@ def foo():
 === Related rules

 * S6711 - `numpy.random.Generator` should be preferred to `numpy.random.RandomState`
+
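
Note on the compliant examples in this change: the sketch below is not part of the commit. It uses only NumPy to illustrate two points, under stated assumptions. First, two generators created with the same seed produce the same stream, which is what makes the compliant examples reproducible. Second, `rng.integers(1)`, as used in the Scikit-Learn compliant example, samples from the half-open range `[0, 1)` and can therefore only return `0`; it is still a fixed seed, but a wider bound may be closer to the intent if distinct per-call seeds are wanted. The helper name `derive_seeds` and the bound `2**32` are assumptions chosen for illustration.

```python
import numpy as np

# Two generators built from the same seed yield identical streams.
first = np.random.default_rng(42)
second = np.random.default_rng(42)
same_stream = np.array_equal(first.uniform(size=5), second.uniform(size=5))

# integers(1) draws from [0, 1), so every draw is 0.
rng = np.random.default_rng(42)
narrow_seeds = [int(rng.integers(1)) for _ in range(5)]


def derive_seeds(seed, count):
    """Hypothetical helper: derive `count` reproducible integer seeds.

    The upper bound 2**32 is an assumption; it keeps each value within
    the range commonly accepted for integer seeds.
    """
    generator = np.random.default_rng(seed)
    return [int(generator.integers(2**32)) for _ in range(count)]


if __name__ == "__main__":
    print(same_stream)
    print(narrow_seeds)
    print(derive_seeds(42, 3) == derive_seeds(42, 3))
```

Each value returned by `derive_seeds` could then be passed as `random_state` to a Scikit-Learn estimator or utility function; passing a plain integer such as `random_state=42` is the simpler option when a single seed is enough.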