Fix after review
parent 17a753a84e
commit acb69ec40d
@@ -8,7 +8,7 @@ PySpark is designed to handle large-scale data processing in a distributed manner
For this reason, it is generally advisable to avoid using `toPandas` unless you are certain that the dataset is small enough to be handled comfortably by a single machine. Instead, consider using Spark's built-in functions and capabilities to perform data processing tasks in a distributed manner.
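As a rough sketch of that advice (assuming a CSV file named `my_data.csv` with an `id` column, and building the `SparkSession` explicitly since none is defined in this excerpt), the processing can stay entirely within Spark:

----
# Sketch only: my_data.csv and its id column are assumed for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("distributed-example").getOrCreate()
df = spark.read.csv("my_data.csv", header=True, inferSchema=True)

# Filtering and aggregation run distributed across the executors;
# no single machine ever needs to hold the whole dataset in memory.
result = df.filter(F.col("id") > 1).groupBy("id").count()
result.show()
----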
-If conversion to Pandas is necessary, ensure that the dataset size is manageable and that the conversion is justified by specific requirements, such as integration with libraries that require Pandas DataFrames.
+If the conversion to Pandas is necessary, ensure that the dataset size is manageable and that the conversion is justified by specific requirements, such as integration with libraries that require Pandas DataFrames.
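One way to keep such a conversion manageable, shown here only as a sketch reusing the `spark` session from the example above (the `MAX_ROWS` threshold is illustrative, not a Spark or rule setting), is to bound the number of rows shipped to the driver before calling `toPandas`:

----
# Sketch only: MAX_ROWS is an illustrative cap, not part of any API.
MAX_ROWS = 100_000

df = spark.read.csv("my_data.csv", header=True, inferSchema=True)

# limit() bounds what reaches the driver, so the conversion for a
# Pandas-only library operates on a dataset of known, manageable size.
pandas_df = df.limit(MAX_ROWS).toPandas()
print(pandas_df.head())
----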
=== Exceptions
@@ -31,7 +31,8 @@ To fix this issue, consider using PySpark built-in capabilities without relying
# Converting a PySpark DataFrame to a Pandas DataFrame
df = spark.read.csv("my_data.csv")
pandas_df = df.toPandas()  # Noncompliant: may cause memory issues with large datasets
print(pandas_df)
filtered_df = pandas_df[pandas_df['id'] > 1]
print(filtered_df)
----
==== Compliant solution
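The compliant block itself is cut off in this capture, so the following is only a plausible sketch, assuming the fix mirrors the noncompliant example but keeps the filtering in PySpark rather than converting to Pandas:

----
# Sketch only: the actual compliant example is truncated above.
df = spark.read.csv("my_data.csv")

filtered_df = df.filter(df['id'] > 1)  # Compliant: stays distributed
filtered_df.show()
----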