rspec/rules/S6740/python/rule.adoc
github-actions[bot] 7153f91dd5
Create rule S6740: dtype parameter should be provided when using pandas.read_csv or pandas.read_table (#2989)

Co-authored-by: joke1196 <joke1196@users.noreply.github.com>
Co-authored-by: David Kunzmann <david.kunzmann@sonarsource.com>
Co-authored-by: Guillaume Dequenne <guillaume.dequenne@sonarsource.com>
2023-10-06 11:50:49 +02:00


This rule raises an issue when the ``++dtype++`` parameter is not provided when using ``++pandas.read_csv++`` or ``++pandas.read_table++``.
== Why is this an issue?
The pandas library provides an easy way to load data from documents hosted locally or remotely, for example with the ``++pandas.read_csv++`` or ``++pandas.read_table++`` functions:
[source,python]
----
import pandas as pd
df = pd.read_csv("my_file.csv")
----
Pandas will infer the type of each column of the CSV file and set the data type accordingly, making this code perfectly valid.
However, this snippet of code does not convey the intent of the user, and can raise questions such as:
* What information can I access in ``++df++``?
* What are the names of the columns available in ``++df++``?
These questions arise because nothing describes what kind of data is loaded into the data frame, making the code less understandable and harder to maintain.
A straightforward way to fix these issues is to provide the schema of the data through the ``++dtype++`` parameter.
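Beyond readability, relying on inference can also change the data itself. The following is a minimal sketch, assuming a hypothetical in-memory CSV with a ``++zip_code++`` column (the data and column names are illustrative, not taken from ``++my_file.csv++``). It shows that inferred integer columns drop leading zeros, while an explicit ``++dtype++`` keeps the values as text:

[source,python]
----
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file such as "my_file.csv".
CSV_DATA = "zip_code,age\n01234,42\n98765,7\n"

# Without dtype, pandas infers both columns as integers,
# so the leading zero of the first zip code is lost.
inferred = pd.read_csv(io.StringIO(CSV_DATA))

# With dtype, the schema is visible at the call site and the zip code stays text.
explicit = pd.read_csv(
    io.StringIO(CSV_DATA),
    dtype={"zip_code": "str", "age": "int"},
)
----

Here ``++inferred["zip_code"]++`` holds integers, whereas ``++explicit["zip_code"]++`` holds the original strings.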
== How to fix it
To fix this issue, provide the ``++dtype++`` parameter to the ``++read_csv++`` or ``++read_table++`` function.
=== Code examples
==== Noncompliant code example
[source,python,diff-id=1,diff-type=noncompliant]
----
import pandas as pd
def foo():
    return pd.read_csv("my_file.csv") # Noncompliant: it is unclear which type of data the data frame holds.
----
==== Compliant solution
[source,python,diff-id=1,diff-type=compliant]
----
import pandas as pd
def foo():
    return pd.read_csv(
        "my_file.csv",
        dtype={'name': 'str', 'age': 'int'}) # Compliant
----
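The same approach applies to ``++pandas.read_table++``. The sketch below assumes hypothetical tab-separated data held in memory, and a ``++SCHEMA++`` constant introduced here only for illustration. It also shows a side effect of declaring the schema: a value that cannot be converted to the declared type is rejected when the data is loaded.

[source,python]
----
import io

import pandas as pd

# Hypothetical tab-separated data standing in for a real file.
TSV_DATA = "name\tage\nAlice\t30\nBob\t25\n"

# Illustrative schema shared by both calls below.
SCHEMA = {"name": "str", "age": "int"}

people = pd.read_table(io.StringIO(TSV_DATA), dtype=SCHEMA)  # Compliant

# "unknown" cannot be converted to an integer, so loading fails
# instead of silently producing a column of another type.
BAD_TSV_DATA = "name\tage\nAlice\tunknown\n"
try:
    pd.read_table(io.StringIO(BAD_TSV_DATA), dtype=SCHEMA)
    load_failed = False
except ValueError:
    load_failed = True
----

Keeping the schema in a named constant is one possible design choice. It lets several loading functions share a single description of the expected columns.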
== Resources
=== Documentation
* Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas-read-csv[pandas.read_csv]
* Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.read_table.html#pandas-read-table[pandas.read_table]
* Pandas Documentation - https://pandas.pydata.org/docs/user_guide/basics.html#dtypes[dtypes]