You can preview this rule [here](https://sonarsource.github.io/rspec/#/rspec/S6740/python) (updated a few minutes after each push). ## Review A dedicated reviewer checked the rule description successfully for: - [ ] logical errors and incorrect information - [ ] information gaps and missing content - [ ] text style and tone - [ ] PR summary and labels follow [the guidelines](https://github.com/SonarSource/rspec/#to-modify-an-existing-rule) --------- Co-authored-by: joke1196 <joke1196@users.noreply.github.com> Co-authored-by: David Kunzmann <david.kunzmann@sonarsource.com> Co-authored-by: Guillaume Dequenne <guillaume.dequenne@sonarsource.com>
This rule raises an issue when the ``++dtype++`` parameter is not provided in a call to ``++pandas.read_csv++`` or ``++pandas.read_table++``.

== Why is this an issue?

The pandas library provides an easy way to load data from documents hosted locally or remotely, for example with the ``++pandas.read_csv++`` or ``++pandas.read_table++`` functions:

[source,python]
----
import pandas as pd

df = pd.read_csv("my_file.csv")
----
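As a minimal sketch of what such a call leaves implicit, the snippet below loads a small in-memory CSV without ``++dtype++`` and inspects the result at runtime. The CSV content and its column names are hypothetical stand-ins for ``++my_file.csv++``, and ``++io.StringIO++`` is used only so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "my_file.csv".
csv_text = "name,age\nAlice,30\nBob,25\n"

# No dtype is given, so pandas infers a type for every column.
df = pd.read_csv(io.StringIO(csv_text))

# The column names and types are only discoverable by running the code.
print(list(df.columns))
print(df.dtypes)
```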
Pandas infers the type of each column of the CSV file and sets the data type accordingly, so the ``++read_csv++`` call above is perfectly valid.
However, a call without ``++dtype++`` does not convey the intent of the user, and can raise questions such as:

* What information can I access in ``++df++``?
* What are the names of the columns available in ``++df++``?

These questions arise because nothing describes what kind of data is loaded into the data frame, which makes the code less understandable and harder to maintain.

A straightforward way to fix these issues is to provide the schema of the data through the ``++dtype++`` parameter.


== How to fix it

To fix this issue, provide the ``++dtype++`` parameter to the ``++read_csv++`` or ``++read_table++`` function.

=== Code examples

==== Noncompliant code example

[source,python,diff-id=1,diff-type=noncompliant]
----
import pandas as pd

def foo():
    return pd.read_csv("my_file.csv") # Noncompliant: it is unclear which type of data the data frame holds.
----
==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
import pandas as pd

def foo():
    return pd.read_csv(
        "my_file.csv",
        dtype={'name': 'str', 'age': 'int'}) # Compliant
----
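The compliant call can be tried without a file on disk. The following is a sketch under assumptions: the CSV content is hypothetical, ``++io.StringIO++`` replaces the file path, and the ``++schema++`` variable name is introduced here only for illustration. It reuses the ``++dtype++`` mapping from the compliant solution.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "my_file.csv".
csv_text = "name,age\nAlice,30\nBob,25\n"

# The schema is stated up front, so a reader can see the expected
# columns and their types without running the code.
schema = {"name": "str", "age": "int"}

df = pd.read_csv(io.StringIO(csv_text), dtype=schema)

# The resulting dtypes follow the declared schema.
print(df.dtypes)
```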
== Resources

=== Documentation

* Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas-read-csv[pandas.read_csv]
* Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.read_table.html#pandas-read-table[pandas.read_table]
* Pandas Documentation - https://pandas.pydata.org/docs/user_guide/basics.html#dtypes[dtypes]