55 lines
2.5 KiB
Plaintext
55 lines
2.5 KiB
Plaintext
This rule raises an issue when the ``++pandas.DataFrame.values++`` is used instead of the ``++pandas.DataFrame.to_numpy()++`` method.
|
|
|
|
== Why is this an issue?
|
|
|
|
The ``++values++`` attribute and the ``++to_numpy()++`` method in pandas both provide a way to return a NumPy representation of the ``++DataFrame++``. However, there are some reasons why the ``++to_numpy()++`` method is recommended over the ``++values++`` attribute:
|
|
|
|
* *Future Compatibility:*
|
|
The ``++values++`` attribute is considered a legacy feature, while the ``++to_numpy()++`` is the recommended method to extract data and is considered more future-proof.
|
|
* *Data type consistency:*
|
|
If the ``++DataFrame++`` has columns with different data types, NumPy will choose a common data type that can hold all the data. This may lead to loss of information, unexpected type conversions, or increased memory usage. The ``++to_numpy()++`` allows you to select the common type manually, passing the ``++dtype++`` argument.
|
|
* *View vs Copy:*
|
|
The ``++values++`` attribute can return a view or a copy of the data depending on whether the data needs to be transposed. This can lead to confusion when modifying the extracted data. On the other hand, ``++to_numpy()++`` has ``++copy++`` argument allowing to force it always to return a new NumPy array, ensuring that any changes you make won't affect the original ``++DataFrame++``.
|
|
* *Missing values control:*
|
|
The ``++to_numpy()++`` allows to specify the default value used for missing values in the ``++DataFrame++``, while the ``++values++`` will always use ``++numpy.nan++`` for missing values.
|
|
|
|
== How to fix it
|
|
Use the ``++to_numpy()++`` method instead of the ``++values++`` attribute to get a NumPy representation of the ``++DataFrame++``.
|
|
|
|
=== Code examples
|
|
|
|
==== Noncompliant code example
|
|
|
|
[source,python,diff-id=1,diff-type=noncompliant]
|
|
----
|
|
import pandas as pd
|
|
|
|
df = pd.DataFrame({
|
|
'X': ['A', 'B', 'A', 'C'],
|
|
'Y': [10, 7, 12, 5]
|
|
})
|
|
|
|
arr = df.values # Noncompliant: using the 'values' attribute is not recommended
|
|
----
|
|
|
|
==== Compliant solution
|
|
|
|
[source,python,diff-id=1,diff-type=compliant]
|
|
----
|
|
import pandas as pd
|
|
|
|
df = pd.DataFrame({
|
|
'X': ['A', 'B', 'A', 'C'],
|
|
'Y': [10, 7, 12, 5]
|
|
})
|
|
|
|
arr = df.to_numpy() # Compliant
|
|
----
|
|
|
|
|
|
== Resources
|
|
=== Documentation
|
|
|
|
* Pandas Documentation - https://pandas.pydata.org/pandas-docs/version/2.1/reference/api/pandas.DataFrame.to_numpy.html[pandas.DataFrame.to_numpy()]
|
|
* Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html[pandas.DataFrame.values]
|