rspec/rules/S6741/python/rule.adoc

This rule raises an issue when the ``++pandas.DataFrame.values++`` attribute is used instead of the ``++pandas.DataFrame.to_numpy()++`` method.

== Why is this an issue?

The ``++values++`` attribute and the ``++to_numpy()++`` method in pandas both return a NumPy representation of the ``++DataFrame++``. However, ``++to_numpy()++`` is recommended over ``++values++`` for several reasons:

* *Future Compatibility:*
The ``++values++`` attribute is considered a legacy feature, while ``++to_numpy()++`` is the method the pandas documentation recommends for extracting data, which makes it the more future-proof choice.
* *Data type consistency:*
If the ``++DataFrame++`` has columns with different data types, a common data type that can hold all the data is chosen automatically (for example, ``++object++`` when numeric and string columns are mixed). This may lead to loss of information, unexpected type conversions, or increased memory usage. With ``++to_numpy()++``, you can select the common type explicitly by passing the ``++dtype++`` argument.
* *View vs Copy:*
The ``++values++`` attribute can return either a view or a copy of the data, depending on the contents of the ``++DataFrame++`` (for example, whether all columns share a single data type). This can lead to confusion when modifying the extracted data. In contrast, ``++to_numpy()++`` has a ``++copy++`` argument that forces it to always return a new NumPy array, ensuring that any changes you make won't affect the original ``++DataFrame++``.
* *Missing values control:*
The ``++to_numpy()++`` method lets you specify, through the ``++na_value++`` argument, the value used for missing entries in the ``++DataFrame++``, while the ``++values++`` attribute offers no control over how missing values are represented.
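
The ``++dtype++`` and ``++na_value++`` arguments mentioned above can be illustrated with a minimal sketch. The ``++DataFrame++`` contents and variable names are made up for this illustration and are not part of the rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an integer column and a float column with a missing value
df = pd.DataFrame({
    "X": [1, 2, 3],
    "Y": [0.5, None, 2.5],
})

# Without arguments, a common dtype is inferred for all columns
inferred = df.to_numpy()

# dtype selects the common type explicitly
as_float32 = df.to_numpy(dtype="float32")

# na_value replaces missing entries in the returned array
filled = df.to_numpy(na_value=0.0)

print(inferred.dtype, as_float32.dtype)
print(filled)
```

Neither option is available through the ``++values++`` attribute, which always relies on the inferred type.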

== How to fix it

Use the ``++to_numpy()++`` method instead of the ``++values++`` attribute to get a NumPy representation of the ``++DataFrame++``.

=== Code examples

==== Noncompliant code example

[source,python,diff-id=1,diff-type=noncompliant]
----
import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'B', 'A', 'C'],
    'Y': [10, 7, 12, 5]
})

arr = df.values  # Noncompliant: using the 'values' attribute is not recommended
----

==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'B', 'A', 'C'],
    'Y': [10, 7, 12, 5]
})

arr = df.to_numpy()  # Compliant
----
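
When the extracted array is going to be modified, passing ``++copy=True++`` may be a safer choice. The following is a minimal sketch with made-up data, not part of the rule's compliant solution:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame whose columns share a single dtype
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# copy=True guarantees a new array that does not share memory with df
arr = df.to_numpy(copy=True)
arr[0, 0] = 99  # modifies only the array

print(df.loc[0, "A"])  # the DataFrame still holds its original value
```

Without ``++copy=True++``, the returned array may or may not share memory with the ``++DataFrame++``, so in-place changes should not be relied upon either way.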


== Resources
=== Documentation

* Pandas Documentation - https://pandas.pydata.org/pandas-docs/version/2.1/reference/api/pandas.DataFrame.to_numpy.html[pandas.DataFrame.to_numpy()]
* Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html[pandas.DataFrame.values]