How to Convert Pandas Dataframe to NumPy Array

Asad Riaz Feb 02, 2024
  1. to_numpy Method to Convert Pandas DataFrame to NumPy Array
  2. values Attribute to Convert Pandas DataFrame to NumPy Array
  3. to_records() Method to Convert DataFrame to NumPy Record Array

This tutorial covers the to_numpy() method for converting a pandas.DataFrame to a NumPy array. It was introduced in pandas v0.24.0 as the recommended replacement for the older .values attribute, and it is defined on Index, Series, and DataFrame objects.

DataFrame.values is an attribute, not a method. It has not been removed, but the pandas API documentation recommends to_numpy() instead because the behavior of .values can be inconsistent. We still look at an example of it in case you are using an older pandas version.

We will also introduce another approach, the DataFrame.to_records() method, which converts the given DataFrame to a NumPy record array.
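Since to_numpy() is available on Index, Series, and DataFrame objects, a minimal sketch of all three calls may help. The small DataFrame and the variable names below are made up for illustration.

```python
import pandas as pd

# Illustrative data, not taken from the examples in this article.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

frame_arr = df.to_numpy()        # DataFrame -> 2-D array
series_arr = df["a"].to_numpy()  # Series -> 1-D array
index_arr = df.index.to_numpy()  # Index -> 1-D array of row labels

print(frame_arr.shape)   # (3, 2)
print(series_arr.shape)  # (3,)
print(index_arr.shape)   # (3,)
```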

to_numpy Method to Convert Pandas DataFrame to NumPy Array

pandas.DataFrame is a two-dimensional tabular data structure with labeled rows and columns. It can be converted into a NumPy array with the to_numpy() method:

# python 3.x
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])

nmp = df.to_numpy()

print(nmp)
print(type(nmp))

Output:

[[5 5 1 3]
 [1 6 6 0]
 [9 1 2 0]
 [9 3 5 3]
 [7 9 4 9]
 [8 1 8 9]]
<class 'numpy.ndarray'>

As the output shows, to_numpy() returns the DataFrame's values as a numpy.ndarray. The index and column labels are not part of the result.
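to_numpy() also accepts dtype and copy arguments, which matter once the columns do not all share one type. The sketch below uses a made-up mixed-type DataFrame (the column names and values are assumptions for illustration) to show how the result dtype is chosen and how to request a numeric array explicitly.

```python
import pandas as pd

# Illustrative DataFrame with integer, float, and string columns.
mixed = pd.DataFrame(
    {"a": [1, 2, 3], "b": [0.5, 1.5, 2.5], "c": ["x", "y", "z"]}
)

# With mixed column types, the array falls back to a common dtype (object here).
obj_arr = mixed.to_numpy()
print(obj_arr.dtype)

# Selecting the numeric columns and passing dtype keeps the array numeric.
num_arr = mixed[["a", "b"]].to_numpy(dtype="float64")
print(num_arr.dtype)

# copy=True asks for an array that does not share memory with the DataFrame,
# so writing to it leaves the original data untouched.
safe = mixed[["a", "b"]].to_numpy(copy=True)
safe[0, 0] = 99
print(mixed.loc[0, "a"])
```

If the array is only read and never modified, the default copy=False is usually sufficient.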

values Attribute to Convert Pandas DataFrame to NumPy Array

We could also use the DataFrame.values attribute (note that it is accessed without parentheses) as follows.

# python 3.x
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])
nmp = df.values
print(nmp)
print(type(nmp))

Output:

[[8 8 5 0]
 [1 7 7 5]
 [0 2 4 2]
 [6 8 0 7]
 [6 4 5 1]
 [1 8 4 7]]
<class 'numpy.ndarray'>
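For a DataFrame of plain integers like this one, .values and to_numpy() should produce the same array. A quick check, using a small deterministic DataFrame chosen here for illustration:

```python
import numpy as np
import pandas as pd

# Deterministic illustrative data so the comparison is reproducible.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])

same = np.array_equal(df.values, df.to_numpy())
print(same)  # True
```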

By default the index is dropped. To include it as the first column of the converted NumPy array, call reset_index() on the DataFrame before accessing .values.

# python 3.x
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])

nmp = df.reset_index().values
print(nmp)
print(type(nmp))

Output:

[[0 1 0 3 7]
 [1 8 2 5 1]
 [2 2 2 7 3]
 [3 3 4 3 7]
 [4 5 4 4 3]
 [5 2 9 7 6]]
<class 'numpy.ndarray'>
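The same reset_index() approach also works with to_numpy(). The sketch below uses a made-up DataFrame with a non-default index so that the index column is easy to spot in the result.

```python
import pandas as pd

# Illustrative data with custom index labels 10 and 20.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[10, 20])

# reset_index() moves the index into a regular column before conversion.
with_index = df.reset_index().to_numpy()
print(with_index)
# [[10  1  3]
#  [20  2  4]]
```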

to_records() Method to Convert DataFrame to NumPy Record Array

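If you need to keep each column's dtype, to_records() is the better option: it returns a record array with one named, typed field per column, plus the index by default. Performance-wise, to_numpy() and to_records() are almost the same: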

# python 3.x
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])
nmp = df.to_records()
print(nmp)
print(type(nmp))

Output:

[(0, 0, 4, 6, 1) 
 (1, 3, 1, 7, 1) 
 (2, 9, 1, 6, 4) 
 (3, 1, 4, 6, 9)
 (4, 9, 1, 3, 9)
 (5, 2, 5, 7, 9)]
<class 'numpy.recarray'>
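To show what keeping the dtypes means in practice, here is a minimal sketch with a made-up two-column DataFrame. It inspects the field names and per-column dtypes of the record array, and uses the index=False argument to leave the index out.

```python
import pandas as pd

# Illustrative data: one integer column and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

rec = df.to_records()
print(rec.dtype.names)  # ('index', 'a', 'b')

# Each field keeps its own dtype and can be read by name or as an attribute.
print(rec["b"].dtype)   # float64
print(rec.a)

# index=False omits the index field from the record array.
no_index = df.to_records(index=False)
print(no_index.dtype.names)  # ('a', 'b')
```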
