How to Convert Pandas DataFrame to NumPy Array

  1. to_numpy method to convert DataFrame to NumPy array
  2. to_records() method to convert DataFrame to NumPy record array

We will look at the to_numpy() method, which converts a pandas.DataFrame to a NumPy array. It was introduced in pandas v0.24.0 as the recommended replacement for the older .values attribute. to_numpy() is defined on Index, Series, and DataFrame objects.
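
As a minimal sketch of to_numpy() being available on all three object types, the snippet below calls it on a DataFrame, on one of its columns (a Series), and on its index. The sample data and the variable names are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

frame_arr = df.to_numpy()        # 2D array built from the whole DataFrame
series_arr = df["a"].to_numpy()  # 1D array built from a single column (Series)
index_arr = df.index.to_numpy()  # 1D array built from the row labels (Index)

print(frame_arr.shape)
print(series_arr)
print(index_arr)
```

In each case the result is a plain numpy.ndarray, so the row and column labels are no longer attached to the values.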

The older DataFrame.values attribute behaves inconsistently, and the pandas API documentation recommends using to_numpy() instead. However, we will still look at an example of it in case you are using an older version of pandas.

Another old method, DataFrame.as_matrix(), was deprecated in pandas 0.23.0 and removed in pandas 1.0.0, so do not use it!

We will also introduce another approach, the DataFrame.to_records() method, which converts the given DataFrame to a NumPy record array.

to_numpy method to convert DataFrame to NumPy array

pandas.DataFrame is a 2D tabular data structure with rows and columns. It can be converted into a NumPy array by using the to_numpy method:

# python 3.x
import pandas as pd
import numpy as np
# Build a 6x4 DataFrame of random integers in [0, 10)
df = pd.DataFrame(
    data=np.random.randint(0, 10, (6, 4)),
    columns=["a", "b", "c", "d"])
# Convert the DataFrame to a NumPy array
nmp = df.to_numpy()
print(nmp)
print(type(nmp))

Output:

[[5 5 1 3]
 [1 6 6 0]
 [9 1 2 0]
 [9 3 5 3]
 [7 9 4 9]
 [8 1 8 9]]
<class 'numpy.ndarray'>
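
The example above uses a single integer dtype. As a rough sketch of what happens when the columns have different dtypes, the snippet below compares three small DataFrames and also passes the optional dtype argument of to_numpy(). The frames and their names are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: all integers, integers plus floats, integers plus strings.
ints = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
mixed_num = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
mixed_obj = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# A NumPy array has one dtype, so pandas picks a common one for all columns.
print(ints.to_numpy().dtype)       # an integer dtype
print(mixed_num.to_numpy().dtype)  # float64
print(mixed_obj.to_numpy().dtype)  # object

# The dtype argument requests a specific result dtype.
as_float = ints.to_numpy(dtype="float64")
print(as_float.dtype)              # float64
```

Because the result has a single dtype, a DataFrame that mixes numbers and strings falls back to object, which is usually slower to work with in NumPy.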

The same can be done with the DataFrame.values attribute, as follows:

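# python 3.x
import pandas as pd
import numpy as np
# Build a 6x4 DataFrame of random integers in [0, 10)
df = pd.DataFrame(
    data=np.random.randint(0, 10, (6, 4)),
    columns=["a", "b", "c", "d"])
# .values is an attribute, so it is accessed without parentheses
nmp = df.values
print(nmp)
print(type(nmp))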

Output:

[[8 8 5 0]
 [1 7 7 5]
 [0 2 4 2]
 [6 8 0 7]
 [6 4 5 1]
 [1 8 4 7]]
<class 'numpy.ndarray'>

If we want to include the index in the NumPy array, we need to call reset_index() before accessing DataFrame.values:

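# python 3.x
import pandas as pd
import numpy as np
# Build a 6x4 DataFrame of random integers in [0, 10)
df = pd.DataFrame(
    data=np.random.randint(0, 10, (6, 4)),
    columns=["a", "b", "c", "d"])
# reset_index() moves the index into a regular column first
nmp = df.reset_index().values
print(nmp)
print(type(nmp))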

Output:

[[0 1 0 3 7]
 [1 8 2 5 1]
 [2 2 2 7 3]
 [3 3 4 3 7]
 [4 5 4 4 3]
 [5 2 9 7 6]]
<class 'numpy.ndarray'>
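
The same index-preserving conversion can also be written with to_numpy() in place of the .values attribute. This is a minimal sketch; the variable name with_index is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.random.randint(0, 10, (6, 4)),
    columns=["a", "b", "c", "d"],
)

# reset_index() turns the index into an ordinary first column,
# so it is carried into the resulting array.
with_index = df.reset_index().to_numpy()

print(with_index.shape)  # (6, 5): the index column plus a, b, c, d
print(with_index[:, 0])  # the original row labels 0..5
```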

to_records() method to convert DataFrame to NumPy record array

If you need to keep the column names and the dtype of each column, to_records() is the best option to use. Performance-wise, to_numpy() and to_records() are almost the same:

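# python 3.x
import pandas as pd
import numpy as np
# Build a 6x4 DataFrame of random integers in [0, 10)
df = pd.DataFrame(
    data=np.random.randint(0, 10, (6, 4)),
    columns=["a", "b", "c", "d"])
# Convert to a record array; the index becomes the first field
nmp = df.to_records()
print(nmp)
print(type(nmp))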

Output:

[(0, 0, 4, 6, 1) 
 (1, 3, 1, 7, 1) 
 (2, 9, 1, 6, 4) 
 (3, 1, 4, 6, 9)
 (4, 9, 1, 3, 9)
 (5, 2, 5, 7, 9)]
<class 'numpy.recarray'>
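
To illustrate how a record array keeps the column names and per-column dtypes, here is a small sketch that inspects the field names and reads one field back by name. The sample data is made up for illustration, and index=False is used to leave the index out of the records.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one integer and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

rec = df.to_records(index=False)  # skip the index field
print(rec.dtype.names)            # ('a', 'b')
print(rec["a"])                   # [1 2 3]
print(rec.b)                      # [0.5 1.5 2.5]

with_idx = df.to_records()        # default: the index is the first field
print(with_idx.dtype.names)       # ('index', 'a', 'b')
```

Unlike the array from to_numpy(), each field here keeps its own dtype, so the integer column stays an integer type while the float column stays float.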
