How to Convert NumPy Array to Pandas DataFrame

Suraj Joshi Feb 02, 2024
This tutorial explains how to convert a NumPy array to a Pandas DataFrame using the pandas.DataFrame() constructor.

Passing a NumPy array to pandas.DataFrame() creates a DataFrame from it. We can also specify the column names and row indices of the DataFrame.

Convert NumPy Array to Pandas DataFrame Using the pandas.DataFrame() Method

In the simplest case, we pass only the NumPy array to pandas.DataFrame().

from numpy import random
import pandas as pd

random.seed(5)
# The result of this call is discarded; it only advances the random generator's state.
random.randint(100, size=(3, 5))
data_array = random.randint(100, size=(4, 3))

print("NumPy Data Array is:")
print(data_array)

print("")

data_df = pd.DataFrame(data_array)
print("The DataFrame generated from the NumPy array is:")
print(data_df)

Output:

NumPy Data Array is:
[[27 44 77]
 [75 65 47]
 [30 84 86]
 [18  9 41]]

The DataFrame generated from the NumPy array is:
    0   1   2
0  27  44  77
1  75  65  47
2  30  84  86
3  18   9  41

The code first creates a random integer array of shape (4, 3), that is, 4 rows and 3 columns. We then pass the array to pandas.DataFrame(), which builds a DataFrame named data_df from it. Because we do not supply any labels, pandas uses integers starting from 0 as the default column names and row indices.
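To make the default labels visible, the following minimal sketch builds a DataFrame from a small hand-written array (the values and the names small_array and small_df are illustrative and not part of the example above) and inspects its row and column labels.

```python
import numpy as np
import pandas as pd

# Small fixed array, so the result does not depend on a random seed
small_array = np.array([[1, 2, 3], [4, 5, 6]])
small_df = pd.DataFrame(small_array)

# With no labels supplied, both axes are numbered from 0
print(small_df.index.tolist())    # [0, 1]
print(small_df.columns.tolist())  # [0, 1, 2]
print(small_df.shape)             # (2, 3)
```

The shape of the DataFrame matches the shape of the array: one row per array row and one column per array column.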

We can also set the row indices and column names using the index and columns parameters of pandas.DataFrame().

from numpy import random
import pandas as pd

random.seed(5)
# The result of this call is discarded; it only advances the random generator's state.
random.randint(100, size=(3, 5))
data_array = random.randint(100, size=(4, 3))
row_indices = ["Row_1", "Row_2", "Row_3", "Row_4"]
column_names = ["Column_1", "Column_2", "Column_3"]

print("NumPy Data Array is:")
print(data_array)

print("")

data_df = pd.DataFrame(data_array, index=row_indices, columns=column_names)
print("The DataFrame generated from the NumPy array is:")
print(data_df)

Output:

NumPy Data Array is:
[[27 44 77]
 [75 65 47]
 [30 84 86]
 [18  9 41]]

The DataFrame generated from the NumPy array is:
       Column_1  Column_2  Column_3
Row_1        27        44        77
Row_2        75        65        47
Row_3        30        84        86
Row_4        18         9        41

Here, we set index to row_indices, a list containing the label of each row. Similarly, we set columns to column_names, a list containing the name of each column.
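Once labels are set, they can be used to look up values. The sketch below uses a small hand-written array with hypothetical labels (not the random data above) to show a label-based lookup with .loc, and that pandas raises a ValueError when the number of labels does not match the shape of the array.

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 array; the values and labels are made up for this sketch
values = np.array([[10, 20], [30, 40], [50, 60]])
labeled_df = pd.DataFrame(
    values,
    index=["Row_1", "Row_2", "Row_3"],
    columns=["Column_1", "Column_2"],
)

# Look up a single cell by its row and column labels
cell = labeled_df.loc["Row_2", "Column_2"]
print(cell)  # 40

# Passing two row labels for a three-row array is rejected
mismatch_raised = False
try:
    pd.DataFrame(values, index=["Row_1", "Row_2"], columns=["Column_1", "Column_2"])
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```

The length of index must equal the number of rows in the array, and the length of columns must equal the number of columns.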

In some cases, the NumPy array itself may contain the row indices and column names. We then use array slicing to extract the data, row indices, and column names from the array.

import numpy as np
import pandas as pd

marks_array = np.array(
    [["", "Mathematics", "Economics"], ["Sunny", 25, 23], ["Alice", 23, 24]]
)

print("NumPy Data Array is:")
print(marks_array)

print("")

row_indices = marks_array[1:, 0]
column_names = marks_array[0, 1:]
data_df = pd.DataFrame(
    data=np.int_(marks_array[1:, 1:]), index=row_indices, columns=column_names
)

print("The DataFrame generated from the NumPy array is:")
print(data_df)

Output:

NumPy Data Array is:
[['' 'Mathematics' 'Economics']
 ['Sunny' '25' '23']
 ['Alice' '23' '24']]

The DataFrame generated from the NumPy array is:
       Mathematics  Economics
Sunny           25         23
Alice           23         24

Here, the row indices and column names are stored in the NumPy array itself. The slice marks_array[1:, 1:] selects every value after the first row and first column, and we pass it as the data argument to pandas.DataFrame(). The slice marks_array[1:, 0] selects the first column from the second row onward, and we pass it as the index argument. Similarly, marks_array[0, 1:] selects the first row from the second column onward, and we pass it as the columns argument to set the column names.

Because a NumPy array holds elements of a single data type, numpy.array() converts the integer values to strings when it builds an array that mixes strings and integers. We use numpy.int_() to convert the data values back to the integer type.
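The type conversion can be checked directly. The sketch below rebuilds the same array, confirms that it is stored with a string dtype, and converts the numeric slice with astype(int). Note that astype(int) is a commonly used alternative to numpy.int_() and is not what the example above uses; the names marks, marks_df, and total are introduced here for illustration.

```python
import numpy as np
import pandas as pd

marks_array = np.array(
    [["", "Mathematics", "Economics"], ["Sunny", 25, 23], ["Alice", 23, 24]]
)

# Mixing strings and integers yields a single Unicode string dtype
print(marks_array.dtype.kind)  # U

# Convert only the numeric part of the array back to integers
marks = marks_array[1:, 1:].astype(int)
marks_df = pd.DataFrame(marks, index=marks_array[1:, 0], columns=marks_array[0, 1:])

# The columns now hold integers, so arithmetic works as expected
print(marks_df.dtypes)
total = marks_df["Mathematics"].sum()
print(total)  # 48
```

Without the conversion, the DataFrame columns would hold strings, and operations such as sum() would concatenate text instead of adding numbers.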

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
