How to Fix Memory Error in Pandas

Preet Sanghavi Feb 02, 2024
  1. What is the Memory Error in Pandas
  2. How to Avoid the Memory Error in Pandas
How to Fix Memory Error in Pandas

This tutorial explores the concept of memory error in Pandas.

What is the Memory Error in Pandas

While working with Pandas, an analyst might encounter multiple errors that the code interpreter throws. These errors are widely ranged and would help us better investigate the issue.

In this tutorial, we aim to better understand the memory error thrown by Pandas, the reason behind it throwing that error and the potential ways by which this error can be resolved.

Firstly, let us understand what this error means. A memory error means that there is not enough memory on the server or the database you’re trying to access to complete the operation or task you wish to perform.

This error is generally associated with files and CSV data that have the order of hundreds of gigabytes. It is important to understand what causes this error and avoid such an error to have more data storage.

Solving this error can also help develop an efficient and thorough database with proper rule management.

Assuming we’re trying to fetch data from a CSV file with more than 1000 gigabytes of data, we will naturally face the memory error discussed above. This error can be illustrated below.

MemoryError
Press any key to continue . . .

There is a method by which we can avoid this memory error potentially. However, before we do that, let us create a dummy data frame to work with.

We will call this data frame dat1. Let us create this data frame using the following code.

import pandas as pd

dat1 = pd.DataFrame(pd.np.random.choice(["1.0", "0.6666667", "150000.1"], (100000, 10)))

The query creates 10 columns indexed from 0 to 9 and 100000 values. To view the entries in the data, we use the following code.

print(dat1)

The above code gives the following output.

               0          1          2  ...          7          8          9
0            1.0        1.0        1.0  ...   150000.1  0.6666667  0.6666667
1      0.6666667  0.6666667        1.0  ...  0.6666667   150000.1  0.6666667
2            1.0        1.0   150000.1  ...   150000.1        1.0   150000.1
3       150000.1  0.6666667  0.6666667  ...        1.0   150000.1        1.0
4       150000.1  0.6666667   150000.1  ...   150000.1  0.6666667  0.6666667
...          ...        ...        ...  ...        ...        ...        ...
99995   150000.1   150000.1        1.0  ...   150000.1        1.0  0.6666667
99996        1.0        1.0   150000.1  ...  0.6666667  0.6666667   150000.1
99997   150000.1   150000.1        1.0  ...  0.6666667   150000.1  0.6666667
99998        1.0  0.6666667  0.6666667  ...  0.6666667        1.0   150000.1
99999        1.0  0.6666667   150000.1  ...        1.0   150000.1        1.0

[100000 rows x 10 columns]

How to Avoid the Memory Error in Pandas

Now let us see the total space that this data frame occupies using the following code.

resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

The code gives the following output.

# 224544 (~224 MB)

To avoid spending so much space on just a single data frame, let us do this by specifying exactly the data type we’re dealing with.

This helps us reduce the total memory required as lesser space is required to understand the type of data, and more space can be allotted to the actual data under consideration.

We can do this using the following query.

df = pd.DataFrame(pd.np.random.choice([1.0, 0.6666667, 150000.1], (100000, 10)))
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

The output of the code is below.

# 79560 (~79 MB)

Since we have specified the data type as int here by not assigning strings, we have successfully reduced the memory space required for our data.

Thus, we have learned the meaning, cause, and potential solution with this tutorial regarding the memory error thrown in Pandas.

Preet Sanghavi avatar Preet Sanghavi avatar

Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.

LinkedIn GitHub

Related Article - Pandas Error