Pandas DataFrame DataFrame.interpolate() Function

  1. Syntax of pandas.DataFrame.interpolate():
  2. Example Codes: Interpolate All NaN Values in DataFrame With DataFrame.interpolate() Method
  3. Example Codes: DataFrame.interpolate() Method With the method Parameter
  4. Example Codes: Pandas DataFrame.interpolate() Method With the axis Parameter to Interpolate Along row Axis
  5. Example Codes: DataFrame.interpolate() Method With limit Parameter
  6. Example Codes: DataFrame.interpolate() Method With limit_direction Parameter
  7. Interpolate Time-Series Data With DataFrame.interpolate() Method

The Python Pandas DataFrame.interpolate() function fills NaN values in a DataFrame by estimating them from neighboring values with an interpolation technique.

Syntax of pandas.DataFrame.interpolate():

DataFrame.interpolate(method='linear',
                      axis=0,
                      limit=None,
                      inplace=False,
                      limit_direction='forward',
                      limit_area=None,
                      downcast=None,
                      **kwargs)

Parameters

method: 'linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'polynomial', 'spline', 'piecewise_polynomial', 'from_derivatives', 'pchip', 'akima' or None. Method used to interpolate NaN values. Defaults to 'linear'.
axis: 0 or 'index' interpolates down each column; 1 or 'columns' interpolates across each row.
limit: Integer. Maximum number of consecutive NaNs to fill. Must be greater than 0.
inplace: Boolean. If True, modify the caller DataFrame in place.
limit_direction: 'forward', 'backward' or 'both'. Consecutive NaNs are filled in this direction.
limit_area: None, 'inside' or 'outside'. None applies no restriction; 'inside' fills only NaNs surrounded by valid values; 'outside' fills only NaNs outside the valid values.
downcast: 'infer' or None. Downcast dtypes if possible.
**kwargs: Keyword arguments passed to the interpolating function, such as order for the polynomial and spline methods.
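None of the examples below uses limit_area, so here is a minimal sketch on a small hypothetical Series (the values are made up for illustration). It contrasts 'inside', which fills only gaps that have valid values on both sides, with 'outside', which fills only the leading and trailing gaps; limit_direction='both' is added so that the leading NaN can be filled as well.

```python
import numpy as np
import pandas as pd

# Hypothetical series: NaNs at both ends and one NaN in the middle
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# 'inside': only fill NaNs that have valid values on both sides
inside = s.interpolate(limit_area='inside')

# 'outside': only fill NaNs outside the valid values (the two ends);
# limit_direction='both' lets the leading NaN be filled too
outside = s.interpolate(limit_direction='both', limit_area='outside')

print(inside.tolist())    # [nan, 1.0, 2.0, 3.0, nan]
print(outside.tolist())   # [1.0, 1.0, nan, 3.0, 3.0]
```

Outside the valid range, linear interpolation simply repeats the nearest valid value, which is why the ends become 1.0 and 3.0.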

Return

If inplace is False, a DataFrame with the NaN values filled using the given method; if inplace is True, None.
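A quick sketch of the two return values, using a one-column version of the DataFrame from the examples below:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3]})

# inplace=False (the default): a new DataFrame is returned, df is untouched
result = df.interpolate()
print(type(result).__name__)   # DataFrame
print(df['X'].isna().sum())    # 1

# inplace=True: df itself is modified and the call returns None
returned = df.interpolate(inplace=True)
print(returned)                # None
print(df['X'].isna().sum())    # 0
```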

Example Codes: Interpolate All NaN Values in DataFrame With DataFrame.interpolate() Method

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})
print("DataFrame:")
print(df)

filled_df = df.interpolate()

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X    Y
0  1.0  4.0
1  2.0  6.0
2  3.0  8.0
3  3.0  5.5
4  3.0  3.0

It fills all the NaN values in the DataFrame using linear interpolation, the default method.
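The default 'linear' method ignores the index and treats the rows as equally spaced. This sketch uses a hypothetical Series with an uneven integer index to contrast it with method='index', which uses the actual index values as x-coordinates:

```python
import pandas as pd

# Hypothetical series whose index values are unevenly spaced
s = pd.Series([0.0, None, 10.0], index=[0, 1, 10])

# 'linear' ignores the index: the gap is the midpoint of 0.0 and 10.0
linear_filled = s.interpolate(method='linear').tolist()
print(linear_filled)   # [0.0, 5.0, 10.0]

# 'index' uses the index values: x=1 lies 1/10 of the way from 0 to 10
index_filled = s.interpolate(method='index').tolist()
print(index_filled)    # [0.0, 1.0, 10.0]
```

With the default RangeIndex used in this article, both methods give the same result.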

Unlike pandas.DataFrame.fillna() with a scalar value, which replaces every NaN in the DataFrame with the same fixed value, interpolate() estimates each missing value from the valid values around it.
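To make the difference concrete, this sketch fills the Y column of the same DataFrame once with fillna(0) (0 is an arbitrary fill value chosen for illustration) and once with interpolate():

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})

# fillna() with a scalar puts the same value into every gap
zero_filled = df.fillna(0)['Y'].tolist()
print(zero_filled)     # [4.0, 0.0, 8.0, 0.0, 3.0]

# interpolate() estimates each gap from its neighbors
interp_filled = df.interpolate()['Y'].tolist()
print(interp_filled)   # [4.0, 6.0, 8.0, 5.5, 3.0]
```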

Example Codes: DataFrame.interpolate() Method With the method Parameter

We can also interpolate NaN values in a DataFrame with a different interpolation technique by setting the method parameter of DataFrame.interpolate().

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})
print("DataFrame:")
print(df)

filled_df = df.interpolate(method='polynomial', order=2)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
          X      Y
0  1.000000  4.000
1  2.000000  7.125
2  3.000000  8.000
3  3.368421  6.625
4  3.000000  3.000

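Here, the NaN values are filled using second-order polynomial interpolation.

The order=2 keyword is passed through **kwargs to the interpolating function; both the polynomial and spline methods require an order. Unlike 'linear', these SciPy-based methods use the numerical values of the index, and they need SciPy to be installed.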
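Column Y has exactly three known points, (0, 4), (2, 8) and (4, 3), and a second-order polynomial through three points is unique. As a rough cross-check that does not depend on SciPy, this sketch fits that parabola with NumPy's polyfit and evaluates it at the missing positions 1 and 3; the results should be close to the 7.125 and 6.625 in the output above. The same shortcut does not apply to column X, which has four known points.

```python
import numpy as np

# Known (index, value) pairs of column Y in the example above
x_known = np.array([0, 2, 4])
y_known = np.array([4.0, 8.0, 3.0])

# Three points determine a unique quadratic, so a degree-2 fit passes through all of them
coeffs = np.polyfit(x_known, y_known, deg=2)

# Evaluate the parabola at the missing positions 1 and 3
estimates = np.polyval(coeffs, [1, 3])
print(estimates)
```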

Example Codes: Pandas DataFrame.interpolate() Method With the axis Parameter to Interpolate Along row Axis

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})
print("DataFrame:")
print(df)

filled_df = df.interpolate(axis=1)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X    Y
0  1.0  4.0
1  2.0  2.0
2  3.0  8.0
3  NaN  NaN
4  3.0  3.0

Here, we set axis=1 to interpolate across the columns of each row. In the second row (index 1), the NaN in column Y has no valid value after it, so it is filled with the last valid value in that row, 2.0.

However, in the fourth row (index 3), the NaN values remain after interpolation because both values in that row are NaN, so there is nothing to interpolate from.
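One way to think about axis=1 is as transposing the DataFrame, interpolating down the columns as usual, and transposing back. This sketch checks that equivalence on the same data; it illustrates the behavior and is not a recommended way to write the call.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, 8, None, 3]})

# Interpolate across each row ...
row_wise = df.interpolate(axis=1)

# ... and compare with transpose -> interpolate down columns -> transpose back
via_transpose = df.T.interpolate().T

print(row_wise.equals(via_transpose))   # True
```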

Example Codes: DataFrame.interpolate() Method With limit Parameter

The limit parameter of the DataFrame.interpolate() method sets the maximum number of consecutive NaN values to fill.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, None, None, 3]})
print("DataFrame:")
print(df)

filled_df = df.interpolate(limit=1)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  NaN
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X     Y
0  1.0  4.00
1  2.0  3.75
2  3.0   NaN
3  3.0   NaN
4  3.0  3.00

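Here, limit=1 means only the first NaN in each run of consecutive NaNs is filled, counting from the top (the default forward direction); the remaining NaNs in that run stay unchanged. In column X, the single NaN is filled as usual.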
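To see how limit counts within a run of NaNs, this sketch reuses the Y column and compares no limit with limit=2 (a value chosen only for illustration). With linear interpolation, the three gaps between 4 and 3 would be 3.75, 3.5 and 3.25:

```python
import pandas as pd

df = pd.DataFrame({'Y': [4, None, None, None, 3]})

# Without a limit, all three consecutive NaNs are filled
unlimited = df.interpolate()['Y'].tolist()
print(unlimited)   # [4.0, 3.75, 3.5, 3.25, 3.0]

# With limit=2, only the first two NaNs of the run are filled
limited = df.interpolate(limit=2)['Y'].tolist()
print(limited)     # [4.0, 3.75, 3.5, nan, 3.0]
```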

Example Codes: DataFrame.interpolate() Method With limit_direction Parameter

The limit_direction parameter of the DataFrame.interpolate() method controls the direction in which consecutive NaN values are filled.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, None, 3],
                   'Y': [4, None, None, None, 3]})
print("DataFrame:")
print(df)

filled_df = df.interpolate(limit_direction='backward', limit=1)

print("Interpolated DataFrame:")
print(filled_df)

Output:

DataFrame:
     X    Y
0  1.0  4.0
1  2.0  NaN
2  3.0  NaN
3  NaN  NaN
4  3.0  3.0
Interpolated DataFrame:
     X     Y
0  1.0  4.00
1  2.0   NaN
2  3.0   NaN
3  3.0  3.25
4  3.0  3.00

Here, with limit_direction='backward', filling starts from the bottom of each run of consecutive NaNs, so only the last NaN in column Y is filled and the NaNs above it stay unchanged.
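limit_direction also accepts 'both'. In this sketch on the same Y column with limit=1, one NaN is filled from the top and one from the bottom, and the NaN in the middle of the run is left as is:

```python
import pandas as pd

df = pd.DataFrame({'Y': [4, None, None, None, 3]})

# limit=1 in both directions: the first NaN (from the top) and the
# last NaN (from the bottom) are filled; the middle one stays NaN
filled = df.interpolate(limit_direction='both', limit=1)
print(filled['Y'].tolist())   # [4.0, 3.75, nan, 3.25, 3.0]
```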

Interpolate Time-Series Data With DataFrame.interpolate() Method

import pandas as pd

dates = ['April-10', 'April-11', 'April-12', 'April-13']
fruits = ['Apple', 'Papaya', 'Banana', 'Mango']
prices = [3, None, 2, 4]

df = pd.DataFrame({'Date': dates,
                   'Fruit': fruits,
                   'Price': prices})

print(df)
df.interpolate(inplace=True)

print("Interpolated DataFrame:")
print(df)

Output:

       Date   Fruit  Price
0  April-10   Apple    3.0
1  April-11  Papaya    NaN
2  April-12  Banana    2.0
3  April-13   Mango    4.0
Interpolated DataFrame:
       Date   Fruit  Price
0  April-10   Apple    3.0
1  April-11  Papaya    2.5
2  April-12  Banana    2.0
3  April-13   Mango    4.0

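Due to inplace=True, the original DataFrame df is modified directly and interpolate() returns None. Only the numeric Price column is filled; the Date and Fruit columns hold strings and are left unchanged.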
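The Date column above holds plain strings, so that example is ordinary linear interpolation by row position. For time-aware filling, pandas offers method='time', which works on a DatetimeIndex and weights by elapsed time. This sketch uses hypothetical prices on unevenly spaced dates (the year 2023 and the values are invented for illustration). With a one-day gap followed by a two-day gap, 'linear' gives the midpoint of 2.0 and 5.0, while 'time' gives a value one third of the way between them.

```python
import pandas as pd

# Hypothetical prices on unevenly spaced dates (there is no row for April 12)
df = pd.DataFrame({'Price': [2.0, None, 5.0]},
                  index=pd.to_datetime(['2023-04-10', '2023-04-11', '2023-04-13']))

# 'linear' treats the rows as equally spaced
linear = df.interpolate(method='linear')
print(linear['Price'].iloc[1])   # 3.5

# 'time' weights by elapsed time: 1 of 3 days has passed, so 2.0 + 3.0 * (1 / 3)
timed = df.interpolate(method='time')
print(timed['Price'].iloc[1])
```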
