Pandas: Fill NaN Values

Suraj Joshi, January 30, 2023
  1. DataFrame.fillna() Method
  2. Fill the Entire DataFrame With a Specified Value Using DataFrame.fillna()
  3. Fill NaN Values in Specified Columns With Specified Values

This tutorial explains how to fill NaN values with specified values using the DataFrame.fillna() method.

We will use the following DataFrame throughout this article.

import numpy as np
import pandas as pd

# Sample data: a DataFrame with missing values (np.nan) in several columns
student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)

print(student_df)

Output:

   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN
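
Before filling anything, it can help to see how many values are missing in each column. The sketch below is not part of the original tutorial; it rebuilds the same student_df and assumes only the standard DataFrame.isna() and sum() methods.

```python
import numpy as np
import pandas as pd

# Same sample data as above
student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)

# isna() marks each missing cell as True; sum() counts the True values per column
nan_counts = student_df.isna().sum()
print(nan_counts)
```

For this data the counts are 1 for Roll No, 0 for Name, 3 for Income(in $), and 2 for Age.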

DataFrame.fillna() Method

Syntax

DataFrame.fillna(
    value=None, method=None, axis=None, inplace=False, limit=None, downcast=None
)

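The DataFrame.fillna() method fills the NaN values in a DataFrame either with a specified value (value, which can be a scalar, dict, Series, or DataFrame) or with a fill method (method, such as "ffill" for forward fill or "bfill" for backward fill). By default it returns a new DataFrame and leaves the original unchanged; pass inplace=True to modify the original instead. The limit parameter caps how many NaN values are filled along the axis (with method, how many consecutive NaN values). Note that the method and downcast parameters are deprecated as of pandas 2.1; DataFrame.ffill() and DataFrame.bfill() are the recommended replacements for method.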
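
The default copy behavior, inplace, and limit can be seen on a small example. This is a minimal sketch using a hypothetical one-column DataFrame df (not from the original tutorial) with three consecutive NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column DataFrame with three consecutive NaN values
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, np.nan, 5.0]})

# Default: fillna() returns a new DataFrame and leaves df untouched
filled = df.fillna(0)
nan_before = int(df["A"].isna().sum())
print(nan_before)  # 3

# limit=2 with a scalar value: at most 2 NaN values in the column are filled
limited = df.fillna(0, limit=2)
print(limited["A"].tolist())  # [1.0, 0.0, 0.0, nan, 5.0]

# inplace=True modifies df itself instead of returning a new DataFrame
df.fillna(0, inplace=True)
print(df["A"].tolist())  # [1.0, 0.0, 0.0, 0.0, 5.0]
```

Returning a new object by default makes it easy to keep the original data around for comparison, as the examples in this article do.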

Fill the Entire DataFrame With a Specified Value Using DataFrame.fillna()

import numpy as np
import pandas as pd

# Same sample data as above
student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)
# Replace every NaN in the DataFrame with 0; a new DataFrame is returned
filled_df = student_df.fillna(0)

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      0.0       Bob           0.0   0.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna           0.0  18.0
5    506.0     Anish           0.0   0.0 

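This replaces every NaN value in student_df with 0, the value passed as the argument to DataFrame.fillna(). Because the Roll No, Income(in $), and Age columns contain NaN values, pandas stores them as floats, which is why the filled values appear as 0.0.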
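
One common variation, not covered in the original tutorial, is to fill each column with a value computed from that column, such as its mean. This sketch assumes the same student_df and relies on fillna() matching a Series passed as value to the DataFrame by column label.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)

# Mean of each numeric column; NaN values are skipped when computing the mean
column_means = student_df.mean(numeric_only=True)

# The Series index holds column labels, so each column is filled with its own mean;
# "Name" has no entry in column_means and is left unchanged
filled_df = student_df.fillna(column_means)
print(filled_df)
```

With this data, the Income(in $) NaN values become 210.0 (the mean of 200, 400, and 30) and the Age NaN values become 17.25.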
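
Instead of a fixed value, NaN values can also be filled by propagating existing values. The next example uses forward fill ("ffill"), which copies the last valid value down each column.
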
import numpy as np
import pandas as pd

# Same sample data as above
student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)
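# Forward fill: each NaN takes the last valid value above it in the same column.
# fillna(method=...) is deprecated since pandas 2.1; student_df.ffill() is equivalent.
filled_df = student_df.fillna(method="ffill")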

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2    502.0       Bob         400.0  18.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna          30.0  18.0
5    506.0     Anish          30.0  18.0 

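This fills each NaN value in student_df with the last valid value above it in the same column. For example, Bob's Income(in $) becomes 400.0, taken from Travis in the row above, and Anish's Income(in $) becomes 30.0: Luna's value is also NaN, so Emma's 30.0 is the last valid value.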
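
In the example above every column has a valid value in its first row, so forward fill could fill every NaN. A leading NaN has no value above it and stays NaN; backward fill copies the next valid value upward instead. This sketch uses a hypothetical Series s and the dedicated ffill() and bfill() methods.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a leading NaN and a run of two NaN values
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# Forward fill: copy the last valid value downward (the leading NaN stays NaN)
forward = s.ffill()

# Backward fill: copy the next valid value upward
backward = s.bfill()

print(forward.tolist())   # [nan, 1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 1.0, 4.0, 4.0, 4.0]
```

Chaining the two, as in s.ffill().bfill(), is one way to make sure no leading NaN survives.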

Fill NaN Values in Specified Columns With Specified Values

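To fill the NaN values of specific columns with specific values, pass a dictionary to fillna() whose keys are column names and whose values are the fill values for those columns. Columns not listed in the dictionary are left unchanged.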

import numpy as np
import pandas as pd

# Sample data (note: Emma's income is 300 in this example)
student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)
# Keys are column names; values are the fill values for those columns
filled_df = student_df.fillna({"Age": 17, "Income(in $)": 300})

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma         300.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob         300.0  17.0
3    504.0      Emma         300.0  16.0
4    505.0      Luna         300.0  18.0
5    506.0     Anish         300.0  17.0 

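This fills all NaN values in the Age column with 17 and all NaN values in the Income(in $) column with 300. The NaN value in the Roll No column remains unchanged because Roll No is not a key in the dictionary.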
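
When only a single column needs filling, another option is to call fillna() on that column and assign the result back. This is a sketch under the same sample data; assigning back is used instead of a chained call such as student_df["Age"].fillna(17, inplace=True), which may not modify student_df in recent pandas versions because of Copy-on-Write.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, np.nan, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
        "Age": [17, 18, np.nan, 16, 18, np.nan],
    }
)

# Fill one column, then assign the filled Series back to that column
student_df["Age"] = student_df["Age"].fillna(17)
print(student_df)
```

Only Age changes here; the NaN values in Roll No and Income(in $) are untouched.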

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
