How to Create a DataFrame Column Based on a Given Condition in Pandas

Suraj Joshi, January 30, 2023

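List Comprehension to Create a New DataFrame Column Based on a Given Condition in Pandas
NumPy Methods to Create a New DataFrame Column Based on a Given Condition in Pandas
pandas.DataFrame.apply to Create a New DataFrame Column Based on a Given Condition in Pandas
pandas.Series.map() to Create a New DataFrame Column Based on a Given Condition in Pandas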

We can create a DataFrame column based on a given condition in Pandas using a list comprehension, NumPy methods, the apply() method, or the map() method.

List Comprehension to Create a New `DataFrame` Column Based on a Given Condition in Pandas

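A list comprehension builds a new list from an iterable, so we can use it to derive a new DataFrame column from the values of an existing one. It is concise and, for a simple condition, often faster than calling apply() with a Python function.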

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

The code creates a new column Status in df. Its value is Senior if Salary is greater than or equal to 400, and Junior otherwise.
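A list comprehension can also test more than one column at a time. The following is a minimal sketch, assuming a hypothetical Years column that is not part of the original data; zip() pairs the values of the two columns row by row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy"],
        "Salary": [200, 400, 300],
        "Years": [1, 5, 6],  # hypothetical column, added only for illustration
    }
)

# zip() yields one (salary, years) pair per row, in row order.
df["Status"] = [
    "Senior" if s >= 400 or y >= 5 else "Junior"
    for s, y in zip(df["Salary"], df["Years"])
]
print(df)
```

The resulting list must have exactly one element per row, because it is assigned to the column by position.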

NumPy Methods to Create a New DataFrame Column Based on a Given Condition in Pandas

We can also use NumPy methods to create a DataFrame column based on a given condition in Pandas: np.where() for a single condition and np.select() for several.

`np.where()` Method

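Called as np.where(condition, x, y), the function evaluates the condition element by element and returns an array that takes its value from x where the condition is true and from y elsewhere. (Called with the condition alone, it instead returns the indices of the elements that satisfy it.) We can use this method to create a DataFrame column when we have only one condition.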

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

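np.where(condition, x, y) returns x where the condition is satisfied and y otherwise.

The code above creates a new column Status in df whose value is Senior where Salary is at least 400 and Junior otherwise.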
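np.where() behaves differently depending on how many arguments it receives, which is easy to confuse. The sketch below uses a standalone Series holding the salary values from the example (the variable names are illustrative) to contrast the one-argument and three-argument forms.

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])
is_senior = (salary >= 400).to_numpy()

# One argument: returns a tuple of index arrays, one per dimension,
# holding the positions where the condition is True.
(positions,) = np.where(is_senior)
print(positions)

# Three arguments: returns an array of the same length as the condition,
# choosing "Senior" where it is True and "Junior" elsewhere.
labels = np.where(is_senior, "Senior", "Junior")
print(labels)
```

Only the three-argument form produces one value per row, so it is the one that can be assigned directly to a new column.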

`np.select()` Method

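np.select() takes a list of conditions and a list of choices as input and returns an array built from the elements of the choice list, depending on which condition holds. We can use this method to create a DataFrame column when we have two or more conditions.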

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

conditionlist = [
    (df["Salary"] >= 500),
    # Strictly between 300 and 500, so that a salary of 400 maps to "Mid".
    (df["Salary"] > 300) & (df["Salary"] < 500),
    (df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          Mid
2    Zeppy  2020-02-05     300          Low
3    Alina  2020-03-10     500         High
4    Jerry  2020-04-16     600         High
5    Kevin  2020-05-01     300          Low

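Here, if a row satisfies the first condition in the condition list, the Salary_Range column of that row is set to the first element of the choice list, and likewise for the remaining conditions. If a row satisfies none of the conditions, its Salary_Range value is set to the default argument of np.select(), which is Not Specified here.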
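When several conditions are true for the same row, np.select() uses the first one in the list, so the order of the conditions matters. The following sketch uses deliberately overlapping conditions on a standalone Series (illustrative names) to show that behavior.

```python
import numpy as np
import pandas as pd

salary = pd.Series([200, 400, 300, 500, 600, 300])

# The conditions overlap: 600 satisfies all three, 400 satisfies the last two.
# For each element, the first condition that is True decides the choice.
conditions = [salary >= 500, salary > 300, salary >= 0]
choices = ["High", "Mid", "Low"]

ranges = np.select(conditions, choices, default="Not Specified")
print(ranges)
```

Ordering the conditions from most to least specific in this way avoids having to write both bounds of every range.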

`pandas.DataFrame.apply` to Create a New DataFrame Column Based on a Given Condition in Pandas

pandas.DataFrame.apply applies a given function along the given axis of a DataFrame and returns the result as a Series or a DataFrame.

Syntax:

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)

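func is the function to apply.

axis is the axis along which the function is applied. We can use axis=1 or axis='columns' to apply the function to each row.

We can use this method to check a condition and set the value of each row of the new column.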

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)


def set_values(row, value):
    return value[row]


map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

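Here, we define a function set_values() and apply it with apply(). Because the example calls apply() on the single column df["Salary"], the function receives each salary value in turn and looks it up in map_dictionary, which determines the value of Salary_Range for that row. This approach gives us more flexibility when the new column has many possible values.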
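When the condition depends on more than one column, we can call apply() on the whole DataFrame with axis=1, so that the function receives one row at a time. This is a minimal sketch using the same data; the function name set_status and the cutoff date are assumptions made for illustration.

```python
import pandas as pd

list_of_dates = [
    "2019-11-20", "2020-01-02", "2020-02-05",
    "2020-03-10", "2020-04-16", "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

CUTOFF = pd.Timestamp("2020-03-01")  # hypothetical cutoff date


def set_status(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    if row["Salary"] >= 400 and row["Joined date"] < CUTOFF:
        return "Senior"
    return "Junior"


df["Status"] = df.apply(set_status, axis=1)
print(df)
```

Row-wise apply() runs a Python function once per row, so it tends to be slower than np.where() or np.select() on large DataFrames; it is most useful when the logic is hard to express as vectorized conditions.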

`pandas.Series.map()` to Create a New DataFrame Column Based on a Given Condition in Pandas

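We can also use pandas.Series.map() to create a new DataFrame column based on a given condition in Pandas. The method works element-wise on a Series and maps each value to another according to its input, which may be a dictionary, a function, or a Series.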

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].map(map_dictionary)

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

It creates a new column Salary_Range and sets the value of each row according to the key-value pairs in map_dictionary.
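A dictionary passed to map() only covers the keys it lists; any value without a matching key becomes NaN. The sketch below, which uses a shortened dictionary and a standalone Series chosen for illustration, shows that case, one way to fill the gaps, and the alternative of passing a function instead of a dictionary.

```python
import pandas as pd

salary = pd.Series([200, 400, 700])
map_dictionary = {200: "Low", 400: "MID"}  # 700 is deliberately missing

# Values with no matching key are mapped to NaN.
mapped = salary.map(map_dictionary)
print(mapped)

# fillna() replaces those NaN entries with a fallback label.
filled = mapped.fillna("Not Specified")
print(filled)

# A function is called once per value, so it can express a threshold
# instead of enumerating every possible salary.
by_function = salary.map(lambda s: "HIGH" if s >= 500 else "LOW")
print(by_function)
```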


Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
