How to Create DataFrame Column Based on Given Condition in Pandas

Suraj Joshi Feb 02, 2024
  1. List Comprehension to Create New DataFrame Columns Based on a Given Condition in Pandas
  2. NumPy Methods to Create New DataFrame Columns Based on a Given Condition in Pandas
  3. pandas.DataFrame.apply to Create New DataFrame Columns Based on a Given Condition in Pandas
  4. pandas.Series.map() to Create New DataFrame Columns Based on a Given Condition in Pandas
How to Create DataFrame Column Based on Given Condition in Pandas

We can create the DataFrame columns based on a given condition in Pandas using list comprehension, NumPy methods, apply() method, and map() method of the DataFrame object.

List Comprehension to Create New DataFrame Columns Based on a Given Condition in Pandas

We can utilize various list Comprehension to create new DataFrame columns based on a given condition in Pandas. List comprehension is a method to create new lists from iterables. It is quite faster and simpler than other methods.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

It creates a new column Status in df whose value is Senior if the salary is greater than or equal to 400, or Junior otherwise.

NumPy Methods to Create New DataFrame Columns Based on a Given Condition in Pandas

We also can use NumPy methods to create a DataFrame column based on given conditions in Pandas. We can utilize np.where() method and np.select() method for this purpose.

np.where() Method

np.where() takes the condition as an input and returns the indices of elements that satisfy the given condition. We can use this method to create a DataFrame column based on given conditions in Pandas when we have only one condition.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)

Output:

      Name Joined date  Salary  Status
0   Hisila  2019-11-20     200  Junior
1  Shristi  2020-01-02     400  Senior
2    Zeppy  2020-02-05     300  Junior
3    Alina  2020-03-10     500  Senior
4    Jerry  2020-04-16     600  Senior
5    Kevin  2020-05-01     300  Junior

np.where(condition, x, y) returns x if the condition is met, otherwise y.

The above code creates a new column Status in df whose value is Senior if the given condition is satisfied; otherwise, the value is set to Junior.

np.select() Method

np.where() takes condition-list and choice-list as an input and returns an array built from elements in choice-list, depending on conditions. We can use this method to create a DataFrame column based on given conditions in Pandas when we have two or more conditions.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

conditionlist = [
    (df["Salary"] >= 500),
    (df["Salary"] >= 300) & (df["Salary"] < 300),
    (df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")

print(df)

Output:

         Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400        black
2    Zeppy  2020-02-05     300          Low
3    Alina  2020-03-10     500         High
4    Jerry  2020-04-16     600         High
5    Kevin  2020-05-01     300          Low

Here, if the 1st condition in the conditionlist is satisfied for a row, the value of column Salary_Range for that specific row is set to the 1st element in the choicelist. Other conditions in the conditionlist are similar. If none of the conditions in the conditionlist is satisfied, the value of column Salary_Range for that row is set to the value of the default parameter in the np.where() method, for example, Not Specified.

pandas.DataFrame.apply to Create New DataFrame Columns Based on a Given Condition in Pandas

pandas.DataFrame.apply returns a DataFrame
as a result of applying the given function along the given axis of the DataFrame.

Syntax:

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)

func represents the function to be applied.

axis represents the axis along which the function is applied. We can use axis=1 or axis = 'columns' to apply function to each row.

We can use this method to check the condition and set values for every row of a new column.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)


def set_values(row, value):
    return value[row]


map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

Here, we define a function set_values() which is applied to every row using df.apply(). The function sets the value of each row of the Salary_Range column depending upon the value of the Salary column of that row. We make map_dictionary to assign what will be the value of the Salary_Range column for a row given its value in the Salary column. This method provides us much more flexibility when we have a large number of options for the new column.

pandas.Series.map() to Create New DataFrame Columns Based on a Given Condition in Pandas

We could also use pandas.Series.map() to create new DataFrame columns based on a given condition in Pandas. This method is applied elementwise for Series and maps values from one column to the other based on the input that could be a dictionary, function, or Series.

import pandas as pd
import numpy as np

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)

map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}

df["Salary_Range"] = df["Salary"].map(map_dictionary)

print(df)

Output:

      Name Joined date  Salary Salary_Range
0   Hisila  2019-11-20     200          Low
1  Shristi  2020-01-02     400          MID
2    Zeppy  2020-02-05     300          LOW
3    Alina  2020-03-10     500         HIGH
4    Jerry  2020-04-16     600         HIGH
5    Kevin  2020-05-01     300          LOW

It creates a new column Salary_Range and sets the values of each row of the column depending on the key-value pairs in map_dictionary.

Author: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

Related Article - Pandas DataFrame Column

Related Article - Pandas Condition