How to Create DataFrame Column Based on Given Condition in Pandas
-
List Comprehension to Create New
DataFrameColumns Based on a Given Condition in Pandas -
NumPy Methods to Create New
DataFrameColumns Based on a Given Condition in Pandas -
pandas.DataFrame.applyto Create NewDataFrameColumns Based on a Given Condition in Pandas -
pandas.Series.map()to Create NewDataFrameColumns Based on a Given Condition in Pandas
We can create the DataFrame columns based on a given condition in Pandas using list comprehension, NumPy methods, apply() method, and map() method of the DataFrame object.
List Comprehension to Create New DataFrame Columns Based on a Given Condition in Pandas
We can utilize various list Comprehension to create new DataFrame columns based on a given condition in Pandas. List comprehension is a method to create new lists from iterables. It is quite faster and simpler than other methods.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)
Output:
Name Joined date Salary Status
0 Hisila 2019-11-20 200 Junior
1 Shristi 2020-01-02 400 Senior
2 Zeppy 2020-02-05 300 Junior
3 Alina 2020-03-10 500 Senior
4 Jerry 2020-04-16 600 Senior
5 Kevin 2020-05-01 300 Junior
It creates a new column Status in df whose value is Senior if the salary is greater than or equal to 400, or Junior otherwise.
NumPy Methods to Create New DataFrame Columns Based on a Given Condition in Pandas
We also can use NumPy methods to create a DataFrame column based on given conditions in Pandas. We can utilize np.where() method and np.select() method for this purpose.
np.where() Method
np.where() takes the condition as an input and returns the indices of elements that satisfy the given condition. We can use this method to create a DataFrame column based on given conditions in Pandas when we have only one condition.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)
Output:
Name Joined date Salary Status
0 Hisila 2019-11-20 200 Junior
1 Shristi 2020-01-02 400 Senior
2 Zeppy 2020-02-05 300 Junior
3 Alina 2020-03-10 500 Senior
4 Jerry 2020-04-16 600 Senior
5 Kevin 2020-05-01 300 Junior
np.where(condition, x, y) returns x if the condition is met, otherwise y.
The above code creates a new column Status in df whose value is Senior if the given condition is satisfied; otherwise, the value is set to Junior.
np.select() Method
np.where() takes condition-list and choice-list as an input and returns an array built from elements in choice-list, depending on conditions. We can use this method to create a DataFrame column based on given conditions in Pandas when we have two or more conditions.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
conditionlist = [
(df["Salary"] >= 500),
(df["Salary"] >= 300) & (df["Salary"] < 300),
(df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 black
2 Zeppy 2020-02-05 300 Low
3 Alina 2020-03-10 500 High
4 Jerry 2020-04-16 600 High
5 Kevin 2020-05-01 300 Low
Here, if the 1st condition in the conditionlist is satisfied for a row, the value of column Salary_Range for that specific row is set to the 1st element in the choicelist. Other conditions in the conditionlist are similar. If none of the conditions in the conditionlist is satisfied, the value of column Salary_Range for that row is set to the value of the default parameter in the np.where() method, for example, Not Specified.
pandas.DataFrame.apply to Create New DataFrame Columns Based on a Given Condition in Pandas
pandas.DataFrame.apply returns a DataFrame
as a result of applying the given function along the given axis of the DataFrame.
Syntax:
DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
func represents the function to be applied.
axis represents the axis along which the function is applied. We can use axis=1 or axis = 'columns' to apply function to each row.
We can use this method to check the condition and set values for every row of a new column.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
def set_values(row, value):
return value[row]
map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}
df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 MID
2 Zeppy 2020-02-05 300 LOW
3 Alina 2020-03-10 500 HIGH
4 Jerry 2020-04-16 600 HIGH
5 Kevin 2020-05-01 300 LOW
Here, we define a function set_values() which is applied to every row using df.apply(). The function sets the value of each row of the Salary_Range column depending upon the value of the Salary column of that row. We make map_dictionary to assign what will be the value of the Salary_Range column for a row given its value in the Salary column. This method provides us much more flexibility when we have a large number of options for the new column.
pandas.Series.map() to Create New DataFrame Columns Based on a Given Condition in Pandas
We could also use pandas.Series.map() to create new DataFrame columns based on a given condition in Pandas. This method is applied elementwise for Series and maps values from one column to the other based on the input that could be a dictionary, function, or Series.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}
df["Salary_Range"] = df["Salary"].map(map_dictionary)
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 MID
2 Zeppy 2020-02-05 300 LOW
3 Alina 2020-03-10 500 HIGH
4 Jerry 2020-04-16 600 HIGH
5 Kevin 2020-05-01 300 LOW
It creates a new column Salary_Range and sets the values of each row of the column depending on the key-value pairs in map_dictionary.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedInRelated Article - Pandas DataFrame Column
- How to Get Pandas DataFrame Column Headers as a List
- How to Delete Pandas DataFrame Column
- How to Convert Pandas Column to Datetime
- How to Get the Sum of Pandas Column
- How to Change the Order of Pandas DataFrame Columns
- How to Convert DataFrame Column to String in Pandas
