Pandas DataFrame DataFrame.merge() Function
-
Syntax of
pandas.DataFrame.merge(): -
Example Codes:
DataFrame.merge()Function to Merge TwoDataFrames -
Example Codes: Set
howParameter in themergeMethod to Merge DataFrames Using Various Techniques -
Example Codes:Merge Only Specific Columns Using
DataFrame.merge()Function in Pandas -
Example Codes: Use Index as the Join Keys for Merging DataFrames
DataFrame.merge()Function
Python Pandas DataFrame.merge() function merges DataFrame or named Series objects.
Syntax of pandas.DataFrame.merge():
DataFrame.merge(
right,
how="inner",
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes="_x",
"_y",
copy=True,
indicator=False,
validate=None,
)
Parameters
right |
DataFrame or named Series. Object to merge with |
how |
left, right, inner or outer. How to perform merge operation |
on |
label or list. Column or index names to merge |
left_on |
label or list. Column or index names to merge in the left DataFrame |
right_on |
label or list. Column or index names to merge in the right DataFrame |
left_index |
Boolean. Use the index from the left DataFrame as the join key(left_index=True) |
right_index |
Boolean. Use the index from the right DataFrame as the join key(right_index=True) |
sort |
Boolean. Sort the join keys alphabetically in the output(sort=True) |
suffixes |
Suffix to be applied to overlapping column names in the left and right side, respectively |
copy |
Boolean. Avoid copy for copy=False |
indicator |
add a column to output DataFrame called _merge with information on the source of each row(indicator=True) and a column named string will be added to output DataFrame(indicator=string) |
validate |
parameter to check if the merge is of a specified type. |
Return
It returns a DataFrame merging the given objects.
Example Codes: DataFrame.merge() Function to Merge Two DataFrames
import pandas as pd
df1 = pd.DataFrame(
{"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1 5
1 Alish 3 7
It merges df1 and df2 to a single DataFrame using the inner-join technique of SQL.
For the inner-join method, we must ensure there is at least one column common to both DataFrames.
Here, the merge() function will join the rows having the same values of the common column to both DataFrames.
Example Codes: Set how Parameter in the merge Method to Merge DataFrames Using Various Techniques
import pandas as pd
df1 = pd.DataFrame(
{"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, how="right")
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1.0 5
1 Alish 3.0 7
2 Zack NaN 6
3 Raphel NaN 8
It merges df1 and df2 to a single DataFrame using the right-join technique of SQL.
Here, the merge() function returns all rows from the right DataFrame. However, the rows only present in the left DataFrame will get the NaN value.
Similarly, we can also use left and outer values of the how parameter.
Example Codes:Merge Only Specific Columns Using DataFrame.merge() Function in Pandas
import pandas as pd
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, on="Name")
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_x Pay Position_y
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
It merges only the Name column of df1 and df2. Since the default join method is inner-join only the common rows to both DataFrame will be joined. The Position column is common to both DataFrames and hence two-position columns viz. Position_x and Position_y.
By default, _x and _y suffixes are appended to the name of the overlapping column. We can specify suffixes using the suffixes parameter.
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, on="Name", suffixes=("_left", "_right"))
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_left Pay Position_right
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
Example Codes: Use Index as the Join Keys for Merging DataFrames DataFrame.merge() Function
import pandas as pd
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(
df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name_left Working Hours Position_left Name_right Pay Position_right
0 Suraj 1 Salesman Suraj 5 Salesman
1 Zeppy 2 CEO Zack 6 CEO
2 Alish 3 Manager Alish 7 Manager
3 Sarah 5 Sales Head Raphel 8 Sales Head
It merges the corresponding rows of two DataFrames without considering column similarities. If the same column name appears on both DataFrame, the suffixes are attached to column names and made different columns after merging.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn