Combine Two Data Frames in R

When manipulating data in R, we often need to combine two data frames into one. This tutorial covers a few methods to combine two data frames in R.
Suppose you have two data frames, x and y, with some matching columns. For example:
x <- data.frame(a=c(218, 415, 339), b=c(25, 19, 43), c=c(950, 872, 645))
y <- data.frame(a=c(309, 115), c=c(799, 814))
Now suppose you need to combine them into a single data frame, called z, for example.
Use rbind to Combine Two Data Frames in R
The rbind function combines data structures, such as data frames, vectors, or matrices, by rows. Its name stands for row-bind.
When using rbind to combine two data frames, both data frames need to have the same columns. Therefore, in the previous example, you need to add the b column to the data frame y. This can be done by executing this command:
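As a small illustration of row-binding on structures other than data frames, the sketch below stacks two vectors into a matrix and then appends one more row. The names v1, v2, m, and m2 are made up for this example and do not appear elsewhere in the tutorial.

```r
# Two numeric vectors of equal length become the rows of a matrix.
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
m <- rbind(v1, v2)
print(m)

# Binding a matrix and a vector appends the vector as a new row.
m2 <- rbind(m, c(7, 8, 9))
print(dim(m2))
```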
y$b <- NA
The y data frame now has a b column that contains only NA values.
Now you can use rbind to combine the x and y data frames into the new z data frame by executing this command:
x <- data.frame(a=c(218, 415, 339), b=c(25, 19, 43), c=c(950, 872, 645))
y <- data.frame(a=c(309, 115), c=c(799, 814))
y$b <- NA
z <- rbind(x, y)
print(z)
Output:
    a  b   c
1 218 25 950
2 415 19 872
3 339 43 645
4 309 NA 799
5 115 NA 814
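Note that after y$b <- NA, the columns of y are in the order a, c, b, while x has a, b, c. For data frames, rbind lines columns up by name rather than by position, so the differing order does no harm. A minimal sketch that checks this, reusing the x and y from above:

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))
y$b <- NA

# The new column is appended at the end, so y is ordered a, c, b.
print(names(y))

# rbind() matches data frame columns by name; the result follows x's order.
z <- rbind(x, y)
print(names(z))
print(z$c)
```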
Use the dplyr Package
If you don’t want to write an extra line of code or add placeholder columns to one of the data frames just to be able to use rbind, you can install the dplyr package, load it with library(dplyr), and then simply use:
z <- bind_rows(x, y)
It populates the z data frame with the combination of x and y, filling the b values that are missing in the rows from y with NA.
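A self-contained sketch of the dplyr approach follows. It assumes dplyr may not be installed, so the call is guarded with requireNamespace. The .id argument and the z_tagged name are additions for illustration and are not part of the original example; .id records which input each row came from.

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Columns missing from one input (b in y) are filled with NA.
  z <- dplyr::bind_rows(x, y)
  print(z)

  # Optional: a named list plus .id adds a column naming the source of each row.
  z_tagged <- dplyr::bind_rows(list(x = x, y = y), .id = "source")
  print(z_tagged)
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") first")
}
```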
Combine Big Data Frames in R
The previous examples work fine with small data frames that have a few rows and 2 or 3 columns. But when you need to combine big data sets with many rows and an arbitrary number of columns, adding each missing column by hand becomes impractical. In that case, it can help to write a function that adds the missing columns automatically, like the following:
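quickmerge <- function(df1, df2) {
  df1.names <- names(df1)
  df2.names <- names(df2)
  # Columns present in df1 but missing from df2, and vice versa
  df2.add <- setdiff(df1.names, df2.names)
  df1.add <- setdiff(df2.names, df1.names)
  # Add each missing column to df2, filled with NA
  if(length(df2.add) > 0) {
    for(i in 1:length(df2.add)) {
      df2[df2.add[i]] <- NA
    }
  }
  # Add each missing column to df1, filled with NA
  if(length(df1.add) > 0) {
    for(i in 1:length(df1.add)) {
      df1[df1.add[i]] <- NA
    }
  }
  # Both data frames now have the same columns, so rbind can stack them
  return(rbind(df1, df2))
}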
This function begins by comparing the column names in the data frames and then adding the necessary columns to make them equal. Finally, it uses the rbind function to combine the rows and return the result. To call the function, you use:
z <- quickmerge(x, y)
The complete example code is below.
quickmerge <- function(df1, df2) {
  df1.names <- names(df1)
  df2.names <- names(df2)
  df2.add <- setdiff(df1.names, df2.names)
  df1.add <- setdiff(df2.names, df1.names)
  if(length(df2.add) > 0) {
    for(i in 1:length(df2.add)) {
      df2[df2.add[i]] <- NA
    }
  }
  if(length(df1.add) > 0) {
    for(i in 1:length(df1.add)) {
      df1[df1.add[i]] <- NA
    }
  }
  return(rbind(df1, df2))
}
x <- data.frame(a=c(218, 415, 339), b=c(25, 19, 43), c=c(950, 872, 645))
y <- data.frame(a=c(309, 115), c=c(799, 814))
z <- quickmerge(x, y)
print(z)
Output:
    a  b   c
1 218 25 950
2 415 19 872
3 339 43 645
4 309 NA 799
5 115 NA 814
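The same idea extends to more than two data frames. The sketch below is one possible variation, not the function from this tutorial: bind_fill is a hypothetical helper that adds the missing columns in a single vectorized assignment, and Reduce applies it pairwise across a list. The third data frame, w, and its column d are made up for illustration.

```r
# Hypothetical helper: pads each data frame with the other's missing columns.
bind_fill <- function(df1, df2) {
  add2 <- setdiff(names(df1), names(df2))
  add1 <- setdiff(names(df2), names(df1))
  if (length(add2) > 0) df2[add2] <- NA   # columns df2 lacks
  if (length(add1) > 0) df1[add1] <- NA   # columns df1 lacks
  rbind(df1, df2)
}

x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))
w <- data.frame(b = 7, d = 1)  # illustrative data, not from the tutorial

# Reduce() folds the list left to right: bind_fill(bind_fill(x, y), w).
z <- Reduce(bind_fill, list(x, y, w))
print(z)
```

Each step copies the accumulated rows, so for a long list of large data frames this pairwise approach may be slow; it is meant to show the logic rather than to be the fastest option.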