Combine Two Data Frames in R

Gustavo du Mortier Mar 29, 2022 Nov 27, 2020
  1. Use rbind to Combine Two Data Frames in R
  2. Use the dplyr Package
  3. Combine Big Data Frames in R

When manipulating data in R, we often need to combine two data frames into one. This tutorial presents a few methods for combining two data frames by rows, including the case where their columns do not match exactly.

Suppose you have two data frames, x and y, with some matching columns. For example:

x <- data.frame(a=c(218, 415, 339), b=c(25, 19, 43), c=c(950, 872, 645))
y <- data.frame(a=c(309, 115), c=c(799, 814))

The goal is to combine them into a single data frame, called z, that holds the rows of both. Note that y has no b column, so the two data frames cannot be stacked directly.

Use rbind to Combine Two Data Frames in R

The rbind function combines data structures, such as data frames, vectors, or matrices, by rows. Its name stands for row-bind.

When using rbind to combine two data frames, both data frames need to have the same set of column names. The columns are matched by name, so their order may differ. Therefore, in the previous example, you need to add the b column to the data frame y. This can be done by executing this command:

y$b <- NA

After this assignment, y contains the new b column filled with NA:

    a   c  b
1 309 799 NA
2 115 814 NA

Now you can use rbind to combine the x and y data frames into the new z data frame. The complete code is:

x <- data.frame(a=c(218, 415, 339), b=c(25, 19, 43), c=c(950, 872, 645))
y <- data.frame(a=c(309, 115), c=c(799, 814))

y$b <- NA

z <- rbind(x, y)

print(z)

Output:

    a  b   c
1 218 25 950
2 415 19 872
3 339 43 645
4 309 NA 799
5 115 NA 814
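As a rough check of the two points made in this section, the following sketch first tries rbind on the unpadded data frames, then repeats it after adding the b column. It reuses the x and y from the example; the tryCatch wrapper is only there so that the script continues after the expected error.

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# Without the b column, rbind() signals an error; tryCatch() captures it
# so the script keeps running.
result <- tryCatch(rbind(x, y), error = function(e) e)
print(inherits(result, "error"))

# After padding, y's columns are in the order a, c, b, yet rbind() still
# aligns them with x's a, b, c because it matches columns by name.
y$b <- NA
z <- rbind(x, y)
print(names(z))
```

The column order of the result follows the first data frame passed to rbind.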

Use the dplyr Package

If you would rather not add placeholder columns to one of the data frames just to be able to use rbind, you can install the dplyr package with install.packages("dplyr"), load it, and use its bind_rows function:

library(dplyr)

z <- bind_rows(x, y)

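bind_rows matches the columns of x and y by name and fills any column that is missing from one of them with NA, so z contains all five rows without further preparation.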
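A self-contained version of the dplyr approach might look like the sketch below. It assumes dplyr is already installed. The second call is an optional extra that is not part of the original example: it names the inputs and uses the .id argument of bind_rows to record which data frame each row came from. The column name "source" is an arbitrary choice.

```r
library(dplyr)  # assumes the dplyr package is installed

x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# No padding needed: the b values for the rows from y become NA.
z <- bind_rows(x, y)
print(z)

# Optional: label each row with the name of the data frame it came from.
# "source" is a hypothetical column name chosen for this illustration.
z_tagged <- bind_rows(x = x, y = y, .id = "source")
print(z_tagged)
```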

Combine Big Data Frames in R

The previous examples work fine with small data frames that have a few rows and two or three columns. But when you need to combine big data sets with many rows and an arbitrary number of columns, adding each missing column by hand becomes impractical. In that case, it can be better to write a function that finds and adds the missing columns automatically, like the following:

quickmerge <- function(df1, df2) {
  df1.names <- names(df1)
  df2.names <- names(df2)
  df2.add <- setdiff(df1.names, df2.names)
  df1.add <- setdiff(df2.names, df1.names)
  if(length(df2.add) > 0) {
    for(i in 1:length(df2.add)) {
      df2[df2.add[i]] <- NA
    }
  }
  if(length(df1.add) > 0) {
    for(i in 1:length(df1.add)) {
      df1[df1.add[i]] <- NA
    }
  }
  return(rbind(df1, df2))
}

This function begins by comparing the column names in the data frames and then adding the necessary columns to make them equal. Finally, it uses the rbind function to combine the rows and return the result. To call the function, you use:

z <- quickmerge(x, y)

The complete example code is shown below.

quickmerge <- function(df1, df2) {
  df1.names <- names(df1)
  df2.names <- names(df2)
  df2.add <- setdiff(df1.names, df2.names)
  df1.add <- setdiff(df2.names, df1.names)
  if(length(df2.add) > 0) {
    for(i in 1:length(df2.add)) {
      df2[df2.add[i]] <- NA
    }
  }
  if(length(df1.add) > 0) {
    for(i in 1:length(df1.add)) {
      df1[df1.add[i]] <- NA
    }
  }
  return(rbind(df1, df2))
}

x <- data.frame(a=c(218, 415, 339), b=c(25, 19, 43), c=c(950, 872, 645))
y <- data.frame(a=c(309, 115), c=c(799, 814))

z <- quickmerge(x, y)

print(z)

Output:

    a  b   c
1 218 25 950
2 415 19 872
3 339 43 645
4 309 NA 799
5 115 NA 814
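The column comparison that quickmerge relies on can be inspected on its own. The sketch below runs the same two setdiff calls on x and y, and then on a third data frame, w, which is a hypothetical addition used here only to show a case where both sides are missing columns.

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# Columns that must be added to y (present in x, absent from y).
missing_in_y <- setdiff(names(x), names(y))
# Columns that must be added to x (present in y, absent from x).
missing_in_x <- setdiff(names(y), names(x))
print(missing_in_y)
print(missing_in_x)

# w is a hypothetical data frame that has a column x lacks and lacks two x has.
w <- data.frame(a = 7, d = 1)
missing_in_w <- setdiff(names(x), names(w))
missing_in_x_vs_w <- setdiff(names(w), names(x))
print(missing_in_w)
print(missing_in_x_vs_w)
```

Because setdiff returns an empty character vector when nothing is missing, the length checks inside quickmerge skip the padding loops in that case.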
