# Combine Two Data Frames in R

Gustavo du Mortier Mar 29, 2022 Nov 27, 2020

When manipulating data in R, we often need to combine two data frames into one. This tutorial shows a few ways to combine two data frames in R.

Suppose you have two data frames, `x` and `y`, with some matching columns. For example:

``````
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))
``````

You need to combine them into a single data frame, called `z` in this tutorial. Note that `y` has no `b` column, so the two data frames do not line up exactly.

## Use `rbind` to Combine Two Data Frames in R

The `rbind` function combines data structures, such as data frames, vectors, or matrices, by rows. Its name stands for row-bind.

When using `rbind` to combine two data frames, both must have the same column names. In the previous example, `y` lacks the `b` column, so you need to add it first. This can be done by executing this command:

``````
y$b <- NA
``````

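The `y` data frame now has a third column, `b`, filled with `NA` and placed after `a` and `c`. The column order does not matter, because `rbind` matches data frame columns by name.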
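As a quick check, the sketch below shows why the extra column is needed: `rbind` stops with an error when the column counts differ, and the new column is appended at the end of `y`. The variable names `attempt` and `failed` are illustrative only.

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# rbind() refuses data frames with different numbers of columns;
# try() captures the error so the script keeps running.
attempt <- try(rbind(x, y), silent = TRUE)
failed <- inherits(attempt, "try-error")
print(failed)    # TRUE

# Adding the missing column appends it after the existing ones.
y$b <- NA
print(names(y))  # "a" "c" "b"
```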

Now you can use `rbind` to combine the `x` and `y` data frames into the new `z` data frame by executing this command:

``````
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# Add the column that y is missing
y$b <- NA

z <- rbind(x, y)

print(z)
``````

Output:

``````
    a  b   c
1 218 25 950
2 415 19 872
3 339 43 645
4 309 NA 799
5 115 NA 814
``````
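The result keeps the column order of `x` even though `y` stores its columns as `a`, `c`, `b`. A minimal sketch of that name-based matching, using the same example data with `b` created directly in `data.frame()`:

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814), b = NA)  # columns ordered a, c, b

z <- rbind(x, y)

# Columns are matched by name, so z follows the order of the first data frame.
print(names(z))     # "a" "b" "c"

# The values from y land in the right columns despite the different order.
print(z$c)          # 950 872 645 799 814
print(is.na(z$b))   # FALSE FALSE FALSE TRUE TRUE
```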

## Use the `dplyr` Package

If you don't want to add placeholder columns to one of the data frames just to be able to use `rbind`, you can install the `dplyr` package with `install.packages("dplyr")`, load it, and then simply use `bind_rows`:

``````
library(dplyr)
z <- bind_rows(x, y)
``````

It populates the `z` data frame with the rows of `x` followed by the rows of `y`, filling any column that is missing from one of the inputs with `NA`.
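A self-contained version of this step might look like the sketch below. The `requireNamespace` guard is an addition for illustration, on the assumption that `dplyr` may not be installed on every machine; with the package loaded via `library(dplyr)`, a plain `bind_rows(x, y)` does the same thing.

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# Assumption: dplyr may be absent, so the call is guarded.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # y has no b column; bind_rows fills it with NA for the rows coming from y.
  z <- dplyr::bind_rows(x, y)
  print(z)
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") first.")
}
```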

## Combine Big Data Frames in R

The previous examples work fine with small data frames that have a few rows and two or three columns. When you need to combine larger data sets with many rows and an arbitrary number of columns, it is more practical to write a function that adds the missing columns automatically, like the following:

``````
quickmerge <- function(df1, df2) {
  df1.names <- names(df1)
  df2.names <- names(df2)
  # Columns present in one data frame but missing from the other
  df2.add <- setdiff(df1.names, df2.names)
  df1.add <- setdiff(df2.names, df1.names)
  # Add each missing column, filled with NA
  for (col in df2.add) {
    df2[col] <- NA
  }
  for (col in df1.add) {
    df1[col] <- NA
  }
  # Both data frames now share the same column names, so rbind can stack them
  return(rbind(df1, df2))
}
``````

This function begins by comparing the column names in the data frames and then adding the necessary columns to make them equal. Finally, it uses the `rbind` function to combine the rows and return the result. To call the function, you use:

``````
z <- quickmerge(x, y)
``````
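The column comparison at the start of the function can be checked on its own. The sketch below applies `setdiff` to the example data; the variable names `missing.in.y` and `missing.in.x` are illustrative and do not appear in the function.

```r
x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

# Columns of x that y lacks: these get added to y as NA.
missing.in.y <- setdiff(names(x), names(y))
print(missing.in.y)  # "b"

# Columns of y that x lacks: none in this example, so x is left unchanged.
missing.in.x <- setdiff(names(y), names(x))
print(missing.in.x)  # character(0)
```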

The complete example code is shown below.

``````
quickmerge <- function(df1, df2) {
  df1.names <- names(df1)
  df2.names <- names(df2)
  # Columns present in one data frame but missing from the other
  df2.add <- setdiff(df1.names, df2.names)
  df1.add <- setdiff(df2.names, df1.names)
  # Add each missing column, filled with NA
  for (col in df2.add) {
    df2[col] <- NA
  }
  for (col in df1.add) {
    df1[col] <- NA
  }
  # Both data frames now share the same column names, so rbind can stack them
  return(rbind(df1, df2))
}

x <- data.frame(a = c(218, 415, 339), b = c(25, 19, 43), c = c(950, 872, 645))
y <- data.frame(a = c(309, 115), c = c(799, 814))

z <- quickmerge(x, y)

print(z)
``````

Output:

``````
    a  b   c
1 218 25 950
2 415 19 872
3 339 43 645
4 309 NA 799
5 115 NA 814
``````
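In the example above only `y` is missing a column. The sketch below, using two hypothetical data frames `p` and `q` that are not part of the original example, suggests how the same function behaves when each input lacks a column the other has. The function body is a compact copy of `quickmerge` so that the snippet runs on its own.

```r
# Compact copy of quickmerge, repeated here so the snippet is self-contained.
quickmerge <- function(df1, df2) {
  for (col in setdiff(names(df1), names(df2))) df2[col] <- NA
  for (col in setdiff(names(df2), names(df1))) df1[col] <- NA
  rbind(df1, df2)
}

# Hypothetical inputs: p has no d column, q has no b column.
p <- data.frame(a = c(1, 2), b = c(10, 20))
q <- data.frame(a = 3, d = 30)

r <- quickmerge(p, q)

# The result has the union of the columns, with NA wherever an input had no value.
print(r)
```

Both inputs are padded before `rbind` runs, so the result has columns `a`, `b`, and `d`, in the order of the first (padded) data frame.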