Replace NA With Zero in R
There is a simple way to replace NA with zeroes in a data frame in R. Suppose you have a data frame called my_data. To replace all NA values with zeroes in that data frame, you can execute this statement.
my_data[is.na(my_data)] <- 0
For example, suppose my_data has the following content.
  C1 C2   C3 C4 C5
1  4  3 <NA>  3  7
2  9  8  ABC  5 10
3  1  1  XYZ  3  6
4 NA  4 <NA>  7 10
5  1  2  ZC1 NA  2
When you execute my_data[is.na(my_data)] <- 0, the data frame’s content changes to this.
  C1 C2  C3 C4 C5
1  4  3   0  3  7
2  9  8 ABC  5 10
3  1  1 XYZ  3  6
4  0  4   0  7 10
5  1  2 ZC1  0  2
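To try the base R statement end to end, the following sketch rebuilds the sample data frame from the values printed above and counts the missing values before and after the replacement. The names na_before and na_after are introduced here only for illustration.

```r
# Rebuild the sample data frame (values copied from the printed table).
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, 6, 10, 2),
  stringsAsFactors = FALSE  # keep C3 as character, also on R versions before 4.0
)

# Count the missing values per column before replacing them.
na_before <- colSums(is.na(my_data))

# is.na() on a data frame returns a logical matrix; assigning through it
# overwrites every TRUE position with 0.
my_data[is.na(my_data)] <- 0

# Count again to confirm that no NA is left.
na_after <- colSums(is.na(my_data))

print(na_before)
print(na_after)
print(my_data)
```

Because C3 is a character column, its NA entries become the string "0" rather than the number 0. If C3 were a factor, R would instead emit an invalid factor level warning and leave those entries as NA.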
Replace NA With Zero in Bigger R Data Frames
The previous solution uses base R subset assignment, which works fine when you have relatively small data frames. But for bigger data sets, you might need a faster alternative, like the new hybrid evaluation approach implemented in recent versions of the dplyr package. The new approach employed by the dplyr package recognizes entire expressions and uses C++ code to evaluate them. In this way, you can achieve up to 30% faster transforms when processing big data frames.
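How much faster the dplyr route is depends on the data and on the installed versions, so it is worth measuring on your own machine. The sketch below, which assumes dplyr is installed, times both approaches on a synthetic data frame. The names big_data, base_version, and dplyr_version are made up for this illustration, and the timings will vary between runs.

```r
library(dplyr)  # assumption: dplyr is already installed

set.seed(42)  # reproducible synthetic data
n <- 1e6
big_data <- data.frame(
  a = sample(c(1:10, NA), n, replace = TRUE),
  b = sample(c(1:10, NA), n, replace = TRUE)
)

# Base R: assign 0 through the logical matrix returned by is.na().
base_version <- big_data
base_time <- system.time(base_version[is.na(base_version)] <- 0)

# dplyr: apply replace() to every column with mutate_all().
dplyr_time <- system.time(
  dplyr_version <- mutate_all(big_data, ~ replace(., is.na(.), 0))
)

# Elapsed times in seconds; compare them on your own data.
print(base_time)
print(dplyr_time)
```

Both versions should produce the same data, so a quick all.equal(base_version, dplyr_version) is a reasonable sanity check before comparing the timings.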
To replace NA values with zeroes using the dplyr package, you can use the mutate function with the _all scoped verb (that is, mutate_all) and the replace function in the purrr formula notation, as in the example below.
my_data <- mutate_all(my_data, ~replace(., is.na(.), 0))
The use of the purrr notation allows us to apply the replace function to each column of the data frame, where the dot stands for the column being processed.
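In dplyr 1.0.0 and later, the scoped verbs (mutate_all, mutate_at, mutate_if) still work but are marked as superseded in favor of across(). A minimal sketch of the same replacement in that style, assuming dplyr 1.0.0 or newer is installed and using a reduced two-column data frame for brevity:

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which introduced across()

my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C4 = c(3, 5, 3, 7, NA)
)

# across(everything(), ...) applies the lambda to every column;
# .x is the current column, passed to replace() as a whole vector.
my_data <- mutate(my_data, across(everything(), ~ replace(.x, is.na(.x), 0)))

my_data
```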
Replace NA With Zero in a Subset of R Data Frame
Instead of the _all scoped verb in the mutate function, you can use the _at scoped verb to restrict the replacement action to specific columns. To do that, you can include a vector with the names of the columns where you want the replacement to be applied. Using the previous data frame, if you need to replace NA values only in columns C1 and C4, you can use the following command:
my_data <- mutate_at(my_data, c("C1", "C4"), ~replace(., is.na(.), 0))
In this way, only the NAs in columns C1 and C4 get replaced by 0, resulting in a data frame like the one below.
  C1 C2   C3 C4 C5
1  4  3 <NA>  3  7
2  9  8  ABC  5 10
3  1  1  XYZ  3  6
4  0  4 <NA>  7 10
5  1  2  ZC1  0  2
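For comparison, the same column-restricted replacement can be sketched in base R without dplyr. The variable cols is a name introduced here for illustration, and the data frame is rebuilt from the values printed earlier.

```r
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, 6, 10, 2),
  stringsAsFactors = FALSE
)

cols <- c("C1", "C4")  # hypothetical helper: the columns to clean

# Subset to the chosen columns, then overwrite only their NA positions.
# Columns outside cols (such as C3) are left untouched.
my_data[cols][is.na(my_data[cols])] <- 0

my_data
```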
In the previous example, you might have wanted to replace NA with zeroes only in numeric columns to avoid including zero values in alphanumeric columns such as C3. If that is the case, instead of specifying the columns where you want to apply the replacement, you can use the mutate_if function with the is.numeric condition to tell R to replace NA with zeroes only in numeric columns. In the following example, you can find the complete code to try this out, from installing the dplyr package and populating the data frame to performing the replacements and displaying the results.
install.packages("dplyr")
library(dplyr)
C1 <- c(4, 9, 1, NA, 1)
C2 <- c(3, 8, 1, 4, 2)
C3 <- c(NA, 'ABC', 'XYZ', NA, 'ZC1')
C4 <- c(3, 5, 3, 7, NA)
C5 <- c(7, 10, NA, 10, 2)
my_data <- data.frame(C1, C2, C3, C4, C5)
my_data <- mutate_if(my_data, is.numeric, ~replace(., is.na(.), 0))
my_data
  C1 C2   C3 C4 C5
1  4  3 <NA>  3  7
2  9  8  ABC  5 10
3  1  1  XYZ  3  0
4  0  4 <NA>  7 10
5  1  2  ZC1  0  2
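The mutate_if call above can also be written with across() and where(), the style that dplyr recommends since version 1.0.0. This is a sketch under the assumption that dplyr 1.0.0 or newer is installed, using three of the columns from the example to keep it short.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0

my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C5 = c(7, 10, NA, 10, 2),
  stringsAsFactors = FALSE
)

# where(is.numeric) selects only the numeric columns, so the NAs in the
# character column C3 are left as they are.
my_data <- mutate(my_data, across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

my_data
```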
You can find more info on the mutate() function and its variants in the R Documentation.