Vectorized if Function With Multiple Conditions in R

Jesse John Feb-11, 2022
  1. Limitation of the if Statement in R
  2. The Vectorized ifelse() Function in R
  3. The if_else() Function of the dplyr Package in R
  4. Use Multiple Conditions in the if_else() Function in R
  5. Conclusion

A common data analysis task is to create or update a data frame column using one or more conditions based on the other columns of the same row.

The if statement is not suited to this task because it tests a single condition, not one per row. In R versions before 4.2.0, only the first row is used to test the condition, and the entire column is updated based on that row; since R 4.2.0, a condition longer than one element is an error.

When working with a data frame, we need tools and techniques that work on multiple rows. This article covers the vectorized if functions and the vectorized AND and OR operators used to combine multiple conditions.

We will first create a small data frame for illustration.

# Create two vectors.
Col1 = rep(c("A", "B"), times = 2, each = 2)
Col2 = rep(c("x", "y"), times = 1, each = 4)

# Create a data frame.
cond_df = data.frame(Col1, Col2)

# View the data frame.
cond_df

Limitation of the if Statement in R

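According to the documentation, the condition of an if statement must be a length-one logical vector that is not NA. In R versions before 4.2.0, a longer condition is accepted with a warning, but only the first element is used; since R 4.2.0, a longer condition is an error.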

In the following example, we will create a column based on a condition that uses another column.

# Try to use the if statement.
# Before R 4.2.0 this runs with a warning; since R 4.2.0 it stops with an error.
cond_df$NewCol = if (Col1 == "B") "Col1 was B" else "Col1 was not B"

# View the result.
cond_df

Output (R versions before 4.2.0):

  Col1 Col2         NewCol
1    A    x Col1 was not B
2    A    x Col1 was not B
3    B    x Col1 was not B
4    B    x Col1 was not B
5    A    y Col1 was not B
6    A    y Col1 was not B
7    B    y Col1 was not B
8    B    y Col1 was not B

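In R versions before 4.2.0, R issued a warning when we executed this if statement but still created the column. The result is not what we wanted.

Only the first row was evaluated, and the result was applied to all the data frame rows. Since R 4.2.0, the same statement stops with an error and the column is not created.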
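The version-dependent behavior can be observed without stopping a script. The following is a minimal sketch using base R's tryCatch(); the variable name outcome is illustrative and not part of the original example.

```r
# Same vector as the first column of the data frame above.
Col1 = rep(c("A", "B"), times = 2, each = 2)

# Run the if statement on the whole vector and record how R reacts.
# Before R 4.2.0 this raises a warning; since R 4.2.0 it raises an error.
outcome = tryCatch(
  if (Col1 == "B") "Col1 was B" else "Col1 was not B",
  warning = function(w) "warning",
  error = function(e) "error"
)
outcome

# Testing a single element is what the if statement is designed for.
single = if (Col1[3] == "B") "Col1 was B" else "Col1 was not B"
single
```

The value of outcome is either "warning" or "error", depending on the installed R version. The last statement works on every version because its condition has length one.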

The Vectorized ifelse() Function in R

Base R includes a vectorized ifelse() function, which we can use to conditionally update a data frame column.

According to the documentation, this function “…returns a value with the same shape as test….”, and this makes it suitable for use on a data frame.

The syntax of the function is: ifelse(test, yes, no), where yes supplies the values used where test is TRUE and no supplies the values used where test is FALSE. The following code illustrates the use of this function.

# Create a new data frame using the same vectors.
vect_df = data.frame(Col1, Col2)

# Use the vectorized ifelse() function.
vect_df$NewCol = ifelse(Col1 == "B", "Col1 was B", "F")

# View the result.
vect_df

Output:

> vect_df
  Col1 Col2     NewCol
1    A    x          F
2    A    x          F
3    B    x Col1 was B
4    B    x Col1 was B
5    A    y          F
6    A    y          F
7    B    y Col1 was B
8    B    y Col1 was B

This function worked as expected. We can use it to create or update a data frame column using conditions based on values from other columns.
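The example above relies on the vectors Col1 and Col2 still existing outside the data frame. As a sketch of the common situation where only the data frame is available, the column can be referenced through the data frame itself. The name self_df is illustrative.

```r
# Build the data frame without leaving separate vectors in the workspace.
self_df = data.frame(
  Col1 = rep(c("A", "B"), times = 2, each = 2),
  Col2 = rep(c("x", "y"), times = 1, each = 4)
)

# Refer to the column through the data frame with the $ operator.
self_df$NewCol = ifelse(self_df$Col1 == "B", "Col1 was B", "F")

# View the result.
self_df
```

The new column holds the same values as in the output shown above.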

But this function has a limitation. The documentation states that ifelse() strips attributes. This is important when working with Dates and factors, which rely on attributes such as class and levels.

Let us see an example of the problem. The steps are:

  • Create a vector of dates.
  • Create a new vector using the ifelse() function on the first vector. The change caused by the ifelse() function is unexpected.

# Create and view a vector of dates.
datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
datevec
class(datevec)

# Create a new vector of dates using the ifelse() function on the previous vector. View it.
mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
mod_datevec # Not expected result.
class(mod_datevec) # Not date.

Output:

> datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
> datevec
[1] "2022-01-01" "2022-01-02" "2022-01-03" "2022-01-04" "2022-01-05"
> class(datevec)
[1] "Date"
>
> mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
> mod_datevec
[1] 18993 18994 19024 19024 19024
> class(mod_datevec)
[1] "numeric"

We find that the dates have changed to numbers. These are the day counts since 1970-01-01 that R uses internally to store dates. The ifelse() function does not work as expected on dates and factor variables.
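If base R must be used, one possible workaround is to convert the numbers back to dates. This is a sketch, assuming the result holds day counts since 1970-01-01, which is how R stores Date values. The name restored_datevec is illustrative.

```r
# Recreate the date vector and the ifelse() result from the example above.
datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))

# Convert the day counts back to the Date class.
restored_datevec = as.Date(mod_datevec, origin = "1970-01-01")
restored_datevec
class(restored_datevec)
```

This restores the Date class, but the extra conversion step is easy to forget.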

Let us now look at a solution offered by the dplyr package.

The if_else() Function of the dplyr Package in R

The if_else() function from the dplyr package addresses some of the issues associated with base R’s ifelse() function.

  • It checks that the values for the true and false cases (its true and false arguments) have the same or compatible types.
  • It preserves attributes such as the Date class in the result.

Let us see this function in an example.

# First load the dplyr package.
library(dplyr)

# Create another data frame from the two vectors.
dplyr_df = data.frame(Col1, Col2)

# Use the vectorized if_else() function.
dplyr_df$NewCol = if_else(Col1 == "B", "Col1 was B", "F")

# View the result.
dplyr_df

We can inspect the output and see that the function worked as expected, like base R’s ifelse().

How does it work on dates? Let us check.

# Create a new vector using if_else() based on the vector created earlier. View it.
dplyr_datevec = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
dplyr_datevec

Output:

> dplyr_datevec = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
> dplyr_datevec
[1] "2022-01-01" "2022-01-02" NA           NA           NA

We find that dplyr's if_else() function works correctly on dates: the result is still a Date vector.
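if_else() also accepts a missing argument, which supplies the value used where the condition is NA. The sketch below assumes dplyr is installed. The vector scores and the labels are made up for illustration.

```r
library(dplyr)

# A numeric vector with one missing value.
scores = c(72, 45, NA, 90)

# The condition is NA for the third element, so the missing value is used there.
result = if_else(scores >= 50, "pass", "fail", missing = "no score")
result
```

Without the missing argument, the third element of the result would be NA.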

Use Multiple Conditions in the if_else() Function in R

We can combine multiple conditions using the vectorized & and | operators, representing AND and OR.

These can be used in both ifelse() and if_else(). In our example, we will use if_else() because it preserves the type and class of its inputs.

# Create a data frame from the same two vectors.
mult_df = data.frame(Col1, Col2)

# Create a new column based on multiple conditions combined with AND, using &.
mult_df$AND_Col = if_else((Col1 == "A" & Col2 == "y"), "AND", "F")

# View the data frame with the added column.
mult_df

# Create another column based on multiple conditions combined with OR, using |.
mult_df$OR_Col = if_else((Col1 == "A" | Col2 == "y"), "OR", "F")

# View the data frame with the added column.
mult_df

The output of the last command:

> mult_df
  Col1 Col2 AND_Col OR_Col
1    A    x       F     OR
2    A    x       F     OR
3    B    x       F      F
4    B    x       F      F
5    A    y     AND     OR
6    A    y     AND     OR
7    B    y       F     OR
8    B    y       F     OR
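The same columns can also be created inside dplyr's mutate(), where bare column names refer to the data frame's own columns. This is a sketch of that alternative, assuming dplyr is installed, and not the method used above. The name mut_df is illustrative.

```r
library(dplyr)

# Build the data frame directly, without separate vectors.
mut_df = data.frame(
  Col1 = rep(c("A", "B"), times = 2, each = 2),
  Col2 = rep(c("x", "y"), times = 1, each = 4)
)

# Add both columns in one step; Col1 and Col2 are looked up in mut_df.
mut_df = mutate(
  mut_df,
  AND_Col = if_else(Col1 == "A" & Col2 == "y", "AND", "F"),
  OR_Col = if_else(Col1 == "A" | Col2 == "y", "OR", "F")
)
mut_df
```

This form does not depend on vectors that exist outside the data frame.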

Remember that R has vectorized and non-vectorized versions of the AND and OR operators. We used the vectorized & and | operators to combine two conditions because we wanted to test the conditions for each row.

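The & and | operators are vectorized and return one result per element; && and || are not vectorized and work with single TRUE or FALSE values, as in the condition of an if statement.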
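The difference can be seen on two small logical vectors. This is a minimal sketch; the vectors a and b are made up for illustration.

```r
a = c(TRUE, FALSE, TRUE)
b = c(TRUE, TRUE, FALSE)

# Vectorized operators: one result per element.
and_result = a & b
or_result = a | b
and_result
or_result

# Non-vectorized operators: intended for single values.
single_and = a[1] && b[1]
single_or = a[2] || b[3]
single_and
single_or
```

Passing vectors longer than one element to && or || behaves differently across R versions, and it is an error since R 4.3.0, so these operators are best kept for single values.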

References and Help:

In RStudio, for more information about the if statement, the ifelse() function or the if_else() function, click Help > Search R Help and type the statement or function name in the search box without parentheses.

Alternatively, type a question mark followed by the function name at the command prompt in the R Console. For the if statement, put the name in quotes, as in ?"if".

Conclusion

Statements, functions and operators that work with a single variable may not work with data frames. We need to use the appropriate tools for the task.

To create/update a column of a data frame conditionally, we used the vectorized ifelse() function and its better dplyr version, if_else().

We combined multiple conditions using the vectorized AND and OR operators, & and |.

Author: Jesse John

Jesse is passionate about data analysis and visualization. He uses the R statistical programming language for all aspects of his work.