# Vectorized if Function With Multiple Conditions in R

Jesse John, Feb-11, 2022

A common data analysis task is to create or update a data frame column using one or more conditions based on other columns of the same row.

If we try to do this using an `if` statement, only the first row is used to test the condition, and the entire column is updated based on that row.

When working with a data frame, we need tools and techniques that work on multiple rows. This article covers vectorized `if` functions and the vectorized `AND` and `OR` operators used to combine multiple conditions.

We will first create a small data frame for illustration.

``````
# Create two vectors.
Col1 = rep(c("A", "B"), times = 2, each = 2)
Col2 = rep(c("x", "y"), times = 1, each = 4)

# Create a data frame.
cond_df = data.frame(Col1, Col2)

# View the data frame.
cond_df
``````

## Limitation of the `if` Statement in R

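According to the documentation, the condition of an `if` statement must be a length-one logical vector that is not `NA`. In R versions before 4.2.0, a longer condition produced a warning and only the first element was used; from R 4.2.0 onward, it is an error.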

In the following example, we will create a column based on a condition that uses another column.

``````
# Try to use the if statement (a warning in R < 4.2.0, an error in R >= 4.2.0).
cond_df$NewCol = if(Col1 == "B"){cond_df$NewCol = "Col1 was B"} else{cond_df$NewCol = "Col1 was not B"}

# View the result.
cond_df
``````

Output:

``````
  Col1 Col2         NewCol
1    A    x Col1 was not B
2    A    x Col1 was not B
3    B    x Col1 was not B
4    B    x Col1 was not B
5    A    y Col1 was not B
6    A    y Col1 was not B
7    B    y Col1 was not B
8    B    y Col1 was not B
``````

R issued a warning when we executed this `if` statement, but it still created the column. The result is not what we wanted.

Only the first row was evaluated, and the result was applied to all the data frame rows.
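To see why the `if` statement fails here, it may help to inspect the condition itself. The sketch below recreates `Col1` so that it runs on its own. It shows that the comparison yields one logical value per row, whereas `if` expects a single value. Reducing the vector with `any()` or `all()` is one way to get a single value when a test on the whole column is what we actually want.

```r
# Recreate the vector used for the data frame.
Col1 = rep(c("A", "B"), times = 2, each = 2)

# The comparison is vectorized: it returns one logical value per element.
cond = Col1 == "B"
cond
length(cond)  # one value per row, but if() expects exactly one value

# any() and all() collapse the vector to a single TRUE or FALSE,
# which is a valid condition for if().
if (any(cond)) {
  print("At least one value is B")
}
if (!all(cond)) {
  print("Not every value is B")
}
```

This only answers a question about the column as a whole; it does not produce a per-row result.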

## The Vectorized `ifelse()` Function in R

Base R includes a vectorized `ifelse()` function, which we can use to conditionally update a data frame column.

According to the documentation, this function “…returns a value with the same shape as test….”, and this makes it suitable for use on a data frame.

The syntax of the function is `ifelse(test, value_if_true, value_if_false)`; in the R documentation, the last two arguments are named `yes` and `no`. The following code illustrates the use of this function.

``````
# Create a new data frame using the same vectors.
vect_df = data.frame(Col1, Col2)

# Use the vectorized ifelse() function.
vect_df$NewCol = ifelse(Col1 == "B", "Col1 was B", "F")

# view the result.
vect_df
``````

Output:

``````
> vect_df
  Col1 Col2     NewCol
1    A    x          F
2    A    x          F
3    B    x Col1 was B
4    B    x Col1 was B
5    A    y          F
6    A    y          F
7    B    y Col1 was B
8    B    y Col1 was B
``````

This function worked as expected. We can use it to create or update a data frame column using conditions based on values from other columns.
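The example above refers to the standalone vector `Col1`, which works because that vector still exists in the workspace. When only the data frame is available, the columns can be referenced through it instead. A minimal sketch, using a hypothetical data frame named `only_df` and hypothetical column names `NewCol` and `NewCol2`:

```r
# Build the data frame without leaving standalone vectors behind.
only_df = data.frame(
  Col1 = rep(c("A", "B"), times = 2, each = 2),
  Col2 = rep(c("x", "y"), times = 1, each = 4)
)

# Reference the column through the data frame with $.
only_df$NewCol = ifelse(only_df$Col1 == "B", "Col1 was B", "F")

# with() evaluates the expression using the data frame's own columns.
only_df$NewCol2 = with(only_df, ifelse(Col2 == "y", "Col2 was y", "F"))

only_df
```

Referencing columns through the data frame avoids surprises when a vector of the same name in the workspace has different contents or a different length.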

But this function has a limitation. The documentation states `ifelse()` strips attributes. This is important when working with Dates and factors.

Let us see an example of the problem. The steps are:

• Create a vector of dates.
• Create a new vector using the `ifelse()` function on the first vector. The change caused by the `ifelse()` function is unexpected.

``````
# Create and view a vector of dates.
datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
datevec
class(datevec)

# Create a new vector of dates using the ifelse() function on the previous vector. View it.
mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
mod_datevec # Not expected result.
class(mod_datevec) # Not date.
``````

Output:

``````
> datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
> datevec
[1] "2022-01-01" "2022-01-02" "2022-01-03" "2022-01-04" "2022-01-05"
> class(datevec)
[1] "Date"
>
> mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
> mod_datevec
[1] 18993 18994 19024 19024 19024
> class(mod_datevec)
[1] "numeric"
``````

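We find that the dates have changed to numbers: with the `Date` class attribute stripped, only the underlying day counts (days since 1970-01-01) remain. The `ifelse()` function does not work as expected on dates and factor variables.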
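The same loss of attributes affects factors, and in base R the class can be restored by hand. The following sketch uses a made-up factor named `sizes` for illustration. It shows `ifelse()` returning the factor's internal integer codes, followed by two possible base R workarounds: converting the factor to character before the test, and converting the numeric result back to a date with `as.Date()`.

```r
# A small factor, invented for this illustration.
sizes = factor(c("low", "high", "low"))

# ifelse() drops the factor attributes and returns the internal codes.
codes = ifelse(sizes == "low", sizes, sizes)
codes
is.factor(codes)  # FALSE

# Workaround for factors: convert to character first.
size_labels = ifelse(sizes == "low", as.character(sizes), "other")
size_labels

# Workaround for dates: convert the numeric result back to Date.
datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
num_dates = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
fixed_dates = as.Date(num_dates, origin = "1970-01-01")
fixed_dates
class(fixed_dates)
```

These workarounds rely on remembering to restore the class each time, which is easy to forget.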

Let us now look at a solution offered by the `dplyr` package.

## The `if_else()` Function of the `dplyr` Package in R

The `if_else()` function from the `dplyr` package addresses some of the issues associated with base R’s `ifelse()` function.

• It ensures `value_if_true` and `value_if_false` are of the same type.
• It takes all other attributes from `value_if_true`.

Let us look at an example that uses this function.

``````
# First load the dplyr package.
library(dplyr)

# Create another data frame from the two vectors.
dplyr_df = data.frame(Col1, Col2)

# Use the vectorized if_else() function.
dplyr_df$NewCol = if_else(Col1 == "B", "Col1 was B", "F")

# view the result.
dplyr_df
``````

We can inspect the output and see that the function worked as expected, like base R’s `ifelse()`.

How does it work on dates? Let us check.

``````
# Create a new vector using if_else() based on the vector created earlier. View it.
dplyr_datevec = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
dplyr_datevec
``````

Output:

``````
> dplyr_datevec = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
> dplyr_datevec
[1] "2022-01-01" "2022-01-02" NA           NA           NA
``````

We find that `dplyr`'s `if_else()` function works correctly on dates.
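Two further behaviors of `if_else()` may be worth seeing in action: its `missing` argument, which supplies a value where the condition is `NA`, and its type check on the two result values. The sketch below assumes `dplyr` is installed; the vector `flags` and the other variable names are made up for illustration. The error is caught with `tryCatch()` so that the script keeps running.

```r
library(dplyr)

# A condition vector that contains a missing value.
flags = c(TRUE, NA, FALSE)

# The missing argument supplies the result where the condition is NA.
with_missing = if_else(flags, "yes", "no", missing = "unknown")
with_missing

# Incompatible types for the two result values raise an error in if_else().
mixed = tryCatch(
  if_else(flags, 1L, "no"),
  error = function(e) "error: incompatible types"
)
mixed

# Base R's ifelse() silently coerces the result to character instead.
coerced = ifelse(flags, 1L, "no")
coerced
```

The stricter check can surface mistakes that `ifelse()` would hide behind silent coercion.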

## Use Multiple Conditions in the `if_else()` Function in R

We can combine multiple conditions using the vectorized `&` and `|` operators, representing `AND` and `OR`.

These can be used in both `ifelse()` and `if_else()`. In our example, we will use `if_else()` because it checks types and preserves attributes.

``````
# Create a data frame from the same two vectors.
mult_df = data.frame(Col1, Col2)

# Create a new column based on multiple conditions combined with AND, using &.
mult_df$AND_Col = if_else((Col1 == "A" & Col2 == "y"), "AND", "F")

# View the data frame with the added column.
mult_df

# Create another column based on multiple conditions combined with OR, using |.
mult_df$OR_Col = if_else((Col1 == "A" | Col2 == "y"), "OR", "F")

# View the data frame with the added column.
mult_df
``````

The output of the last command:

``````
> mult_df
  Col1 Col2 AND_Col OR_Col
1    A    x       F     OR
2    A    x       F     OR
3    B    x       F      F
4    B    x       F      F
5    A    y     AND     OR
6    A    y     AND     OR
7    B    y       F     OR
8    B    y       F     OR
``````
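Because `&` and `|` are base R operators, the same conditions also work with base R's `ifelse()`, with no extra package needed. The sketch below repeats the two columns on a hypothetical data frame named `base_df` and adds a third, made-up column, `Mixed_Col`, that combines both operators, using parentheses to make the grouping explicit.

```r
# Recreate the two vectors and a data frame.
Col1 = rep(c("A", "B"), times = 2, each = 2)
Col2 = rep(c("x", "y"), times = 1, each = 4)
base_df = data.frame(Col1, Col2)

# AND: both conditions must hold in the same row.
base_df$AND_Col = ifelse(base_df$Col1 == "A" & base_df$Col2 == "y", "AND", "F")

# OR: at least one condition must hold in the row.
base_df$OR_Col = ifelse(base_df$Col1 == "A" | base_df$Col2 == "y", "OR", "F")

# Both operators together; the parentheses group the two AND tests.
base_df$Mixed_Col = ifelse(
  (base_df$Col1 == "A" & base_df$Col2 == "x") |
    (base_df$Col1 == "B" & base_df$Col2 == "y"),
  "match", "F"
)

base_df
```

Since the results here are plain character strings, the attribute-stripping behavior of `ifelse()` does not cause a problem.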

Remember that R has vectorized and non-vectorized versions of the `AND` and `OR` operators. We used the vectorized `&` and `|` operators to combine two conditions because we wanted to test the conditions for each row.

The `&` and `|` operators are vectorized and compare element by element; `&&` and `||` are not vectorized and work on single logical values, which makes them suited to `if` statements.
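The difference between the two families of operators can be sketched with a few made-up vectors. The names `a`, `b`, `x`, and `empty` below are only for illustration.

```r
a = c(TRUE, TRUE, FALSE, FALSE)
b = c(TRUE, FALSE, TRUE, FALSE)

# Vectorized operators return one result per element.
a & b
a | b

# && works on single values, so it fits an if() condition.
x = c(5, 10)
if (length(x) > 0 && x[1] > 3) {
  print("x is non-empty and its first element exceeds 3")
}

# && stops as soon as the left side is FALSE, so the right side
# is never evaluated for the empty vector below.
empty = numeric(0)
safe = length(empty) > 0 && empty[1] > 3
safe
```

For per-row tests inside `ifelse()` or `if_else()`, the single-character forms are the ones to use.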

References and Help:

In RStudio, for more information about the `if` statement, the `ifelse()` function, or the `if_else()` function, click `Help > Search R Help` and type the statement or function name in the search box without parentheses.

Alternatively, type a question mark followed by the statement or function name at the command prompt in the R Console.

## Conclusion

Statements, functions and operators that work with a single variable may not work with data frames. We need to use the appropriate tools for the task.

To create or update a column of a data frame conditionally, we used the vectorized `ifelse()` function and its stricter `dplyr` counterpart, `if_else()`.

We used the vectorized `AND` and `OR` operators, `&` and `|`, to combine multiple conditions.
