How to Filter a Data Frame With Multiple Conditions in R

Jesse John Feb 02, 2024
  1. Filter a Data Frame With Multiple Conditions in R
  2. Use of Boolean Operators
  3. Order of Precedence in Evaluation of Expressions
  4. Specify Desired Combinations Using Parentheses
  5. Use the %in% Operator
  6. Reference
How to Filter a Data Frame With Multiple Conditions in R

Filtering the rows of a data frame is a common step in data analysis. In many cases, we need to restrict the rows of the result to those that meet more than one condition.

When such multiple conditions are involved, we need to combine our knowledge of several pieces of information to get the desired results. These pertain to:

  1. Use of Boolean operators.
  2. Order of precedence in the evaluation of expressions.
  3. Use of parentheses to specify the desired order of evaluation.

Filter a Data Frame With Multiple Conditions in R

To begin, we will create a sample data frame for this article. We will also load the dplyr package to use its filter() function for our demonstrations.

Example Code:

# Create a data frame for the article.
SN = 1:7
Alph = c("M", "M", "N", "N", "K", "K", "M")
Nmbr = c(11, 22, 11, 22, 11, 22, 0)
tb = data.frame(SN, Alph, Nmbr)
tb

# Load the dplyr package.
library(dplyr)

Output:

> tb
  SN Alph Nmbr
1  1    M   11
2  2    M   22
3  3    N   11
4  4    N   22
5  5    K   11
6  6    K   22
7  7    M    0

Use of Boolean Operators

We will use Boolean operators to combine multiple conditions.

  1. The AND operator, &, returns TRUE only when both the conditions on either side evaluate to TRUE.
  2. The OR operator, |, returns TRUE if any of the conditions on either side or even both evaluate to TRUE.
  3. The NOT operator, !, converts FALSE conditions to TRUE and vice-versa.

We will use relational operators to specify conditions. To compare equality, the operator is ==.

Example Code:

# Filter rows where Alph is "M" OR "N".
tb %>% filter(Alph=="M" | Alph == "N")

# Filter rows where Alph is "M" AND Nmbr == 22.
tb %>% filter(Alph=="M" & Nmbr == 22)

# Filter rows where Alph is NOT "M".
tb %>% filter(!Alph=="M")

Output:

> # Filter rows where Alph is "M" OR "N".
> tb %>% filter(Alph=="M" | Alph == "N")
  SN Alph Nmbr
1  1    M   11
2  2    M   22
3  3    N   11
4  4    N   22
5  7    M    0

> # Filter rows where Alph is "M" AND Nmbr == 22.
> tb %>% filter(Alph=="M" & Nmbr == 22)
  SN Alph Nmbr
1  2    M   22

> # Filter rows where Alph is NOT "M".
> tb %>% filter(!Alph=="M")
  SN Alph Nmbr
1  3    N   11
2  4    N   22
3  5    K   11
4  6    K   22
  1. In the first example, we filtered (selected) rows for which Alph is M or N. Note that we have to specify the full condition on both sides of the OR operator.
  2. In the second example, we filtered rows for which Alph is M and Nmbr is 22 simultaneously.
  3. In the third example, we filtered rows for which the column Alph is not M.

Order of Precedence in Evaluation of Expressions

In the context of combining multiple conditions, we need to note the following order of precedence of operators.

  1. NOT operator, !.
  2. AND operator, &.
  3. OR operator, |.
  4. Operands are grouped from left to right, subject to the above unless overridden by parentheses.

Example Code:

# The ! operator only applies to the Alph column.
tb %>% filter(!Alph=="M" & Nmbr==11)

# The & operator only combines its two neighbors.
tb %>% filter(Alph=="M" & Nmbr==11 | Nmbr==22)

# The | operator gets applied AFTER &.
tb %>% filter(Nmbr==11 | Nmbr==22 & Alph=="M")

Output:

> # The ! operator only applies to the Alph column.
> tb %>% filter(!Alph=="M" & Nmbr==11)
  SN Alph Nmbr
1  3    N   11
2  5    K   11

> # The & operator only combines its two neighbors.
> tb %>% filter(Alph=="M" & Nmbr==11 | Nmbr==22)
  SN Alph Nmbr
1  1    M   11
2  2    M   22
3  4    N   22
4  6    K   22

> # The | operator gets applied AFTER &.
> tb %>% filter(Nmbr==11 | Nmbr==22 & Alph=="M")
  SN Alph Nmbr
1  1    M   11
2  2    M   22
3  3    N   11
4  5    K   11

Because of the order of precedence, we saw that:

  1. The ! operator got applied to Alph before the & operator combined the conditions in the first example.
  2. The & operator combined the two conditions on either side before passing the result to the left side of the | operator.
  3. In the third example, the & got applied first. On the right side of |, we have the result of first applying & to its two operands.

Specify Desired Combinations Using Parentheses

We can use parentheses to group multiple conditions to get the desired result. Using parentheses, we can control the order of operations (precedence).

We will use the same examples in the previous section but modify them using parentheses to get different results.

Example Code:

# The ! operator now applies to the result of the & operation.
tb %>% filter(!(Alph=="M" & Nmbr==11))

# The & operator now has the result of the | operator on its right.
tb %>% filter(Alph=="M" & (Nmbr==11 | Nmbr==22))

# Now, the | operator gets applied first, and the result is passed to &.
tb %>% filter((Nmbr==11 | Nmbr==22) & Alph=="M")

Output:

> # The ! operator now applies to the result of the & operation.
> tb %>% filter(!(Alph=="M" & Nmbr==11))
  SN Alph Nmbr
1  2    M   22
2  3    N   11
3  4    N   22
4  5    K   11
5  6    K   22
6  7    M    0

> # The & operator now has the result of the | operator on its right.
> tb %>% filter(Alph=="M" & (Nmbr==11 | Nmbr==22))
  SN Alph Nmbr
1  1    M   11
2  2    M   22

> # Now, the | operator gets applied first, and the result is passed to &.
> tb %>% filter((Nmbr==11 | Nmbr==22) & Alph=="M")
  SN Alph Nmbr
1  1    M   11
2  2    M   22

Because of the parentheses, we now get different results.

  1. In the first example, the ! operator is now applied after the & operator combined the conditions.

  2. In the second example, the | operation is carried out first. The result is then passed as the operand on the right of &.

  3. In the third example, once again, the | is carried out first. Notice that we get the same result as in the previous example.

    Compare this to the different results for these in the previous section.

Use the %in% Operator

We can use R’s %in% operator to filter rows for which the column value is any of the values mentioned in the vector passed to it. This is the equivalent of combining multiple conditions on the same column using the OR operator.

Example Code:

# Filter rows where Alph is M or K.
tb %>% filter(Alph %in% c("M", "K"))

Output:

> # Filter rows where Alph is M or K.
> tb %>% filter(Alph %in% c("M", "K"))
  SN Alph Nmbr
1  1    M   11
2  2    M   22
3  5    K   11
4  6    K   22
5  7    M    0

Reference

See section 10.4.2 Infix and prefix operators of the R Language Definition for more information about the order of precedence in the evaluation of expressions.

Author: Jesse John
Jesse John avatar Jesse John avatar

Jesse is passionate about data analysis and visualization. He uses the R statistical programming language for all aspects of his work.