How to Split String by Delimiter in R

Jinku Hu Feb 02, 2024
  1. Use strsplit to Split String by Delimiter in R
  2. Use str_split to Split String by Delimiter in R

This article discusses how to split a string by a delimiter in R.

Use strsplit to Split String by Delimiter in R

strsplit is part of base R, so it is available in every R installation without additional packages. It splits each element of a character vector into substrings at the given delimiter, which is itself passed as a character vector and is interpreted as a regular expression unless fixed = TRUE is set. The first argument of the function is the character vector to be split, and the second is the delimiter. In this case, we specify the space character to separate each word in the given sentence. Note that the output is a list of character vectors, with one list element per input string.

# strsplit is part of base R, so no library() calls are needed
str <- "Lorem Ipsum is simply dummied text of the printing and typesetting industry."

strsplit(str, " ")

Output:

> strsplit(str, " ")
[[1]]
 [1] "Lorem"       "Ipsum"       "is"          "simply"      "dummied"       "text"       
 [7] "of"          "the"         "printing"    "and"         "typesetting" "industry."  
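
Because strsplit returns a list and treats the delimiter as a regular expression by default, two follow-up steps come up often: extracting the plain character vector, and splitting on a literal character such as a dot. The following is a minimal sketch; the inputs and the variable names (sentence, version, words, parts, spaced) are illustrative and not part of the example above.

```r
# Illustrative inputs (assumed for this sketch)
sentence <- "split this sentence"
version <- "1.2.3"

# strsplit returns a list; [[1]] extracts the character vector for the first input
words <- strsplit(sentence, " ")[[1]]
print(words)

# "." is a regex metacharacter, so fixed = TRUE is needed to split on a literal dot
parts <- strsplit(version, ".", fixed = TRUE)[[1]]
print(parts)

# A regular expression can treat a run of spaces as a single delimiter
spaced <- strsplit("a  b   c", " +")[[1]]
print(spaced)
```

Indexing with [[1]] is appropriate when a single string was split; for several input strings, the list keeps one vector per input.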

Use str_split to Split String by Delimiter in R

Alternatively, the str_split function can be used to split a string by a delimiter. str_split is part of the stringr package. It works much like strsplit, and both functions interpret the pattern as a regular expression by default. In the following example, the pattern is a single space, which contains no special regular-expression characters and therefore matches literally. Note that the function optionally takes a third argument, n, which sets the maximum number of substrings to return.

library(stringr)

str <- "Lorem Ipsum is simply dummied text of the printing and typesetting industry."

str_split(str, " ")

Output:

> str_split(str, " ")
[[1]]
 [1] "Lorem"       "Ipsum"       "is"          "simply"      "dummied"       "text"       
 [7] "of"          "the"         "printing"    "and"         "typesetting" "industry."  
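
The optional n argument and the pattern handling can be sketched as follows. This is a small illustration that assumes the stringr package is installed; the input strings and the variable names (fruit_list, limited, csv_like, dotted) are made up for the example.

```r
library(stringr)

# Illustrative input (assumed for this sketch)
fruit_list <- "apples and oranges and pears"

# n = 2 caps the result at two pieces; the unsplit remainder stays in the last one
limited <- str_split(fruit_list, " and ", n = 2)
print(limited)

# The pattern is a regular expression by default, so character classes work
csv_like <- str_split("a, b;c", "[,;] ?")[[1]]
print(csv_like)

# fixed() makes str_split treat the pattern as a literal string
dotted <- str_split("a.b.c", fixed("."))[[1]]
print(dotted)
```

Wrapping the pattern in fixed() plays the same role for str_split that fixed = TRUE plays for strsplit.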

Another optional parameter of str_split is simplify, the fourth argument. It is FALSE by default, in which case the function returns the substrings as a list of character vectors. If we set it to TRUE, str_split returns a character matrix instead, with one row per input string; rows with fewer pieces are padded with empty strings.

library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and raspberries"
)

str_split(fruits, " and ")
str_split(fruits, " and ", simplify = TRUE)

Output:

> str_split(fruits, " and ")
[[1]]
[1] "apples"  "oranges" "pears"   "bananas"

[[2]]
[1] "pineapples"  "mangos"      "raspberries"


> str_split(fruits, " and ", simplify = TRUE)
     [,1]         [,2]      [,3]          [,4]     
[1,] "apples"     "oranges" "pears"       "bananas"
[2,] "pineapples" "mangos"  "raspberries" ""
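
Once the result is a matrix, ordinary matrix indexing applies. The sketch below reuses the fruits vector from above and assumes stringr is installed; the names pieces, first_items, piece_counts, and head_and_rest are illustrative.

```r
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and raspberries"
)

pieces <- str_split(fruits, " and ", simplify = TRUE)

# Column 1 holds the first piece of every input string
first_items <- pieces[, 1]
print(first_items)

# Count the real pieces per row by ignoring the "" padding
piece_counts <- rowSums(pieces != "")
print(piece_counts)

# n and simplify can be combined: at most two columns per row
head_and_rest <- str_split(fruits, " and ", n = 2, simplify = TRUE)
print(head_and_rest)
```

Counting non-empty cells only works as a piece count when the input itself produces no empty pieces, since padding and genuine empty substrings look the same in the matrix.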