Split String by Delimiter in R

  1. Use strsplit to Split String by Delimiter in R
  2. Use str_split to Split String by Delimiter in R

This article discusses how to split a string by a delimiter in R.

Use strsplit to Split String by Delimiter in R

strsplit is part of base R, so it is available in every R installation without additional packages. It splits each element of a character vector into substrings at the given delimiter, which is itself passed as a character vector. The first argument is the character vector to split, and the second is the delimiter. By default the delimiter is interpreted as a regular expression; passing fixed = TRUE makes it match literally. In this case, we specify the space character to separate each word in the given sentence. Note that the output is a list of character vectors, with one list element per input string.

# strsplit is part of base R, so no packages need to be loaded
str <- "Lorem Ipsum is simply dummied text of the printing and typesetting industry."

# Split the sentence at every space
strsplit(str, " ")

Output:

> strsplit(str, " ")
[[1]]
 [1] "Lorem"       "Ipsum"       "is"          "simply"      "dummied"     "text"       
 [7] "of"          "the"         "printing"    "and"         "typesetting" "industry."  
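
Because strsplit treats the delimiter as a regular expression by default, delimiters such as "." behave differently from what one might expect. The following is a minimal sketch using only base R; the variable names (versions, regex_split, literal_split, words) are made up for illustration.

```r
# Base R only; no packages needed.
versions <- "1.2.3"

# "." is a regex metacharacter that matches any character, so every
# character is consumed as a delimiter and only empty strings remain.
regex_split <- strsplit(versions, ".")[[1]]

# fixed = TRUE makes strsplit match the delimiter literally.
literal_split <- strsplit(versions, ".", fixed = TRUE)[[1]]

# A regular expression is handy when the delimiter varies,
# for example runs of spaces and tabs.
words <- strsplit("a  b \t c", "[[:space:]]+")[[1]]

print(regex_split)
print(literal_split)
print(words)
```

The [[1]] at the end of each call extracts the character vector for the first (and here only) input string from the returned list.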

Use str_split to Split String by Delimiter in R

Alternatively, the str_split function from the stringr package can be used to split a string by a delimiter. It works much like strsplit: both interpret the pattern as a regular expression by default. To match a pattern literally with str_split, wrap it in fixed(). In the following example, the pattern is a single space, which contains no regular-expression metacharacters and therefore matches literally either way. The function also takes an optional third argument, n, which sets the maximum number of substrings to return.

# str_split comes from the stringr package
library(stringr)

str <- "Lorem Ipsum is simply dummied text of the printing and typesetting industry."

# Split the sentence at every space
str_split(str, " ")

Output:

> str_split(str, " ")
[[1]]
 [1] "Lorem"       "Ipsum"       "is"          "simply"      "dummied"     "text"       
 [7] "of"          "the"         "printing"    "and"         "typesetting" "industry."  
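
The third argument, n, and the fixed() helper can be sketched as follows. This assumes the stringr package is installed; the variable names (sentence, first_three, parts) are illustrative only.

```r
library(stringr)

sentence <- "Lorem Ipsum is simply dummied text"

# n = 3 caps the result at three pieces; the last piece keeps the
# remainder of the string, including its spaces.
first_three <- str_split(sentence, " ", n = 3)[[1]]

# fixed() makes stringr match the pattern literally instead of
# interpreting "." as a regular expression.
parts <- str_split("a.b.c", fixed("."))[[1]]

print(first_three)
print(parts)
```

Limiting n is useful when only the leading fields matter, such as separating a key from the rest of a line.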

Another optional parameter of str_split is simplify, the fourth argument. It defaults to FALSE, in which case the function returns the substrings as a list of character vectors. If we set it to TRUE, str_split returns a character matrix instead, with one row per input string; rows with fewer substrings are padded with empty strings.

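library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and raspberries"
)

# Default: a list with one character vector per input string
str_split(fruits, " and ")
# simplify = TRUE: a character matrix with one row per input string
str_split(fruits, " and ", simplify = TRUE)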

Output:

> str_split(fruits, " and ")
[[1]]
[1] "apples"  "oranges" "pears"   "bananas"

[[2]]
[1] "pineapples"  "mangos"      "raspberries"


> str_split(fruits, " and ", simplify = TRUE)
     [,1]         [,2]      [,3]          [,4]     
[1,] "apples"     "oranges" "pears"       "bananas"
[2,] "pineapples" "mangos"  "raspberries" ""
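
The two return shapes lend themselves to different follow-up steps. The sketch below, which assumes stringr is installed and uses illustrative variable names (pieces, counts, first_items, m), shows one way to work with each.

```r
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and raspberries"
)

# List result: each element can have a different length.
pieces <- str_split(fruits, " and ")
counts <- lengths(pieces)                        # pieces per input string
first_items <- vapply(pieces, function(x) x[1], character(1))

# Matrix result: rectangular, so short rows are padded with "".
m <- str_split(fruits, " and ", simplify = TRUE)

print(counts)
print(first_items)
print(dim(m))
print(m[2, 4])
```

The list form preserves the true number of pieces per string, while the matrix form is convenient for column-wise access at the cost of padding.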
