Split a String by WhiteSpace in Python

  1. Use the String split() Method to Split a String in Python
  2. Use re.split() to Split a String in Python
  3. Use re.findall() Instead of re.split() to Split a String in Python

This tutorial demonstrates how to split a string in Python using whitespace as the delimiter.

Splitting a string in Python means cutting a single string into a list of substrings based on a delimiter or separator.

For example, splitting the string Hello, World! I am here. on whitespace produces the following output.

['Hello,', 'World!', 'I', 'am', 'here.']

Use the String split() Method to Split a String in Python

The built-in string method split() is the most direct way to split a string on whitespace. When called with no arguments, split() returns a list of the substrings obtained by splitting the original string on whitespace.

For example, let’s use the same string, Hello, World! I am here., and call split() to separate it into a list of substrings.

string_list = 'Hello, World! I am here.'.split()

print(string_list)

The output is as expected:

['Hello,', 'World!', 'I', 'am', 'here.']

Besides that, the split() method automatically discards leading and trailing whitespace and treats any run of consecutive whitespace characters as a single delimiter.

Let’s modify the previous example to include extra leading, trailing, and consecutive whitespace.

string_list = '      Hello,   World! I am     here.   '.split()

print(string_list)

Output:

['Hello,', 'World!', 'I', 'am', 'here.']

The split() method also handles tabs, newlines, and carriage returns (written as \t, \n, and \r) in addition to the plain space character. These characters are treated as delimiters and are trimmed in the same way.

For example:

string_list = ' Hello,   World! I am here.\nI am also\there too,\rand here.'.split()

print(string_list)

Output:

['Hello,', 'World!', 'I', 'am', 'here.', 'I', 'am', 'also', 'here', 'too,', 'and', 'here.']

Because of this behavior, there is no need to trim whitespace explicitly before calling split().
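
To illustrate why the no-argument form is convenient, the short sketch below contrasts it with passing a single space as the separator. The sample string and variable names are made up for illustration.

```python
# Compare split() with no arguments against split(' ') on padded text.
padded = '  Hello,   World!  '

default_split = padded.split()      # runs of whitespace act as one delimiter
explicit_split = padded.split(' ')  # every single space is its own delimiter

print(default_split)   # ['Hello,', 'World!']
print(explicit_split)  # ['', '', 'Hello,', '', '', 'World!', '', '']
```

With an explicit ' ' separator, each extra space produces an empty string in the result, so the text would need to be cleaned up separately.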

Use re.split() to Split a String in Python

The Python regular expression (RegEx) module re also provides a split() function that can be used in place of the built-in split() method. Note, however, that re.split() is slower than the built-in split() method.

The re.split() function accepts two main parameters: a RegEx pattern and the string to split. The RegEx character class that represents whitespace is \s. It matches every type of whitespace character, including the ones mentioned above (\n, \t, \r) as well as the form feed \f.

For example, declare a string and call re.split() to split it into a list of substrings. Add a + sign to the pattern so that one or more consecutive whitespace characters are matched as a single delimiter. Unlike the built-in split() method, re.split() does not discard leading or trailing whitespace; if the string starts or ends with whitespace, the result contains an empty string at that end.

Also, prefix the pattern string with r to make it a raw string, so that Python passes the backslash in \s to the RegEx engine unchanged instead of treating it as an escape sequence.
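
As a quick check of what \s covers, the following sketch tests a handful of whitespace characters against the pattern one at a time. The list of characters and the variable names are chosen for this illustration and are not exhaustive.

```python
import re

# A few common whitespace characters: space, tab, newline,
# carriage return, form feed, and vertical tab.
whitespace_chars = [' ', '\t', '\n', '\r', '\f', '\v']

# Map each character to whether the \s class matches it.
matches = {ch: bool(re.fullmatch(r'\s', ch)) for ch in whitespace_chars}

for ch, matched in matches.items():
    print(repr(ch), matched)  # each line ends with True
```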

import re

exStr = "Hello, World!\nWelcome\tto my   tutorial\rarticle."

print(re.split(r'\s+', exStr))
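
The following sketch shows how re.split() behaves when the input has leading or trailing whitespace, and one possible way to handle it by calling strip() first. The sample string and variable names are made up for illustration.

```python
import re

padded = '  Hello, World!\n'

# re.split() keeps an empty string wherever the text starts or ends with a match.
raw_result = re.split(r'\s+', padded)
print(raw_result)  # ['', 'Hello,', 'World!', '']

# Stripping first removes the leading and trailing whitespace.
clean_result = re.split(r'\s+', padded.strip())
print(clean_result)  # ['Hello,', 'World!']
```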

Use re.findall() Instead of re.split() to Split a String in Python

Alternatively, re.findall() can be used. The findall() function works the opposite way from split(): it returns all the substrings that match the given RegEx pattern, while split() uses the pattern as a delimiter.

To split on whitespace with findall(), match non-whitespace instead by capitalizing the letter in \s (\S). findall() takes the pattern and the string in the same order as re.split().

import re

exStr = "Hello, World!\nWelcome\tto my   tutorial\rarticle."

print(re.findall(r'\S+', exStr))

For this string, both functions produce the same output:

['Hello,', 'World!', 'Welcome', 'to', 'my', 'tutorial', 'article.']
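
Because findall() collects matches rather than cutting at delimiters, padding around the text does not add empty strings to its result. A minimal sketch, using a made-up sample string:

```python
import re

padded = '  Hello, World!\n'

# Collect every run of non-whitespace characters.
found = re.findall(r'\S+', padded)

print(found)                    # ['Hello,', 'World!']
print(found == padded.split())  # True for this sample string
```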

In summary, the simplest and fastest way to split a string on whitespace is the built-in split() method. It is a method of the string object and discards leading and trailing whitespace by default. It also requires no knowledge of regular expressions.

Otherwise, re.split() and re.findall() can be used as substitutes for the split() method, although both perform slower than the built-in split() method.
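
One way to check the relative speed on your own machine is a rough timing with the standard timeit module. This is only a sketch: the repetition count is arbitrary, and the numbers will vary by machine and Python version.

```python
import re
import timeit

text = 'Hello, World!\nWelcome\tto my   tutorial\rarticle.'
runs = 10_000  # arbitrary repetition count chosen for this sketch

# Time each approach on the same input.
timings = {
    'str.split()': timeit.timeit(lambda: text.split(), number=runs),
    're.split()': timeit.timeit(lambda: re.split(r'\s+', text), number=runs),
    're.findall()': timeit.timeit(lambda: re.findall(r'\S+', text), number=runs),
}

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s for {runs} runs')
```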
