Match Newline Characters in R Regex
Although regular expressions (regex for short) are a fairly universal way to define string patterns, those patterns can behave differently on different platforms, particularly when the regex is meant to match special characters such as line breaks. In this article, we will look at different ways of including line breaks in regular expressions in R.
Newline Sequences in Different Environments
In Linux environments, the pattern \n matches a newline sequence. On Windows, however, a line break is matched by \r\n, and on old Macs, by \r.
If you need a regular expression that matches a newline sequence on any of those platforms, you could use the pattern \r?\n to match both the \n and the \r\n line termination sequences.
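As a quick check, the following minimal R sketch applies the \r?\n pattern to three sample strings, one per line-ending convention. The variable names (unix_text, windows_text, old_mac_text) are illustrative assumptions, not something the article defines.

```r
# Sample strings using the three line-ending conventions (illustrative names).
unix_text    <- "first\nsecond"    # line feed only
windows_text <- "first\r\nsecond"  # carriage return + line feed
old_mac_text <- "first\rsecond"    # carriage return only

# R turns \r and \n in the string literal into the actual carriage return
# and line feed characters before the regex engine sees the pattern.
pattern <- "\r?\n"

grepl(pattern, unix_text)      # TRUE
grepl(pattern, windows_text)   # TRUE
grepl(pattern, old_mac_text)   # FALSE: a lone \r is not matched

# Replacing each matched line break with a space:
gsub(pattern, " ", windows_text)   # "first second"
```

Because the \r is optional, the same pattern consumes the full \r\n pair on Windows-style text instead of leaving a stray carriage return behind.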
That option covers Linux and Windows environments, although the pattern will not properly match line breaks from old Macs. To cover old Macs as well, you could use the pattern \r?\n|\r, which also matches \r. A more correct version of this pattern would be:
Test Regex With Newline Sequences
Many websites let you test regular expressions. Most of them behave like Linux environments: they find matches in strings with line feeds when you test the \n pattern, but they don’t find a match when you test the \r\n pattern. Examples of these sites are Regex101 and the regex tester in extendsclass.
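The same kind of test can also be run locally in R, without relying on how a website encodes the text you type. The sketch below is only an illustration: the sample strings and the names samples, patterns, and results are assumptions made for this example. It checks the \n, \r\n, and \r?\n|\r patterns against strings with each kind of line ending.

```r
# Sample strings, one per line-ending convention (illustrative names).
samples <- c(
  lf_text   = "line one\nline two",
  crlf_text = "line one\r\nline two",
  cr_text   = "line one\rline two"
)

# Patterns discussed in the text; R converts \r and \n in the literals
# into the actual control characters.
patterns <- c(lf = "\n", crlf = "\r\n", portable = "\r?\n|\r")

# One column per pattern, one row per sample string.
results <- sapply(patterns, function(p) grepl(p, samples))
rownames(results) <- names(samples)

print(results)
#              lf  crlf portable
# lf_text    TRUE FALSE     TRUE
# crlf_text  TRUE  TRUE     TRUE
# cr_text   FALSE FALSE     TRUE
```

The lf_text row mirrors what the Linux-like testing sites report: a string containing only line feeds matches \n but not \r\n.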
Other testing websites may behave differently, just as different operating environments do. For example, Regex Storm works more like Windows platforms, finding matches between strings with line breaks and the pattern