Raw String and Unicode String in Python

Neema Muganga Oct 10, 2023
  1. Raw String in Python
  2. Python Unicode String

Raw String in Python

Raw string literals in Python are ordinary string literals prefixed with r or R before the opening quote. In a raw string, a backslash (\) is kept as a literal character rather than starting an escape sequence.

For example,

print(r"\n")
print(r"\t")

Output:

\n
\t

In a normal (non-raw) string, every backslash must be doubled so that it is not mistaken for the beginning of an escape sequence such as a newline (\n) or a tab (\t). Raw strings remove that need, which is why they are common in regular expressions and Windows file paths.
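As a minimal sketch of those two use cases, the snippet below writes the same Windows path with doubled backslashes and as a raw string, then uses a raw string for a regular expression. The path and the sample sentence are made up for illustration only.

```python
import re

# Hypothetical Windows path, used only for illustration.
# Normal string: every backslash has to be doubled.
escaped_path = "C:\\Users\\new_folder\\table.txt"

# Raw string: backslashes are written once and kept literally.
raw_path = r"C:\Users\new_folder\table.txt"

same_path = escaped_path == raw_path
print(same_path)  # True

# In a regular expression, \d+ matches one or more digits.
# The raw string passes the backslash through to the regex engine unchanged.
pattern = re.compile(r"\d+")
numbers = pattern.findall("order 66 shipped in 3 days")
print(numbers)  # ['66', '3']
```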

Note
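r'\' raises a syntax error. Even in a raw string, a backslash in front of a quote stops that quote from closing the literal (both characters stay in the value), so a raw string cannot end in an odd number of backslashes. Without the r prefix, the backslash is treated as an escape character.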
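The trailing-backslash rule can be checked without crashing a script by passing the literal's source text to the built-in compile() function. This is only a sketch; the helper name compiles and the folder value are illustrative assumptions.

```python
def compiles(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Source text r'\' : the backslash keeps the quote from closing the literal.
lone_backslash_ok = compiles("r'\\'")
print(lone_backslash_ok)  # False

# Source text r'\\' : an even number of trailing backslashes is accepted.
double_backslash_ok = compiles("r'\\\\'")
print(double_backslash_ok)  # True

# A common workaround: append the final backslash from a normal string.
folder = r"C:\temp" + "\\"
print(folder)  # C:\temp\

# A backslash before a quote keeps both characters in the value.
quote_kept = r"\""
print(len(quote_kept))  # 2
```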

Example:

text = "Hello\nWorld"
print(text)

Output:

Hello
World

Without the raw string prefix r, the backslash starts an escape sequence, so \n is converted to a single newline character when the literal is evaluated. The two words are therefore printed on separate lines, as shown in the output.

Using the same text example, add the r prefix before the string.

Example:

text = r"Hello\nWorld"
print(text)

Output:

Hello\nWorld

As the output shows, the r prefix keeps the backslash as a literal character, so the text is printed exactly as it was typed, with \n left as two characters: a backslash and the letter n.

For instance, '\\n' and r'\n' have the same value: a backslash followed by the letter n.

print("\\n")
print(r"\n")

Output:

\n
\n
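One way to confirm that the two spellings produce the same string is to compare them directly and inspect their characters. The variable names below are illustrative.

```python
escaped = "\\n"  # normal string with a doubled backslash
raw = r"\n"      # raw string with a single backslash

same_value = escaped == raw
print(same_value)  # True

# Both hold two characters: a backslash and the letter n.
chars = list(raw)
raw_length = len(raw)
print(raw_length)  # 2

# A real newline escape, by contrast, is a single character.
newline_length = len("\n")
print(newline_length)  # 1
```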

Python Unicode String

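Unicode is the standard Python uses to represent text from virtually every writing system. In Python 3, str is the default string type and every str is already a Unicode string; raw bytes are stored in the separate bytes type. In Python 2, str was a byte string (ASCII by default) and Unicode text had its own unicode type.

In Python 2, a string was made Unicode by putting a u before the text, like u'string', or by calling the unicode() function, like unicode('string'). Python 3 has no unicode() function, and the u prefix (accepted again since Python 3.3) is optional and changes nothing: u'string' and 'string' are the same str.

In Python 2, u'text' is a Unicode string while 'text' is a byte string; in Python 3, the byte-string counterpart is written b'text'. A Unicode string generally takes more memory than a byte string holding the same ASCII text.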

For example,

test = u"一二三"
print(test)

Output:

一二三
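The following sketch, assuming Python 3, shows how the same text behaves as a Unicode str and as UTF-8 encoded bytes. The variable names are illustrative.

```python
text = "一二三"

# In Python 3 the u prefix is accepted but does not change the type or value.
prefix_is_noop = u"一二三" == text
print(prefix_is_noop)  # True

# A str is a sequence of Unicode code points.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)  # ['0x4e00', '0x4e8c', '0x4e09']

# Encoding produces a bytes object; each of these characters takes 3 bytes in UTF-8.
encoded = text.encode("utf-8")
print(type(encoded).__name__, len(text), len(encoded))  # bytes 3 9

# Decoding with the same codec restores the original str.
decoded = encoded.decode("utf-8")
print(decoded == text)  # True
```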
