Convert String to Unicode in Python

  1. Convert Strings to Unicode in Python 2
  2. Convert Strings to Unicode Format in Python 3

This tutorial will discuss converting regular strings into Unicode strings in Python.

Convert Strings to Unicode in Python 2

In Python 2, regular strings are byte strings, and the built-in unicode() function converts a byte string into a Unicode string by decoding it with the encoding we pass as the second argument. The following snippet converts a regular string into a Unicode string in Python 2.

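# Python 2 only: unicode() does not exist in Python 3.
regular = "regular string"
# Decode the byte string as UTF-8 to get a Unicode string.
unicode_string = unicode(regular, "utf-8")
print(type(regular))
print(type(unicode_string))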

Output:

<type 'str'>
<type 'unicode'>

The unicode() function decoded the regular byte string as UTF-8 and returned a Unicode string, as the two type names in the output show.
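The unicode() built-in exists only in Python 2. As a minimal sketch of the same decoding step in a form that runs on both Python 2 and Python 3, a byte string can be decoded with its decode() method. The sample bytes and the variable names raw and text are illustrative assumptions, not part of the example above.

```python
# UTF-8 bytes for the word "cafe" with an acute accent on the final e.
# The b prefix marks a byte string in both Python 2 and Python 3.
raw = b"caf\xc3\xa9"

# Decode the bytes as UTF-8 to get a Unicode string
# (type unicode in Python 2, type str in Python 3).
text = raw.decode("utf-8")

print(len(raw))   # 5 bytes: the accented letter takes two bytes in UTF-8
print(len(text))  # 4 characters after decoding
```

The difference in length shows why the distinction matters: a byte string counts bytes, while a Unicode string counts characters.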

Convert Strings to Unicode Format in Python 3

In Python 3, strings are Unicode strings by default, so there is nothing to convert: a regular string already is a Unicode string. As a result, the following code gives different results on Python 2 and Python 3.

regular = "regular string"
unicode_string = u"Unicode string"
print(type(regular))
print(type(unicode_string))

Python 2 Output:

<type 'str'>
<type 'unicode'>

Python 3 Output:

<class 'str'>
<class 'str'>

In the code above, we initialize a regular string and a Unicode string (marked with the u prefix) in both Python 2 and Python 3. In Python 2, the second string belongs to the class unicode because regular strings and Unicode strings are different types. In Python 3, both strings belong to the class str because every string is a Unicode string and the u prefix has no effect.
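To make the Python 3 behavior more concrete, here is a minimal sketch that runs on Python 3 only. It checks that the u prefix changes nothing and shows that moving between text and bytes is an explicit encode or decode step. The sample text and all variable names are illustrative assumptions.

```python
# Python 3 only: every str is already a sequence of Unicode code points.
regular = "caf\u00e9"
prefixed = u"caf\u00e9"  # the u prefix is accepted but changes nothing

same_type = type(regular) is type(prefixed)  # both are str
same_value = regular == prefixed

# Converting between text and bytes is always an explicit step.
encoded = regular.encode("utf-8")   # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

# ord() returns the Unicode code point of each character.
code_points = [hex(ord(ch)) for ch in regular]

print(same_type, same_value)  # True True
print(encoded)                # b'caf\xc3\xa9'
print(code_points)            # ['0x63', '0x61', '0x66', '0xe9']
```

In Python 3, the closest counterpart to the Python 2 unicode() conversion is therefore decoding a bytes object, not converting one str into another.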

Muhammad Maisam Abbas

Maisam is a highly skilled and motivated Data Scientist. He has over 4 years of experience with the Python programming language. He loves solving complex problems and sharing his results on the internet.
