How to Convert Bytes to String in Python 2 and Python 3

This tutorial shows how to convert bytes to a string in Python 2.x and Python 3.x.

Convert Bytes to String in Python 2.x

In Python 2.7, bytes is simply an alias for str, so a variable created as bytes is already a string and needs no conversion.

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> A = b'cd'
>>> A
'cd'
>>> type(A)
<type 'str'>

Convert Bytes to String in Python 3.x

In Python 3, bytes is a data type distinct from str: it holds a sequence of raw byte values rather than text.

Python 3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> A = b'cd'
>>> A
b'cd'
>>> type(A)
<class 'bytes'>

Each element of a bytes object is an int in the range 0 to 255.

>>> A = b'cd'
>>> A[0]
99
>>> type(A[0])
<class 'int'>
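Indexing and slicing behave differently on bytes, which matters when converting individual elements. The following is a minimal sketch; the variable names are illustrative only.

```python
data = b'cd'

# Indexing a bytes object gives the integer value of that byte.
first = data[0]          # 99, the ASCII code of 'c'

# Slicing keeps the bytes type, even for a single element.
first_slice = data[0:1]  # b'c'

# Iterating yields one int per byte.
values = list(data)      # [99, 100]

# bytes() rebuilds the object from an iterable of ints in range 0-255.
rebuilt = bytes(values)  # b'cd'
```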

Convert Bytes to String by using decode in Python 3.x

The .decode method of bytes converts bytes to a string using the given encoding. The default encoding is utf-8, which works in most cases, but it is not always safe because the bytes could have been encoded with some other encoding.

>>> b'\x50\x51'.decode()
'PQ'
>>> b'\x50\x51'.decode('utf-8')
'PQ'
>>> b'\x50\x51'.decode(encoding = 'utf-8')
'PQ'

The three calls above are equivalent because utf-8 is the default value of the encoding parameter.
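To illustrate why the encoding matters, the sketch below encodes the same text with two codecs and decodes it back. The sample text and variable names are assumptions chosen for illustration.

```python
text = 'café'

utf8_bytes = text.encode('utf-8')      # b'caf\xc3\xa9'
latin1_bytes = text.encode('latin-1')  # b'caf\xe9'

# Decoding with the codec that produced the bytes recovers the text.
assert utf8_bytes.decode('utf-8') == text
assert latin1_bytes.decode('latin-1') == text

# Decoding with the wrong codec may not fail at all: every byte is
# valid in latin-1, so this silently yields 'cafÃ©' instead of 'café'.
wrong = utf8_bytes.decode('latin-1')

# str(data, encoding) is an equivalent spelling of data.decode(encoding).
same = str(utf8_bytes, 'utf-8')
```

A wrong codec therefore does not always raise an error, which is why it helps to know how the bytes were produced.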

Decoding raises an error when utf-8 is used but the bytes are not valid utf-8.

>>> b'\x50\x51\xffed'.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

The UnicodeDecodeError reports that the byte 0xff at position 2 is not valid utf-8, so utf-8 is not the right codec for this data.
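The exception object carries the same details as the message, so a program can inspect them instead of parsing the text. The helper below is a hypothetical sketch, not part of the standard library; it relies only on the documented start, end, object and reason attributes of UnicodeDecodeError.

```python
def explain_decode_failure(data, encoding='utf-8'):
    """Hypothetical helper: return None on success, else details of the failure."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return {
            'position': exc.start,                        # index of the first bad byte
            'bad_bytes': exc.object[exc.start:exc.end],   # the offending bytes
            'reason': exc.reason,                         # short explanation from the codec
        }
    return None


failure = explain_decode_failure(b'\x50\x51\xffed')
ok = explain_decode_failure(b'\x50\x51')
```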

There are two approaches to solving this encoding issue.

backslashreplace, ignore or replace as parameters to errors

decode has another parameter besides encoding: errors. It defines what happens when a decoding error occurs. The default value of errors is strict, which means an exception is raised if an error occurs during decoding.

errors also accepts other values such as ignore and replace, or any other name registered with codecs.register_error, backslashreplace for example.

ignore skips the bytes that cannot be decoded and builds the output string from the rest.

replace replaces the corresponding characters with the characters as defined in the encoding method as given.backslashreplace replaces the characters that couldn’t be decoded with the same content as in the original bytes.

>>> b'\x50\x51\xffed'.decode('utf-8', 'backslashreplace')
>>> b'\x50\x51\xffed'.decode('utf-8', 'ignore')
>>> b'\x50\x51\xffed'.decode('utf-8', 'replace')
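One possible way to combine the strict default with a lenient handler is to try strict decoding first and fall back only when it fails. The helper name decode_lenient and its return shape are assumptions made for this sketch.

```python
def decode_lenient(data, encoding='utf-8'):
    """Hypothetical helper: strict decode, falling back to errors='replace'.

    Returns (text, had_errors) so the caller knows whether data was lost.
    """
    try:
        return data.decode(encoding), False
    except UnicodeDecodeError:
        return data.decode(encoding, errors='replace'), True


clean = decode_lenient(b'\x50\x51')        # ('PQ', False)
lossy = decode_lenient(b'\x50\x51\xffed')  # ('PQ\ufffded', True)
```

Returning the flag keeps the information that some bytes could not be decoded, which ignore and replace would otherwise hide.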

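The MS-DOS cp437 encoding can be used as a fallback if the encoding of the bytes is unknown: it assigns a character to every byte value, so decoding never fails, although the resulting characters may not be the ones originally intended.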

>>> b'\x50\x51\xffed'.decode('cp437')
'PQ\xa0ed'
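The sketch below checks that cp437 accepts every possible byte value, and also shows the cost of that convenience. The sample character is an assumption chosen for illustration.

```python
all_bytes = bytes(range(256))

# cp437 assigns a character to each of the 256 byte values,
# so strict decoding of arbitrary bytes does not raise.
decoded = all_bytes.decode('cp437')
assert len(decoded) == 256

# The mapping is reversible, so the original bytes can be recovered.
assert decoded.encode('cp437') == all_bytes

# Never failing is not the same as being right: bytes that were
# really UTF-8 come back as different characters.
original = 'é'
garbled = original.encode('utf-8').decode('cp437')
assert garbled != original
assert len(garbled) == 2   # one character per byte
```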

Convert Bytes to String by using chr in Python 3.x

chr(i, /) returns a one-character Unicode string whose ordinal (code point) is i. It can convert a single element of bytes to a string, but not a whole bytes object.

We can use a list comprehension or map to apply chr to each element and join the results into the converted string.

>>> A = b'\x50\x51\x52\x53'
>>> "".join([chr(_) for _ in A])
'PQRS'
>>> "".join(map(chr, A))
'PQRS'
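Because chr maps each byte value n directly to the code point n, the chr approach gives the same result as decoding with latin-1, not utf-8. The sketch below checks this; the variable names are illustrative.

```python
data = bytes(range(256))

# chr on every byte is equivalent to a latin-1 decode.
via_chr = "".join(map(chr, data))
via_latin1 = data.decode('latin-1')
assert via_chr == via_latin1

# For multi-byte UTF-8 sequences the result therefore differs from decode():
utf8_bytes = 'é'.encode('utf-8')              # b'\xc3\xa9'
chr_result = "".join(map(chr, utf8_bytes))    # 'Ã©', two characters
decode_result = utf8_bytes.decode('utf-8')    # 'é', one character
```

The chr approach is thus only safe for bytes known to hold ASCII or latin-1 text.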

Performance comparison and conclusion of different methods of converting bytes to string

We use timeit to compare the performance of the methods introduced in this tutorial: decode and chr.

>>> import timeit
>>> timeit.timeit('b"\x50\x51\x52\x53".decode()', number=1000000)
>>> timeit.timeit('"".join(map(chr, b"\x50\x51\x52\x53"))', number=1000000)
>>> timeit.timeit('"".join([chr(_) for _ in b"\x50\x51\x52\x53"])', number=1000000)

Running these statements shows that decode() is much faster; the chr() approaches are relatively inefficient because they have to rebuild the string from single-character strings.

We recommend using decode in performance-critical applications.
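To reproduce the comparison as a script, something like the following sketch could be used. The benchmark function, the labels and the iteration counts are assumptions; absolute timings depend on the machine and the Python version, so no figures are given here.

```python
import timeit

# The bytes object is created once in setup so only the conversion is timed.
SETUP = 'data = b"\\x50\\x51\\x52\\x53"'

candidates = {
    'decode': 'data.decode()',
    'map + chr': '"".join(map(chr, data))',
    'comprehension + chr': '"".join([chr(b) for b in data])',
}


def benchmark(number=100_000, repeat=3):
    """Return the best time in seconds over `repeat` runs for each candidate."""
    results = {}
    for name, stmt in candidates.items():
        timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        results[name] = min(timings)
    return results


if __name__ == '__main__':
    for name, seconds in benchmark().items():
        print(f'{name:>20}: {seconds:.4f} s')
```

Taking the minimum of several repeats reduces the influence of other activity on the machine.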
