This tutorial article will introduce how to convert
bytes to string in Python 2.x and Python 3.x.
Convert Bytes to String in Python 2.x
In Python 2.7, bytes is identical to
str, so a variable initialized as
bytes is already a string.
Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> A = b'cd'
>>> A
'cd'
>>> type(A)
<type 'str'>
Convert Bytes to String in Python 3.x
bytes is a new data type introduced in Python 3.
Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> A = b'cd'
>>> A
b'cd'
>>> type(A)
<class 'bytes'>
The elements of a bytes object are of type int.
>>> A = b'cd'
>>> A[0]
99
>>> type(A[0])
<class 'int'>
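As a small sketch of the same point (the variable names here are illustrative only), indexing or iterating over a bytes object yields integers, while slicing yields another bytes object:

```python
# Illustrative names; any bytes literal behaves the same way.
data = b'cd'

first = data[0]       # indexing gives an int (the byte value)
head = data[:1]       # slicing gives a bytes object of length 1
values = list(data)   # iterating gives ints

print(first, type(first).__name__)   # 99 int
print(head, type(head).__name__)     # b'c' bytes
print(values)                        # [99, 100]
```

This difference matters later on: a single element cannot be decoded with .decode, because an int has no such method.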
Convert Bytes to String by using decode in Python 3.x
The .decode method of
bytes converts bytes to a string using the given
encoding. In most cases it is fine to leave the
encoding at its default,
utf-8, but this is not always safe because the bytes could have been encoded with an encoding other than utf-8.
>>> b'\x50\x51'.decode()
'PQ'
>>> b'\x50\x51'.decode('utf-8')
'PQ'
>>> b'\x50\x51'.decode(encoding = 'utf-8')
'PQ'
The three ways to decode the
bytes shown above are equivalent because
utf-8 is the encoding in each case: it is the default when no argument is given.
Decoding raises an error when
utf-8 is used but the bytes are not valid UTF-8.
>>> b'\x50\x51\xffed'.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    b'\x50\x51\xffed'.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte
We get a
UnicodeDecodeError, which tells us that
utf-8 is not the right codec for these bytes.
We have two approaches to solve this encoding issue.
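The failure can also be caught and inspected in code. The following is a minimal sketch with illustrative variable names; it relies only on the attributes that UnicodeDecodeError exposes (encoding, start, end, reason):

```python
raw = b'\x50\x51\xffed'

try:
    text = raw.decode('utf-8')
except UnicodeDecodeError as exc:
    # exc.start and exc.end delimit the offending bytes in the input;
    # exc.reason is the short explanation shown in the traceback.
    bad = raw[exc.start:exc.end]
    print(exc.encoding, exc.start, exc.end, bad, exc.reason)
    text = None
```

Catching the exception this way is useful for logging where the input went wrong before deciding how to recover.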
ignore, replace or backslashreplace as parameters to errors of decode
decode has another parameter besides
encoding: errors. It defines the behavior when a decoding
error happens. The default value of errors is
strict, which means an exception is raised if an error happens during the decoding process.
errors has other options such as ignore,
replace, or any other name registered with codecs.register_error,
backslashreplace for example.
ignore skips the bytes that cannot be decoded and builds the output string from the rest.
replace substitutes each undecodable byte with the Unicode replacement character U+FFFD (displayed as �).
backslashreplace substitutes each undecodable byte with a backslash escape sequence that shows its original value, such as \xff.
>>> b'\x50\x51\xffed'.decode('utf-8', 'backslashreplace')
'PQ\\xffed'
>>> b'\x50\x51\xffed'.decode('utf-8', 'ignore')
'PQed'
>>> b'\x50\x51\xffed'.decode('utf-8', 'replace')
'PQ�ed'
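One way to combine these handlers with strict decoding is sketched below. The helper name decode_lossy and its return shape are assumptions made for illustration, not part of the standard library: it decodes strictly first and only falls back to an error handler when that fails, reporting whether information may have been lost.

```python
def decode_lossy(data, encoding='utf-8', fallback='replace'):
    """Hypothetical helper: strict decode first, then an error handler.

    Returns (text, was_lossy) so the caller knows if the fallback was used.
    """
    try:
        return data.decode(encoding), False
    except UnicodeDecodeError:
        return data.decode(encoding, errors=fallback), True


print(decode_lossy(b'\x50\x51'))
print(decode_lossy(b'\x50\x51\xffed'))
print(decode_lossy(b'\x50\x51\xffed', fallback='backslashreplace'))
```

Returning the flag alongside the text keeps valid input untouched while making lossy conversions visible to the caller.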
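The cp437 encoding could be used if the encoding of the
bytes data is unknown. It maps every possible byte value to a character, so decoding never raises an error, although the resulting text is not necessarily what the author of the data intended.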
>>> b'\x50\x51\xffed'.decode('cp437')
'PQ\xa0ed'
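The sketch below, with illustrative variable names, checks two properties of this approach: every one of the 256 byte values decodes under cp437 and the result can be encoded back to the same bytes, but text that was really UTF-8 comes out as different characters.

```python
# Every byte value decodes under cp437 without raising an error.
all_bytes = bytes(range(256))
text = all_bytes.decode('cp437')
round_trip = text.encode('cp437')
print(len(text), round_trip == all_bytes)

# The price: UTF-8 input is silently misread rather than rejected.
utf8_bytes = 'é'.encode('utf-8')        # two bytes: b'\xc3\xa9'
mojibake = utf8_bytes.decode('cp437')   # two unrelated characters
print(repr(mojibake), mojibake == 'é')
```

Because the original bytes can be recovered by encoding with cp437 again, this fallback loses no data, but it should be treated as a last resort rather than a guess at the real encoding.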
Use chr to convert bytes to string in Python 3.x
chr(i, /) returns a Unicode string of one character with ordinal i. It can convert a single element of
bytes to a
string, but not the complete bytes object.
We could use a list comprehension or
map to get the converted string of the whole
bytes object while employing
chr for each individual element.
>>> A = b'\x50\x51\x52\x53'
>>> "".join([chr(_) for _ in A])
'PQRS'
>>> "".join(map(chr, A))
'PQRS'
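One caveat is worth illustrating. Since chr maps each byte value 0 to 255 to the code point with the same number, joining chr results gives the same string as decoding with latin-1, which differs from utf-8 decoding for non-ASCII data. A minimal sketch with illustrative variable names:

```python
data = 'é'.encode('utf-8')            # b'\xc3\xa9', one character as two bytes

via_chr = ''.join(map(chr, data))     # each byte becomes its own character
via_latin1 = data.decode('latin-1')   # same mapping, done by a codec
via_utf8 = data.decode('utf-8')       # the two bytes form one character

print(via_chr == via_latin1)          # True
print(len(via_chr), len(via_utf8))    # 2 1
```

For pure ASCII input such as b'PQRS' all three results agree, which is why the difference does not show up in the example above.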
Performance comparison and conclusion of different methods of converting bytes to string
We use timeit to compare the performance of the methods introduced in this tutorial.
>>> import timeit
>>> timeit.timeit('b"\x50\x51\x52\x53".decode()', number=1000000)
0.1356779
>>> timeit.timeit('"".join(map(chr, b"\x50\x51\x52\x53"))', number=1000000)
0.8295201999999975
>>> timeit.timeit('"".join([chr(_) for _ in b"\x50\x51\x52\x53"])', number=1000000)
0.9530071000000362
As the timings above show,
decode() is much faster, and
chr() is relatively inefficient because the string has to be rebuilt from single-character strings.
We recommend using
decode in performance-critical applications.
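To reproduce the comparison as a script rather than in the interactive shell, something like the following sketch could be used. The dictionary layout, the names and the run counts are arbitrary choices for illustration, and absolute numbers will vary by machine and Python version; wrapping each statement in a lambda also adds a small constant call overhead to every measurement.

```python
import timeit

data = b'\x50\x51\x52\x53'

# The three conversions discussed in this tutorial, wrapped as callables.
candidates = {
    'decode': lambda: data.decode(),
    'map + chr': lambda: ''.join(map(chr, data)),
    'comprehension + chr': lambda: ''.join([chr(b) for b in data]),
}

# Check that all candidates agree before comparing their speed.
outputs = {name: fn() for name, fn in candidates.items()}

timings = {}
for name, fn in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy.
    timings[name] = min(timeit.repeat(fn, number=100_000, repeat=3))
    print(f'{name}: {timings[name]:.4f} s')
```

Taking the minimum of several runs reduces the influence of other processes on the measurement.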