Decode URL in Python

  1. Decode URL Using the Urllib Library in Python
  2. Decode URL Using the Requests Library in Python
  3. Encode and Decode Unicode Encoded URL String Using Utf-8 in Python
  4. Decode URL String Using the Unquote and Unescape Libraries in Python
  5. Conclusion

This article demonstrates four ways to decode URL strings in Python.

URL encoding, also called percent-encoding, makes a URL safe to transmit and unambiguous to parse. Reserved characters and characters outside the US-ASCII range are replaced with a % sign followed by the two hexadecimal digits of each byte, so : becomes %3A and ? becomes %3F.

Turning the encoded URL string back to its original form is URL decoding.
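
As a quick illustration of the round trip, the sketch below encodes a made-up URL with urllib.parse.quote and decodes it again with unquote. The example.com address and the variable names are assumptions chosen for illustration, and safe="" is passed only so that every reserved character, including /, gets escaped.

```python
from urllib.parse import quote, unquote

# Hypothetical URL used only for illustration.
original = "https://example.com/search?q=red shoes&lang=en"

# safe="" tells quote() to escape every reserved character, "/" included.
encoded = quote(original, safe="")
print(encoded)  # https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dred%20shoes%26lang%3Den

# unquote() turns each %XX escape back into the character it stands for.
decoded = unquote(encoded)
print(decoded)  # https://example.com/search?q=red shoes&lang=en
```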

Decode URL Using the Urllib Library in Python

The following are the steps to decode URL strings using the urllib library in Python.

  1. Import the unquote function from the urllib.parse module.

    Syntax:

    from urllib.parse import unquote
    
  2. The URL to be decoded is stored in the variable a.

  3. The function unquote(a) decodes the URL string stored in that variable. The decoded URL string is then assigned to the variable clean_url.

  4. Finally, we print clean_url to view the decoded URL string.

Code:

from urllib.parse import unquote
a = 'https%3A//www.google.com/search%3Fclient%3Dfirefox-b-d%26q%3Durlib'
clean_url = unquote(a)
print(clean_url)

Output:

https://www.google.com/search?client=firefox-b-d&q=urlib

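
Once a URL is decoded, it is often taken apart further. The following is a minimal sketch, reusing the variables a and clean_url from the example above, that splits the decoded URL with urlsplit and reads its query parameters with parse_qs. Both functions live in urllib.parse; the names parts and params are illustrative.

```python
from urllib.parse import unquote, urlsplit, parse_qs

a = 'https%3A//www.google.com/search%3Fclient%3Dfirefox-b-d%26q%3Durlib'
clean_url = unquote(a)

# Split the decoded URL into scheme, host, path, and query string.
parts = urlsplit(clean_url)
print(parts.scheme)  # https
print(parts.netloc)  # www.google.com
print(parts.path)    # /search

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(parts.query)
print(params)  # {'client': ['firefox-b-d'], 'q': ['urlib']}
```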

Decode URL Using the Requests Library in Python

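This example decodes a long URL string with requests.utils.unquote from the third-party requests package. requests.utils re-exports the unquote function from urllib.parse, so the result matches the previous method. The program takes a simple approach and decodes the data directly in the print statement.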

These are the steps to decode URL strings utilizing Python requests:

  1. Import the Python library package requests.

  2. The URL is saved inside the variable url. To keep the long URL readable, we split it into several double-quoted string literals and end each line with a backslash (\) so that the statement continues on the next line.

    Written this way, the URL spans multiple lines of code. Python joins adjacent string literals when it parses the program, so url still holds one continuous string.

  3. Give two print statements. The first one displays the original encoded URL by printing the variable url.

  4. Inside the second print statement, the URL is decoded by calling requests.utils.unquote(url), and the final result is printed.

Code:

import requests

url = "https%3A//www.google.com/search%3Fclient%3Dfirefox-b-d%26s" \
      "xsrf%3DAPq-WBv9aDXZv8lI5HNFhawgmJv12E1J1g%3A1649535122670%26q" \
      "%3Dwww.python.org%2Bdownload%26sa%3DX%26ved%3D2ahUKEwjN3Z-Y5Yf3" \
      "AhWRF4gKHbfRB90Q1QJ6BAgyEAE%26biw%3D1366%26bih%3D643%26dpr%3D1"

print(f"Before: {url}")
print(f"After:  {requests.utils.unquote(url)}")

Output:

Before: https%3A//www.google.com/search%3Fclient%3Dfirefox-b-d%26sxsrf%3DAPq-WBv9aDXZv8lI5HNFhawgmJv12E1J1g%3A1649535122670%26q%3Dwww.python.org%2Bdownload%26sa%3DX%26ved%3D2ahUKEwjN3Z-Y5Yf3AhWRF4gKHbfRB90Q1QJ6BAgyEAE%26biw%3D1366%26bih%3D643%26dpr%3D1

After:  https://www.google.com/search?client=firefox-b-d&sxsrf=APq-WBv9aDXZv8lI5HNFhawgmJv12E1J1g:1649535122670&q=www.python.org+download&sa=X&ved=2ahUKEwjN3Z-Y5Yf3AhWRF4gKHbfRB90Q1QJ6BAgyEAE&biw=1366&bih=643&dpr=1


Looking closer, one can see that the decoded URL is shorter than the encoded one, because every three-character %XX escape collapses into a single character.
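
The decoded q parameter above still contains a + sign. In query strings, + conventionally stands for a space; unquote leaves it untouched, while urllib.parse.unquote_plus converts it. A small sketch, using a fragment of the URL above and the illustrative names fragment, decoded_once, and value:

```python
from urllib.parse import unquote, unquote_plus

# A fragment of the encoded URL from the example above.
fragment = "q%3Dwww.python.org%2Bdownload"

# First pass: undo the percent-encoding applied to the URL itself.
decoded_once = unquote(fragment)
print(decoded_once)  # q=www.python.org+download

# Each %XX escape shrinks to one character, so the result is shorter.
print(len(decoded_once) < len(fragment))  # True

# Second pass on the value only: unquote keeps "+",
# unquote_plus maps it to a space.
value = decoded_once.split("=", 1)[1]
print(unquote(value))       # www.python.org+download
print(unquote_plus(value))  # www.python.org download
```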

Encode and Decode Unicode Encoded URL String Using Utf-8 in Python

The first example percent-encodes a string that contains a non-ASCII character as UTF-8 bytes and then decodes it again.

Decode Unicode Encoded Plain String in Python

Here, the input is a Python string containing the Unicode escape \u0131 (the dotless i, ı). A non-ASCII character cannot appear in a URL as-is, so the string is encoded to UTF-8 bytes and percent-encoded before it is decoded again.

  1. Import urllib.parse. Note that a bare import urllib is not enough, because it does not reliably load the parse submodule.

  2. The string is saved inside the variable u and then encoded.

    Syntax:

    urllib.parse.quote(variable_name.encode('utf8'))
    

    The result is saved inside a new variable url so that it can be used as input while decoding.

  3. The variable url is printed to view the encoded result.

The steps below demonstrate taking the encoded string and decoding it using unquote.

  1. A variable f is initialized to decode and store the result.
  2. The syntax urllib.parse.unquote(url) decodes the string stored inside the variable url and saves it into the variable f.
  3. The variable f is printed to view the decoded string URL.

Code:

import urllib.parse

u = "Tan\u0131m"
url = urllib.parse.quote(u.encode('utf8'))
print(url)

f = urllib.parse.unquote(url)
print(f)

Output:

Tan%C4%B1m
Tanım

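
The explicit .encode('utf8') call is optional: quote also accepts a str and encodes it as UTF-8 by default, and unquote takes an encoding parameter that controls how the escaped bytes are turned back into text. The sketch below, with illustrative variable names, shows both, along with what happens when the wrong encoding is assumed while decoding.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

u = "Tan\u0131m"

# quote() encodes str input as UTF-8 by default, so this gives the
# same result as quote(u.encode('utf8')).
encoded = quote(u)
print(encoded)  # Tan%C4%B1m

# unquote() decodes the escaped bytes as UTF-8 unless told otherwise.
decoded = unquote(encoded)
print(decoded)  # Tanım

# The raw bytes behind the escapes.
raw = unquote_to_bytes(encoded)
print(raw)  # b'Tan\xc4\xb1m'

# Assuming the wrong encoding yields mojibake rather than an error.
wrong = unquote(encoded, encoding="latin-1")
print(wrong)  # TanÄ±m
```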

Decode Unicode Encoded URL String in Python

In some scenarios, URLs use the non-standard %uXXXX notation, where each escape carries a four-digit hexadecimal Unicode code point. Decoding such strings is harder, because urllib.parse.unquote does not recognize this notation and few tools support it.

A user might have to write a custom decoder for %uXXXX strings. It is tempting to reuse the above method as a workaround.

When the above method is applied, however, quote percent-encodes the % signs themselves (as %25), and unquote reverses exactly that step. The round trip returns the input unchanged, as the output below shows, and the %uXXXX escapes are not decoded.

Code:

import urllib.parse

u = '%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05' \
    'D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4'

url = urllib.parse.quote(u.encode('utf8'))
# print(url)

f = urllib.parse.unquote(url)
print(f)

Output:

%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4

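
Since unquote does not understand %uXXXX escapes, one option is a small custom decoder. The sketch below is an assumption, not a standard library feature: decode_percent_u is a hypothetical helper that uses a regular expression to replace each %uXXXX escape with the character for that code point, then passes the result to unquote to handle ordinary escapes such as %20. It does not combine surrogate pairs, so characters outside the Basic Multilingual Plane would need extra handling.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX form: "%u" plus four hex digits.
_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text):
    """Hypothetical helper: decode %uXXXX escapes, then regular %XX escapes."""
    # Replace each %uXXXX with the character whose code point is XXXX.
    replaced = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Let unquote handle the remaining standard escapes (for example %20).
    return unquote(replaced)


u = '%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05' \
    'D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4'

result = decode_percent_u(u)
print(result)  # prints the decoded Hebrew text
```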

Decode URL String Using the Unquote and Unescape Libraries in Python

The program below decodes the URL string using the unquote function from urllib.parse together with the unescape function from the html module. unquote reverses percent-encoding, while unescape converts HTML character references such as &amp; back into plain characters.

  1. Import the necessary functions from urllib.parse and html.

    Syntax to import the functions:

    from urllib.parse import unquote
    from html import unescape
    
  2. The URL to be decoded is stored inside the variable f.

  3. The string URL is decoded using the syntax unescape(unquote(f)).

  4. The above syntax is put inside a print statement to print the final result.

Code:

from urllib.parse import unquote
from html import unescape

# The confirmationToken value contains percent-encoded "/" (%2f) and "+" (%2b).
f = ('https://v.w.xy/p1/p22?userId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&'
     'confirmationToken=7uAf%2fxJoxRTFAZdxslCn2uwVR9vV7cYrlHs%2fl9sU%2frix9f9C'
     'nVx8uUT%2bu8y1%2fWCs99INKDnfA2ayhGP1ZD0z%2bodXjK9xL5I4gjKR2xp7p8Sckvb04mddf'
     '%2fiG75QYiRevgqdMnvd9N5VZp2ksBc83lDg7%2fgxqIwktteSI9RA3Ux9VIiNxx%2fZLe9dZSHxRq9AA')

# Undo percent-encoding first, then convert any HTML character references.
print(unescape(unquote(f)))

Output:

https://v.w.xy/p1/p22?userId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&confirmationToken=7uAf/xJoxRTFAZdxslCn2uwVR9vV7cYrlHs/l9sU/rix9f9CnVx8uUT+u8y1/WCs99INKDnfA2ayhGP1ZD0z+odXjK9xL5I4gjKR2xp7p8Sckvb04mddf/iG75QYiRevgqdMnvd9N5VZp2ksBc83lDg7/gxqIwktteSI9RA3Ux9VIiNxx/ZLe9dZSHxRq9AA

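
The URL above contains no HTML character references, so unescape has nothing to change there. The following sketch uses a made-up URL (example.com and its parameters are assumptions) of the kind copied out of an HTML attribute, where & appears as &amp;, to show what each function contributes.

```python
from html import unescape
from urllib.parse import unquote

# Hypothetical URL as it might appear inside an HTML href attribute.
raw = "https://example.com/p?a=1&amp;b=x%2fy"

# unquote() only handles the percent-encoding; &amp; is left alone.
step1 = unquote(raw)
print(step1)  # https://example.com/p?a=1&amp;b=x/y

# unescape() then converts the HTML character reference.
step2 = unescape(step1)
print(step2)  # https://example.com/p?a=1&b=x/y
```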

Conclusion

This article demonstrates multiple methods to decode URL strings. After going through this article, the reader can easily implement URL string decoding through different Python library packages.

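For standard percent-encoded URLs, urllib.parse.unquote and requests.utils.unquote give the same result, since they are the same function. Add html.unescape when the URL also contains HTML character references, and keep in mind that %uXXXX strings are not decoded by unquote and need a custom decoder.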
