How to Decode URL in Python

Vaibhhav Khetarpal Feb 02, 2024
  1. Use the urllib.parse.unquote() Function to Decode a URL in Python
  2. Use the urllib.parse.unquote_plus() Function to Decode a URL in Python
  3. Use the requests Module to Decode a URL in Python
  4. Encode and Decode Unicode Encoded URL String Using UTF-8 in Python
  5. Use the unquote() and unescape() Functions to Decode URL in Python
  6. Conclusion

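URL encoding, also called percent-encoding, replaces spaces, reserved characters such as & and /, and non-ASCII characters with %xx escapes so they can be transmitted safely inside a URL, for example when calling APIs or submitting data online. However, there are times when we need to decode these encoded URLs back into plain text.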

In this article, we’ll explore various methods for URL decoding in Python, which can be especially helpful when working with web forms.

Use the urllib.parse.unquote() Function to Decode a URL in Python

The urllib.parse.unquote() function converts a percent-encoded string to plain text. It replaces each %xx escape sequence (a percent sign followed by two hexadecimal digits) with the character it represents, and it accepts str objects as well as, since Python 3.9, bytes objects.

To use this function, import the urllib.parse module. The standard library's urllib package groups several modules for working with URLs, including urllib.parse and urllib.request.
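unquote() also accepts encoding and errors parameters (defaulting to 'utf-8' and 'replace') that control how the decoded bytes are turned into text. The short sketch below, which assumes Python 3.9 or later because it passes a bytes object, shows the default UTF-8 behavior, what happens with a byte that is not valid UTF-8, and urllib.parse.unquote_to_bytes() for when you want raw bytes instead of a string. The sample strings are made up for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

# Default: the escaped bytes are decoded as UTF-8
print(unquote("caf%C3%A9"))                   # café

# A bytes object is accepted as well (Python 3.9+)
print(unquote(b"caf%C3%A9"))                  # café

# %E9 alone is not valid UTF-8, so errors="replace" yields U+FFFD
print(unquote("caf%E9") == "caf\ufffd")       # True

# Reading the same escape as Latin-1 gives the intended character
print(unquote("caf%E9", encoding="latin-1"))  # café

# unquote_to_bytes() skips the text-decoding step entirely
print(unquote_to_bytes("caf%E9"))             # b'caf\xe9'
```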

Example Code:

import urllib.parse

url = "delftstack.com/code=%20HOW%20TO%20Articles"
x = urllib.parse.unquote(url)

print(x)

First, we import the urllib.parse module, which provides utilities for working with URLs. Then, we define a variable url and assign it a URL string that contains some percent-encoded characters.

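Percent-encoded characters in URLs are represented by a '%' followed by two hexadecimal digits that give the value of one byte. For an ASCII character, that byte is its ASCII code (%20 is a space, code 32); a non-ASCII character usually takes several escapes, one for each byte of its UTF-8 encoding (for example, ı becomes %C4%B1).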

We use the urllib.parse.unquote() function to decode the url variable. This function takes a percent-encoded URL as input and replaces the encoded characters with their actual character values.

The result of decoding the URL is stored in the variable x. Finally, we print the decoded URL, which will show the original string without percent-encoded characters.

Output:

delftstack.com/code= HOW TO Articles

In the output, the %20 sequences have been replaced with spaces. The other characters remain unchanged as they were not URL-encoded.
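To make the byte-level rule behind escapes like %20 concrete, here is a minimal hand-written decoder. It is a sketch for illustration only: percent_decode is a hypothetical name, and it assumes well-formed input (every % followed by two hex digits) whose bytes form valid UTF-8. In real code, prefer urllib.parse.unquote(), which also copes with malformed escapes.

```python
from urllib.parse import unquote


def percent_decode(text: str) -> str:
    """Hypothetical helper: decode %XX escapes by hand (assumes well-formed input)."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            # The two hex digits after '%' give the value of one byte
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Ordinary characters are copied as their UTF-8 bytes
            out.extend(text[i].encode("utf-8"))
            i += 1
    # Multi-byte sequences such as %C4%B1 only become one character here
    return out.decode("utf-8")


print(percent_decode("HOW%20TO"))                             # HOW TO
print(percent_decode("Tan%C4%B1m") == unquote("Tan%C4%B1m"))  # True
```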

Use the urllib.parse.unquote_plus() Function to Decode a URL in Python

Data submitted from HTML forms (the application/x-www-form-urlencoded format) usually encodes spaces as + signs. urllib.parse.unquote() leaves a + unchanged, whereas the urllib.parse.unquote_plus() function is designed for this case.

It decodes %xx escapes just like unquote() and also replaces each + sign with a space. Unlike unquote(), however, it only accepts str objects, not bytes.

Example Code:

import urllib.parse

url = "delftstack.com/code=HOW%20TO+Articles"
x = urllib.parse.unquote_plus(url)

print(x)

In the code, we import the urllib.parse module. Then, we define a variable url and assign it a URL string. In this URL, %20 represents a space, and the + sign also stands for a space, following the form-encoding convention; a literal plus sign would have to be written as %2B.

Next, we use the urllib.parse.unquote_plus() function to decode the url variable. This function takes a percent-encoded URL as input, replaces the encoded characters with their actual character values, and also replaces each '+' character with a space.

The result of decoding the URL is stored in the variable x. Finally, we print the decoded URL.

Output:

delftstack.com/code=HOW TO Articles

Aside from replacing the %20 sequence with a space, the + sign in the original URL has also been replaced with a space.
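The difference between the two functions matters when a value contains a real plus sign. In form encoding, a literal + is sent as %2B, while a bare + means a space. The short comparison below uses a made-up query string to show how each function treats both forms.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical form value "C++ tips": the pluses are escaped as %2B, the space is sent as +
query = "q=C%2B%2B+tips"

print(unquote(query))       # q=C+++tips  (bare + is left as-is)
print(unquote_plus(query))  # q=C++ tips  (bare + becomes a space, %2B becomes +)
```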

Use the requests Module to Decode a URL in Python

requests is a popular third-party library for sending HTTP requests in Python (install it with pip install requests). Besides its HTTP features, its requests.utils module exposes an unquote() function that can be used for URL decoding.

Like urllib.parse.unquote(), the requests.utils.unquote() function decodes %xx escapes but leaves + signs unchanged. In recent versions of requests it is in fact the same function, re-exported from urllib.parse, so it is mainly convenient when your project already depends on requests.

Example Code:

import requests

url = "delftstack.com/code=%20HOW%20TO%20Articles"
decoded_url = requests.utils.unquote(url)

print(decoded_url)

First, we import the requests library, which is used for making HTTP requests. Then, we define a URL string with some percent-encoded characters.

Next, we use the requests.utils.unquote() function to decode the URL. This function replaces percent-encoded characters (e.g., '%20') with their actual values.

Lastly, we store the decoded result in the variable decoded_url and print it.

Output:

delftstack.com/code= HOW TO Articles

The output displays the decoded version of the url string. The %20 encodings are replaced with spaces, making the URL more human-readable.

Encode and Decode Unicode Encoded URL String Using UTF-8 in Python

The first example demonstrates percent-encoding a string that contains a non-ASCII character, by first encoding it with UTF-8, and then decoding the result back to the original text.

Decode Unicode Encoded Plain String in Python

Here, the input is a Python str containing a non-ASCII character (ı, U+0131). Percent-encoding operates on bytes, so the string is encoded to UTF-8 bytes before it is escaped.

  • Import the urllib.parse module. Note that the import must name parse explicitly: a bare import urllib does not reliably make urllib.parse available.
  • Store the string in the variable u, encode it to UTF-8, and percent-encode the resulting bytes.

    Syntax:

    urllib.parse.quote(variable_name.encode('utf8'))
    

    The result is saved inside a new variable url, so that it can be used as input while decoding.

  • The variable, url, is printed to view the encoded result.

The steps below demonstrate taking the encoded string and decoding it using unquote.

  • A variable f is initialized to decode and store the result.
  • The syntax urllib.parse.unquote(url) decodes the string stored inside the variable url and saves it into the variable f.
  • The variable f is printed to view the decoded string URL.

Example Code:

import urllib.parse

u = "Tan\u0131m"
url = urllib.parse.quote(u.encode("utf8"))
print(url)

f = urllib.parse.unquote(url)
print(f)

Output:

Tan%C4%B1m
Tanım

The first line prints the URL-encoded version of "Tanım", which is "Tan%C4%B1m". The second line prints the decoded version of the URL-encoded string, which returns the original string "Tanım" with the non-ASCII character correctly represented.
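Calling .encode("utf8") is optional here: when quote() receives a str, it encodes it with UTF-8 by default (its encoding parameter), so both calls in the sketch below give the same result. The sketch also shows quote()'s safe parameter, which keeps / unescaped by default, and quote_plus(), the encoding counterpart of unquote_plus(). The sample strings are made up.

```python
import urllib.parse

u = "Tan\u0131m"

# quote() encodes a str to UTF-8 itself, so the explicit .encode() is redundant
print(urllib.parse.quote(u) == urllib.parse.quote(u.encode("utf8")))  # True

# "/" is in the default safe set; pass safe="" to escape it too
print(urllib.parse.quote("a b/c"))           # a%20b/c
print(urllib.parse.quote("a b/c", safe=""))  # a%20b%2Fc

# quote_plus() writes spaces as + (form style); unquote_plus() reverses it
encoded = urllib.parse.quote_plus("HOW TO Articles")
print(encoded)                               # HOW+TO+Articles
print(urllib.parse.unquote_plus(encoded))    # HOW TO Articles
```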

Decode Unicode Encoded URL String in Python

In some scenarios, URLs contain non-standard %uXXXX escapes, where each escape holds a four-digit hexadecimal Unicode code point. This format is produced, for example, by JavaScript's deprecated escape() function, and Python's urllib.parse functions do not recognize it.

As a result, a user may have to write a small decoder of their own for such URLs. The example below applies the method above to a %u-encoded string to show what happens.

The string is first encoded to UTF-8 and percent-escaped with quote(), and the result is then decoded with unquote().

Example Code:

import urllib.parse

u = (
    "%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05"
    "D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4"
)

url = urllib.parse.quote(u.encode("utf8"))
f = urllib.parse.unquote(url)
print(f)

In the above example, we import the urllib.parse module, which provides functions for working with URLs. Then, we define a %u-encoded string and store it in the variable u.

Next, we encode the string to UTF-8 and percent-encode the bytes with urllib.parse.quote(). Because % is not a safe character, quote() escapes every % as %25.

Finally, we decode the result with urllib.parse.unquote() and print it.

Output:

%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4

In the output, the string is identical to the input. urllib.parse.unquote(url) only reverses what quote() did, turning each %25 back into %, so the round trip returns the original text; neither function interprets %uXXXX escapes, which is why they remain in the output unchanged.
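Because urllib.parse has no support for %uXXXX escapes, one option is a small custom decoder. The sketch below is an assumption-based helper, not a library function: decode_percent_u is a hypothetical name, and it treats each %uXXXX as one four-digit hex code point before passing the rest to unquote() for standard %XX escapes such as %20. It does not combine surrogate pairs (for example, %uD83D%uDE00 for an emoji), and it assumes the decoded characters do not themselves form new % escapes.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX form: "%u" followed by four hex digits
PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text: str) -> str:
    """Hypothetical helper: decode %uXXXX escapes, then ordinary %XX escapes."""
    # Replace each %uXXXX with the Unicode character it names
    replaced = PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
    # unquote() then handles the remaining standard escapes (e.g. %20 -> space)
    return unquote(replaced)


u = (
    "%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05"
    "D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4"
)
print(decode_percent_u(u))
```

With the same %u-encoded string as in the previous example, this prints readable Hebrew text instead of the escaped form.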

Use the unquote() and unescape() Functions to Decode URL in Python

The following code demonstrates how to decode a URL using two standard library modules, urllib.parse and html. We'll use the unquote() function from urllib.parse to decode the percent-encoded characters and the unescape() function from the html module to handle any HTML escaping, such as &amp;.

Example Code:

from urllib.parse import unquote
from html import unescape

f = (
    "https://v.w.xy/p1/p22?userId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&"
    "confirmationToken=7uAf%2fxJoxRTFAZdxslCn2uwVR9vV7cYrlHs%2fl9sU%2frix9f9C"
    "nVx8uUT%2bu8y1%2fWCs99INKDnfA2ayhGP1ZD0z%2bodXjK9xL5I4gjKR2xp7p8Sckvb04mddf"
    "%2fiG75QYiRevgqdMnvd9N5VZp2ksBc83lDg7%2fgxqIwktteSI9RA3Ux9VIiNxx%2fZLe9dZSHxRq9AA"
)

# Decode the %xx escapes first, then any HTML entities
print(unescape(unquote(f)))

Import the unquote function from the urllib.parse module to decode URL-encoded characters, and import the unescape function from the html module to decode HTML entities. (Older code sometimes imports unquote from urllib.request, which only works because that module itself imports it from urllib.parse.) Define a URL string and store it in the variable f.

In the line print(unescape(unquote(f))), the unquote() function first decodes the URL-encoded characters in f, and the unescape() function then decodes any HTML entities in the result.

Lastly, print the decoded URL.

Output:

https://v.w.xy/p1/p22?userId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&confirmationToken=7uAf/xJoxRTFAZdxslCn2uwVR9vV7cYrlHs/l9sU/rix9f9CnVx8uUT+u8y1/WCs99INKDnfA2ayhGP1ZD0z+odXjK9xL5I4gjKR2xp7p8Sckvb04mddf/iG75QYiRevgqdMnvd9N5VZp2ksBc83lDg7/gxqIwktteSI9RA3Ux9VIiNxx/ZLe9dZSHxRq9AA

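In the output, all the URL-encoded characters have been converted to their original characters (%2f to / and %2b to +). This particular URL contains no HTML entities, so unescape() leaves it unchanged; that step matters when a URL was copied from HTML source, where & is often written as &amp;.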

This code is useful when working with URLs that may contain both URL-encoded and HTML-escaped elements, ensuring a clean and usable URL for further processing.
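When the goal is to read individual query parameters, decoding the whole URL first can change its meaning: a %2B in the token becomes +, and any later form-style parsing then turns that + into a space. A safer sketch, assuming the URL may have been copied from HTML with & written as &amp;, is to unescape the HTML entities first and let urllib.parse.parse_qs() decode each value exactly once. The shortened URL and token here are made up for illustration.

```python
from html import unescape
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical URL as it might appear in HTML source (& written as &amp;)
raw = "https://v.w.xy/p1/p22?userId=abc&amp;confirmationToken=7uAf%2fx%2bu8y1"

# Undo the HTML escaping, split off the query, and decode each value once
params = parse_qs(urlsplit(unescape(raw)).query)
print(params["userId"])             # ['abc']
print(params["confirmationToken"])  # ['7uAf/x+u8y1']

# Decoding the whole URL before parsing turns %2b into +, which parse_qs reads as a space
broken = parse_qs(urlsplit(unquote(unescape(raw))).query)
print(broken["confirmationToken"])  # ['7uAf/x u8y1']
```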

Conclusion

Decoding URLs is a crucial skill in web development and data processing. Python offers various methods, each tailored to different scenarios.

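Choose the method that fits your input: urllib.parse.unquote() for ordinary percent-encoded text, urllib.parse.unquote_plus() for HTML form data where + stands for a space, requests.utils.unquote() if your project already uses requests, and html.unescape() in addition when a URL was copied from HTML source. Strings with non-standard %uXXXX escapes are not handled by any of these functions and need a small custom decoder.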


Vaibhhav is an IT professional with a strong grasp of Python programming and various projects under his belt. He is eager to discover new things and is a quick learner.
