How to Extract Domain From URL in Python

Naila Saad Siddiqui Feb 02, 2024
This article uses practical examples to show how Python’s urlparse() function parses a URL and extracts its domain name. We’ll also look at the other components of a URL and how to access each of them.

Use urlparse() to Extract Domain From the URL

The urlparse() function is part of the urllib.parse module in Python’s standard library. It is useful when you need to split a URL into its components and use them for various purposes. Let us look at an example:

from urllib.parse import urlparse

component = urlparse("http://www.google.com/doodles/mothers-day-2021-april-07")
print(component)

In this code snippet, we first import the urlparse function from the urllib.parse module. Then we pass a URL to urlparse(). The return value is a ParseResult object, a named tuple with the six elements listed below:

  • scheme - The protocol used to access the resource, for instance, http or https.
  • netloc - The network location of the URL (net means network and loc means location). It usually holds the domain name and may also include a port number and login details.
  • path - The hierarchical path to the resource on the server.
  • params - Parameters for the last path segment, separated from it by a semicolon.
  • query - The query string that follows the ? after the path, typically key=value pairs passed to the resource.
  • fragment - The fragment identifier that follows the #, which points to a part of the resource.
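To make the six components concrete, here is a minimal sketch that parses a URL in which none of them is empty. The URL is a made-up example (it does not come from the article) and was chosen only to populate every field:

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen only so that every component is non-empty.
url = "https://www.example.com/docs/page;version=2?lang=en&sort=asc#summary"
parts = urlparse(url)

print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /docs/page
print(parts.params)    # version=2
print(parts.query)     # lang=en&sort=asc
print(parts.fragment)  # summary

# The result is a named tuple, so unpacking and positional access also work.
scheme, netloc, path, params, query, fragment = parts
print(parts[1] == parts.netloc)  # True
```

Accessing the components by name is usually easier to read than using positions, but both forms return the same values.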

When we display this object using the print function, it prints the value of each component. The output of the above code snippet is as follows:

ParseResult(scheme='http', netloc='www.google.com', path='/doodles/mothers-day-2021-april-07', params='', query='', fragment='')

You can see from the output that all the URL components are separated and stored as individual elements in the object. We can get the value of any component by using its attribute name, like this:

from urllib.parse import urlparse

domain_name = urlparse("http://www.google.com/doodles/mothers-day-2021-april-07").netloc
print(domain_name)

The netloc component holds the domain name of the URL, so the code prints:

www.google.com
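Note that netloc is the whole network location, so it can contain more than the domain. The sketch below assumes a hypothetical URL with login details and a port to show the difference between netloc and the hostname and port attributes that the parsed result also provides:

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and a port, for illustration only.
parsed = urlparse("https://user:secret@WWW.Example.com:8443/index.html")

print(parsed.netloc)    # user:secret@WWW.Example.com:8443
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8443
```

If only the host name is needed, hostname is often the safer choice because it drops the credentials and port and returns the name in lowercase.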

This way, we can parse a URL and use its individual components in our programs.
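One pitfall is worth knowing: when a URL has no scheme and no leading //, urlparse() treats the whole string as a path and leaves netloc empty. The following is a rough sketch of one way to handle such input. The extract_domain helper is hypothetical (it is not part of urllib), and its simple "//" check assumes the input does not contain another URL inside its path or query string:

```python
from urllib.parse import urlparse


def extract_domain(url):
    """Hypothetical helper: return the host name, or None if none is found."""
    # Without "//" before the host, urlparse() treats the string as a path,
    # so add it when it is missing (a simplifying assumption of this sketch).
    if "//" not in url:
        url = "//" + url
    return urlparse(url).hostname


print(urlparse("www.google.com/doodles").netloc == "")  # True
print(extract_domain("www.google.com/doodles"))         # www.google.com
print(extract_domain("http://www.google.com/doodles"))  # www.google.com
```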
