This article uses practical examples to explain Python's urlparse() function and how to use it to extract the domain name from a URL. We'll also look at how to break a URL into its components and use each of them.
Use urlparse() to Extract the Domain From a URL
The urlparse() function is part of Python's urllib.parse module. It is useful when you need to split a URL into its components and use them for various purposes. Let's look at an example:
from urllib.parse import urlparse
# Split the URL into its six components
component = urlparse('http://www.google.com/doodles/mothers-day-2021-april-07')
print(component)
In this code snippet, we first import the urlparse function from the urllib.parse module. Then we pass a URL to urlparse(). The function returns a ParseResult object, a named tuple with the six elements listed below:
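scheme: the protocol used to access the online resource, for instance, http or https.
netloc: net means network and loc means location, so this is the URL's network location. It holds the domain name, along with any port number or user credentials.
path: the specific path a web browser uses to reach the resource on the server.
params: the parameters for the last path element, written after a semicolon (;).
query: the query string, which follows the path component after a question mark (?) and carries data the resource can use.
fragment: identifies a specific part of the resource, written after a hash sign (#).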
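The Google URL in our example leaves params, query, and fragment empty. To see all six fields filled in, here is a short sketch that parses a made-up URL (the host www.example.com, the port, and the params, query, and fragment values are illustrative assumptions, not taken from the example above). Because ParseResult is a named tuple, each field can be read by name or by position.

```python
from urllib.parse import urlparse

# Illustrative URL chosen so that every component is present.
url = "https://www.example.com:8080/docs/page;v=1?q=python&lang=en#intro"
parts = urlparse(url)

# Read each component by its attribute name.
print(parts.scheme)    # https
print(parts.netloc)    # www.example.com:8080
print(parts.path)      # /docs/page
print(parts.params)    # v=1
print(parts.query)     # q=python&lang=en
print(parts.fragment)  # intro

# Or by position, since ParseResult is a named tuple of six items.
print(len(parts))      # 6
print(parts[0])        # https
```

Note that the params field only captures what follows a semicolon in the last path segment; most modern URLs leave it empty.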
When we print the component object from our first example, Python displays the value of each field. The output of that code is as follows:
ParseResult(scheme='http', netloc='www.google.com', path='/doodles/mothers-day-2021-april-07', params='', query='', fragment='')
You can see from the output that all the URL components are separated and stored as individual elements of the object. We can get the value of any component by its name. For example, reading the netloc component gives us the domain name of the URL:
from urllib.parse import urlparse
# netloc holds the network location, which here is the domain name
domain_name = urlparse('http://www.google.com/doodles/mothers-day-2021-april-07').netloc
print(domain_name)
Running this code prints the domain name:
www.google.com
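netloc is not always just the domain: it also keeps any port number and user:password@ credentials, and urlparse() only fills it in when the network location is introduced by "//". The sketch below wraps this in a hypothetical helper, get_domain(), that uses the hostname attribute (which drops the port and credentials and lowercases the host). It assumes that input without "://" or a leading "//" is a bare host such as www.google.com/doodles; that assumption is ours, not part of urlparse().

```python
from urllib.parse import urlparse


def get_domain(url: str) -> str:
    """Return the host name of a URL (hypothetical helper, for illustration)."""
    # Assumption: input without "://" or a leading "//" is a bare host,
    # so prefix "//" to make urlparse() treat it as a network location.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # hostname strips any port and credentials and lowercases the host;
    # it is None when there is no network location at all.
    return urlparse(url).hostname or ""


print(get_domain("http://www.google.com/doodles/mothers-day-2021-april-07"))
print(get_domain("https://user:secret@WWW.Example.com:8080/path"))
print(get_domain("www.google.com/doodles"))
```

All three calls print a bare host: www.google.com, www.example.com, and www.google.com. If you want google.com rather than www.google.com, you need extra logic: dropping a leading "www." is a simple heuristic, while reliably finding the registered domain requires a public suffix list, which urllib.parse does not provide.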
This way, we can parse a URL and use its different components for various purposes in our programs.
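As one illustration of putting the components to work, the sketch below uses other functions from the same urllib.parse module: parse_qs() to turn the query string into a dictionary, and the named tuple's _replace() method with geturl() (or urlunparse()) to assemble a modified URL. The example URLs are assumptions made up for this demonstration.

```python
from urllib.parse import parse_qs, urlparse, urlunparse

parts = urlparse("https://www.example.com/search?q=python&page=2#results")

# parse_qs() maps each query parameter to a list of its values.
params = parse_qs(parts.query)
print(params)  # {'q': ['python'], 'page': ['2']}

# _replace() returns a new ParseResult with the given fields changed;
# the original object is left untouched.
page3 = parts._replace(query="q=python&page=3", fragment="")
print(page3.geturl())  # https://www.example.com/search?q=python&page=3

# urlunparse() reassembles a URL from any six-item sequence of components.
print(urlunparse(("http", "example.org", "/a", "", "x=1", "")))  # http://example.org/a?x=1
```

Because ParseResult is immutable, _replace() is the idiomatic way to change one component while keeping the rest.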