Get Data From a URL in Python

A URL, or Uniform Resource Locator, is a web address that points to a resource on the internet. This resource can be a plain text file, a zip file, an exe file, a video, an image, or a web page.

In the case of a web page, the content fetched is HTML (Hypertext Markup Language). This article shows how to get HTML, or any other data, from a URL using Python.

Get Data From a URL Using the requests Module in Python

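The third-party requests module makes it easy to send HTTP (Hypertext Transfer Protocol) requests from Python. It is not part of the standard library and is typically installed with pip install requests. This module can be used to fetch HTML or any other content from a valid URL.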

The requests module has a get() method that we can use to fetch data from a URL. This method accepts a url as an argument and returns a requests.Response object.

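This requests.Response object contains details about the server’s response to the sent HTTP request. If the URL is malformed (for example, it has no scheme), get() raises an exception such as MissingSchema or InvalidURL; if the host cannot be reached, it raises ConnectionError. All of these derive from requests.exceptions.RequestException. Note that a URL that reaches a server but points to a missing page does not raise an exception by itself; the response simply carries an error status code such as 404.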
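As a rough illustration of what a Response object carries, the sketch below collects a few commonly used attributes into a dictionary. The helper name summarize_response is hypothetical and introduced only for this example; the attributes it reads (status_code, ok, headers, encoding, content) are part of requests.Response.

```python
import requests


def summarize_response(response: requests.Response) -> dict:
    """Collect a few commonly used fields from a Response object."""
    return {
        "status_code": response.status_code,  # e.g. 200 or 404
        "ok": response.ok,                    # True when status_code is below 400
        "content_type": response.headers.get("Content-Type"),
        "encoding": response.encoding,        # encoding used to decode .text
        "length": len(response.content),      # body size in bytes
    }


# Typical use (requires network access):
# r = requests.get("https://www.lipsum.com/feed/html", timeout=10)
# print(summarize_response(r))
```

Checking status_code or ok is useful because a request can succeed at the network level and still return an error page.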

If you are unsure about the URL’s validity, wrap the get() call in a try and except block so that a failed request does not crash the program. This will be depicted in the upcoming example.

To learn more about the requests.Response object, refer to the official Requests documentation.

Now, let us see how to use get() to fetch HTML content or any other data from a valid URL. Refer to the following code.

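import requests

try:
    url = "https://www.lipsum.com/feed/html"
    # A timeout keeps the call from waiting indefinitely on a slow server
    r = requests.get(url, timeout=10)
    print("HTML:\n", r.text)
except requests.exceptions.RequestException:
    # Base class for the errors raised by requests (bad URL, connection failure, timeout)
    print("Invalid URL or some error occurred while making the GET request to the specified URL")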

Output:

HTML:
...

Note that ... represents the HTML content fetched from the URL. It is omitted from the output above because it is too long.

If the request fails, for example because the host in the URL does not exist, the code inside the except block runs instead. The following code depicts how it works.

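import requests

try:
    url = "https://www.thisisafaultyurl.com/faulty/url/"
    # A timeout keeps the call from waiting indefinitely on a slow server
    r = requests.get(url, timeout=10)
    print("HTML:\n", r.text)
except requests.exceptions.RequestException:
    # Base class for the errors raised by requests (bad URL, connection failure, timeout)
    print("Invalid URL or some error occurred while making the GET request to the specified URL")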

Output:

Invalid URL or some error occurred while making the GET request to the specified URL
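A single generic message hides why the request failed. The following is a minimal sketch, not a definitive implementation, of how the different exception classes in requests.exceptions could be told apart. The function name explain_failure and the returned strings are hypothetical; the exception classes, the timeout argument, and raise_for_status() come from the requests library.

```python
import requests
from requests import exceptions as rex


def explain_failure(url: str, timeout: float = 10.0) -> str:
    """Return "ok" if a GET request to url succeeds, else a short reason."""
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx and 5xx status codes into an HTTPError exception
        response.raise_for_status()
    except (rex.MissingSchema, rex.InvalidSchema, rex.InvalidURL):
        # Raised before any connection is attempted
        return "malformed or unsupported URL"
    except rex.Timeout:
        return "timed out"
    except rex.ConnectionError:
        return "could not connect"
    except rex.HTTPError as err:
        return f"HTTP error {err.response.status_code}"
    except rex.RequestException:
        # Any other error raised by requests
        return "other request error"
    return "ok"
```

Calling raise_for_status() matters here because a URL that reaches a real server but points to a missing page would otherwise be reported as a success.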

Some endpoints, such as form handlers and many APIs, do not serve their content in response to GET requests and accept only POST requests. In such cases, we can use the post() method from the requests module.

As the name suggests, this method sends a POST request to a URL. It is commonly called with two arguments, namely, url and data.

The url is the target URL, and data accepts a dictionary of key-value pairs that is sent form-encoded in the request body. Values such as an API (Application Programming Interface) key or a CSRF (Cross-Site Request Forgery) token can be sent this way when the server expects them in the body. If the server expects them as HTTP headers, pass them through the separate headers argument instead.

The Python code for such a case would be as follows.

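import requests

try:
    url = "https://www.thisisaurl.com/that/accepts/post/requests/"
    # Key-value pairs sent form-encoded in the request body
    payload = {
        "api-key": "my-api-key",
        # more key-value pairs
    }
    r = requests.post(url, data=payload, timeout=10)
    print("HTML:\n", r.text)
except requests.exceptions.RequestException:
    # Base class for the errors raised by requests (bad URL, connection failure, timeout)
    print("Invalid URL or some error occurred while making the POST request to the specified URL")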
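To see where the data and headers arguments end up in the outgoing request, one option is to build the request without sending it. The sketch below uses requests.Request and its prepare() method. The URL is the placeholder from the example above, and the field name and the X-API-Key header name are assumptions made only for illustration; a real service defines its own names.

```python
import requests

# Placeholder endpoint and hypothetical values, used only for illustration
url = "https://www.thisisaurl.com/that/accepts/post/requests/"
form_fields = {"name": "value"}              # goes into the request body
extra_headers = {"X-API-Key": "my-api-key"}  # goes into the HTTP headers

# Build the request without sending it, so nothing goes over the network
prepared = requests.Request(
    "POST", url, data=form_fields, headers=extra_headers
).prepare()

print(prepared.method)                   # POST
print(prepared.body)                     # name=value
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.headers["X-API-Key"])     # my-api-key
```

The dictionary passed as data becomes the form-encoded body, while the dictionary passed as headers is added to the request headers.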

Vaibhav Vaibhav

Vaibhav is an artificial intelligence and cloud computing stan. He likes to build end-to-end full-stack web and mobile applications. Besides computer science and technology, he loves playing cricket and badminton, going on bike rides, and doodling.
