urllib.error.HTTPError: HTTP Error 403: Forbidden in Python

  1. the urllib.error.HTTPError: HTTP Error 403: Forbidden in Python
  2. Fix the urllib.error.HTTPError: HTTP Error 403: Forbidden in Python

Unlike the third-party requests library, the urllib module ships with Python's standard library, so it can send HTTP requests to a website without installing an extra dependency.

This article goes through the causes of, and solutions to, urllib.error.HTTPError: HTTP Error 403: Forbidden.

the urllib.error.HTTPError: HTTP Error 403: Forbidden in Python

A 403 error appears when a client tries to access a page or resource it is not permitted to view. The web server returns the HTTP status code 403 to indicate that it understood the request but refuses to fulfill it.

With urllib, it typically happens when the server's security layer denies a request made through the urllib.request module, for example while scraping a web page. There are several possible causes for this error.
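Since the failure is raised as an exception, a script can catch it and inspect the status code before deciding what to do. The following is a minimal sketch rather than part of the original example; the helper names `describe_http_error` and `fetch` are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def describe_http_error(err):
    """Summarize an HTTPError as a short, readable string."""
    # err.code is the numeric status, err.reason the server's reason phrase.
    if err.code == 403:
        return f"{err.code} {err.reason}: the server refused the request"
    return f"{err.code} {err.reason}"


def fetch(url):
    """Return the response body as bytes, or None if the server sent an HTTP error."""
    try:
        with urlopen(Request(url)) as response:
            return response.read()
    except HTTPError as err:
        print(describe_http_error(err))
        return None
```

Calling `fetch(url)` on a page that rejects the request would print the summary and return `None` instead of ending the script with a traceback.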

Code example:

from urllib.request import Request, urlopen

# No headers are set, so urllib sends its default "Python-urllib/3.x" User-Agent.
url = "https://www.google.com/search?q=plants&rlz=1C5CHFA_enPK978PK978&oq=plants&aqs=chrome..69i57j46i67i131i433j0i67i457j46i67j0i67j46i512j0i131i433i512j0i512l2j0i271.3887j0j7&sourceid=chrome&ie=UTF-8"
request_site = Request(url)
webpage = urlopen(request_site).read()
print(webpage)

Output:

urllib.error.HTTPError: HTTP Error 403: Forbidden

This error occurs in the above example code because the server treats the request as automated traffic and rejects it. A common source of such blocks is mod_security, a web application firewall module that defends websites from outside threats.

Filters of this kind try to determine whether requests come from a person using a browser or from a computer program. They block requests from recognized automated tools or agents to stop them from scraping the website.

Because urllib identifies itself with a default User-Agent header of the form Python-urllib/3.x, the request is readily identified as non-human and blocked.
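The header that gives the script away can be inspected without sending any request. This is a small sketch under the assumption that no custom opener has been installed; the URL and the replacement agent string are placeholders.

```python
from urllib.request import Request, build_opener

# The default opener carries a User-Agent header of the form "Python-urllib/3.x".
opener = build_opener()
default_agent = dict(opener.addheaders)["User-agent"]
print(default_agent)

# A Request object can override it for a single call.
# urllib stores header names capitalized, hence "User-agent" in the lookup.
req = Request("https://example.com", headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent"))
```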

Fix the urllib.error.HTTPError: HTTP Error 403: Forbidden in Python

There are different ways to fix the urllib.error.HTTPError: HTTP Error 403: Forbidden error in Python.

Passing a browser-like User-Agent as a request header often fixes the problem. The website may also use cookies as an anti-scraping measure.

In that case, it sets cookies and expects them to be echoed back on later requests in order to prevent scraping, which may be against its policy.

from urllib.request import Request, urlopen

def get_page_content(url, head):
  # Attach the custom headers to the request before opening it.
  req = Request(url, headers=head)
  return urlopen(req)

url = 'https://example.com'
# Browser-like headers; the User-Agent is the one that usually matters.
head = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
  'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
  'Accept-Encoding': 'identity',  # ask for an uncompressed body
  'Accept-Language': 'en-US,en;q=0.8',
  'Connection': 'keep-alive',
  'Referer': 'https://example.com',
  'Cookie': 'your cookie value (you can copy it from your browser)'
}

data = get_page_content(url, head).read()
print(data)

Output:

b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta
...
<p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
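Instead of pasting a cookie string into the headers by hand, the standard library can store cookies the server sets and send them back automatically. The sketch below is one possible approach, not the method used in the example above; the helper name `make_cookie_opener` and the agent string are assumptions made for illustration.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener(user_agent):
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/3.x" header with the given agent string.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


opener, jar = make_cookie_opener("Mozilla/5.0")
# Usage (performs a network request, so it is left commented out):
# with opener.open("https://example.com") as response:
#     html = response.read()
# print(len(jar), "cookie(s) stored")
```

Every request made through the same `opener` shares the one `jar`, so a cookie received on the first request is echoed back on the next one.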

Use Session Object

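Sometimes, even setting a User-Agent does not stop this error from occurring. In that case, the Session object of the requests module can be used. requests is a third-party package that must be installed separately, and a session keeps cookies across the requests made through it.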

import requests

url = "https://stackoverflow.com/search?q=html+error+403"
# The session keeps cookies set by the server for later requests.
session_obj = requests.Session()
response = session_obj.get(url, headers={"User-Agent": "Mozilla/5.0"})

print(response.status_code)

Output:

200

This article covered the cause of the urllib.error.HTTPError: HTTP Error 403: Forbidden error and ways to handle it. The error is usually triggered by a security mechanism such as mod_security; different websites use different mechanisms to distinguish human visitors from automated programs (bots).

Zeeshan Afridi

Zeeshan is a detail-oriented software engineer who helps companies and individuals make their lives easier with software solutions.
