Python is frequently used to access resources on the internet. We can create requests and connections using different libraries, and these libraries can also help us download or read files over HTTP.
In this tutorial, we will download files from the internet in Python.
requests Module to Download Files in Python
We can use the requests module to retrieve information and read web pages from the internet. The get() method sends a request to the given URL and retrieves the file to be downloaded. The open() function creates a file object at the path where we wish to save the file, and the write() method writes the downloaded contents to it. We use these functions to download a file, as shown below.
import requests as req

URL = 'https://www.facebook.com/favicon.ico'
file = req.get(URL, allow_redirects=True)
open('facebook.ico', 'wb').write(file.content)
The above code downloads Facebook's logo file from its URL and stores it in the working directory. We can specify any path in the open() function, but we have to open the file in wb mode, which indicates that we intend to write to it in binary mode.
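Note that requests does not raise an error for HTTP failure statuses on its own; a 404 page would be silently written to disk as if it were the file. A minimal sketch (reusing the same example URL) that calls raise_for_status() before writing:

```python
import requests

URL = 'https://www.facebook.com/favicon.ico'  # same example URL as above

file = requests.get(URL, allow_redirects=True)
file.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses

# Only reached if the request succeeded.
with open('facebook.ico', 'wb') as f:
    f.write(file.content)
```

This way a failed request stops the script instead of producing a corrupt file.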
The above example is suitable for downloading smaller files but does not work efficiently for large files. The file.content attribute returns the entire file content as a single bytes object, which must fit in memory. Since we used a small file in the above example, it worked properly. If we have to download a big file, we should use the file.iter_content() method and specify a chunk size, so that the data is downloaded and written in chunks.
We use this function in the following example.
import requests

URL = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"
file = requests.get(URL, stream=True)

with open("Python.pdf", "wb") as pdf:
    for chunk in file.iter_content(chunk_size=1024):
        if chunk:
            pdf.write(chunk)
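When streaming a large download, the server's Content-Length header (when present) lets us report progress as the chunks arrive. A minimal sketch, using a small assumed example URL so it runs quickly:

```python
import requests

URL = 'https://www.facebook.com/favicon.ico'  # assumed small example URL
file = requests.get(URL, stream=True)

# Content-Length may be absent (e.g. with chunked transfer encoding).
total = int(file.headers.get('Content-Length', 0))
downloaded = 0

with open('favicon.ico', 'wb') as out:
    for chunk in file.iter_content(chunk_size=1024):
        if chunk:
            out.write(chunk)
            downloaded += len(chunk)
            if total:
                print(f'{downloaded}/{total} bytes')
```

The same pattern works for any file size, since only one chunk is held in memory at a time.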
urllib Module to Download Files in Python
We can also use the urllib library in Python to download and read files from the web. It is a URL handling module that provides different functions for such tasks. Here, too, we have to specify the URL of the file to be downloaded. The urllib.request.urlopen() method sends a request to the server hosting the file and returns a response object from which the file can be read.
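For example, urlopen() returns a response object whose read() method gives the raw bytes, which we can then write to a file ourselves (a sketch; the output filename is our own choice):

```python
import urllib.request

URL = 'https://www.facebook.com/favicon.ico'

# urlopen() sends the request; read() returns the response body as bytes.
with urllib.request.urlopen(URL) as response:
    data = response.read()

with open('favicon_urlopen.ico', 'wb') as f:
    f.write(data)
```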
To download files directly, we can use the urllib.request.urlretrieve() function. It downloads the resource from the given address and stores it at the provided path.
We download the icon of Facebook using this method in the following example.
import urllib.request

urllib.request.urlretrieve("https://www.facebook.com/favicon.ico", "fb.ico")
('fb.ico', <http.client.HTTPMessage at 0x2d2d317a088>)
The above output indicates that the file was downloaded successfully.
pycurl Module to Download Files in Python
We can use file handling with this module to download files from the internet. First, we create a file object for the file we wish to download. Then, the pycurl.Curl() function creates a Curl object and initiates the curl session. The setopt() method sets the URL of the file and attaches the file object as the write target. Next, the perform() method carries out the transfer by sending the HTTP request and writing the retrieved data to the file object. Finally, the close() method ends the session, and we get our file downloaded in the working directory.
See the code below.
import pycurl

file_name = 'fb.ico'
file_src = 'https://www.facebook.com/favicon.ico'

with open(file_name, 'wb') as f:
    cl = pycurl.Curl()
    cl.setopt(cl.URL, file_src)
    cl.setopt(cl.WRITEDATA, f)
    cl.perform()
    cl.close()