Python Checksum

  1. Use the hashlib.md5() Function to Generate and Check the MD5 Checksum of a File in Python
  2. Use the os Module to Generate and Check the MD5 Checksum of a File in Python

Hashing is an essential part of most popular programming languages. One application of hashing that matters in day-to-day programming is the checksum.

This article discusses checksums and how to generate and check the MD5 checksum of a file.

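Checksums are used for error detection: a checksum computed from a file's contents lets us validate that the data has not changed. A checksum resembles the value returned by Python's built-in hash() function, but it differs in being deterministic. The same bytes always produce the same checksum, whereas hash() of a str or bytes object is salted per process by default and can change between runs.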
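The difference can be seen in a minimal sketch. The sample bytes b"hello" are an assumption chosen only for illustration; the MD5 digest of the same input is identical on every run, while the built-in hash() of the same bytes may vary between interpreter runs unless PYTHONHASHSEED is fixed.

```python
import hashlib

data = b"hello"  # sample input, chosen only for illustration

# A hashlib digest depends only on the input bytes.
first = hashlib.md5(data).hexdigest()
second = hashlib.md5(data).hexdigest()
print(first)            # 5d41402abc4b2a76b9719d911017c592
print(first == second)  # True

# The built-in hash() of bytes is salted per process by default,
# so this number may differ the next time the script is run.
print(hash(data))
```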

Use the hashlib.md5() Function to Generate and Check the MD5 Checksum of a File in Python

The hashlib module provides a common interface to many different secure hash and message digest algorithms. To use it, we need to import the hashlib module into the Python code.

Here, we mainly use the hashlib.md5() function, along with the update() method to feed data into the hash object and the hexdigest() method to return the digest as a hexadecimal string.

The following code uses the hashlib.md5() function to generate the MD5 checksum of a file in Python. It reads the file in 4096-byte chunks, so large files do not have to fit in memory.

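import hashlib


def md5(file1):
    # Create an empty MD5 hash object.
    md5h = hashlib.md5()
    # Open in binary mode so the hash is computed over the raw bytes.
    with open(file1, "rb") as f:
        # Read 4096 bytes at a time until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(4096), b""):
            md5h.update(chunk)
    # Return the digest as a hexadecimal string.
    return md5h.hexdigest()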

Note that this code returns the digest as a hexadecimal string. Calling digest() instead of hexdigest() returns the same digest as raw bytes. Either method can be used, depending on the desired output.
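To check a file against a published checksum, the computed hex string can be compared with the expected one. The following is a minimal sketch, not a definitive implementation. The helper names file_md5() and verify_md5() are hypothetical, and the temporary file containing b"hello" exists only for the demonstration. It also shows how digest() and hexdigest() relate.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=4096):
    """Return the MD5 hex digest of the file at path (hypothetical helper)."""
    md5h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5h.update(chunk)
    return md5h.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file's MD5 matches expected_hex, ignoring case."""
    return file_md5(path) == expected_hex.strip().lower()


# digest() returns 16 raw bytes; hexdigest() is the same value in hex.
raw = hashlib.md5(b"hello").digest()
same_value = raw.hex() == hashlib.md5(b"hello").hexdigest()

# Demonstration on a temporary file (an assumption for this sketch).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

ok = verify_md5(demo_path, "5D41402ABC4B2A76B9719D911017C592")
bad = verify_md5(demo_path, "0" * 32)
os.remove(demo_path)

print(len(raw), same_value)  # 16 True
print(ok, bad)               # True False
```

Normalizing the expected value with strip() and lower() is a design choice here, since published checksums are sometimes written in uppercase or followed by whitespace.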

Use the os Module to Generate and Check the MD5 Checksum of a File in Python

The os module in Python provides functions for interacting with the operating system. It is a large module that covers many different tasks, such as working with files, paths (through os.path), and processes.

Here, we download an image, compute its MD5 checksum with a Python function that we create, and finally compare it to the checksum generated by a Unix command.

The get_checksum() function is defined as follows:

import hashlib


def get_checksum(filename, hash_function):
    """Generate checksum for file based on hash function (MD5).

    Args:
        filename (str): Path to file that will have the checksum generated.
        hash_function (str): Hash function name - supports MD5

    Returns:
        str: Checksum based on Hash function of choice.

    Raises:
        ValueError: Invalid hash function is entered.

    """
    hash_function = hash_function.lower()

    # Reject unsupported names before touching the file.
    if hash_function != "md5":
        raise ValueError(
            "{} is an invalid hash function. Please enter MD5.".format(hash_function)
        )

    # Read the whole file in binary mode; this suits small files only.
    with open(filename, "rb") as f:
        data = f.read()

    return hashlib.md5(data).hexdigest()

The following code uses the get_checksum() function defined above along with the os module to generate the MD5 checksum of a file and print it next to the output of a system command for comparison.

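import os

pic = "g_circle-300x300.png"
resmd5 = get_checksum(pic, "md5")
# Print the checksum computed by the system tool for comparison.
# "md5" is the macOS/BSD command; on Linux, use "md5sum" instead.
os.system("md5 {}".format(pic))
print("Hash Function: MD5 - Checksum: {}".format(resmd5))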
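The comparison with the system tool can also be done programmatically rather than by eye. The sketch below is one possible approach under stated assumptions: the helper name system_md5() is hypothetical, it swaps os.system() for subprocess.run() so the command output can be captured, and it assumes either the GNU md5sum command or the macOS/BSD md5 command is on the PATH. If neither is found, it returns None instead of failing.

```python
import hashlib
import os
import shutil
import subprocess
import tempfile


def system_md5(path):
    """Return the MD5 hex digest reported by a system tool, or None if no tool is found."""
    if shutil.which("md5sum"):   # GNU coreutils prints "<digest>  <filename>"
        cmd = ["md5sum", path]
    elif shutil.which("md5"):    # macOS/BSD: -q prints only the digest
        cmd = ["md5", "-q", path]
    else:
        return None
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # md5sum prefixes the line with a backslash when it escapes the file name.
    return out.split()[0].lstrip("\\").lower()


# Demonstration on a temporary file (an assumption for this sketch).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"checksum demo")
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    python_md5 = hashlib.md5(f.read()).hexdigest()
tool_md5 = system_md5(demo_path)
os.remove(demo_path)

if tool_md5 is None:
    print("No md5sum or md5 command found; skipping comparison")
else:
    print("Match:", tool_md5 == python_md5)
```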

Although MD5 is widely used, it is cryptographically broken: practical collision attacks exist, so two different files can be crafted to share the same MD5 checksum. An MD5 checksum is still adequate for detecting accidental corruption, but relying on it to detect deliberate tampering is risky and not recommended.

Moreover, MD5 is not your best option if you need something cryptographically secure. A stronger hash function such as SHA-256, which hashlib also provides, is a better choice in that case.
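For cases where a stronger hash is wanted, the same chunked-reading pattern works with other algorithms that hashlib supports. The following is a minimal sketch, not part of the original examples. The function name file_checksum() and its parameters are hypothetical, and SHA-256 is used as one commonly chosen alternative.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=4096):
    """Return the hex digest of a file using any algorithm hashlib supports."""
    # hashlib.new() raises ValueError for an unknown algorithm name.
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demonstration on a temporary file (an assumption for this sketch).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

sha256_hex = file_checksum(demo_path)        # defaults to SHA-256
md5_hex = file_checksum(demo_path, "md5")    # MD5 still works for comparison
os.remove(demo_path)

print(len(sha256_hex), len(md5_hex))  # 64 32
```

Passing the algorithm name as a parameter keeps the calling code unchanged when switching from MD5 to a stronger hash.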