Get File Extension in Python

Rayven Esplanada Jan-25, 2021 Dec-03, 2020
  1. Use the os.path Module to Extract Extension From File in Python
  2. Use the pathlib Module to Extract Extension From File in Python

This tutorial shows how to get the file extension from a filename in Python.

Use the os.path Module to Extract Extension From File in Python

Python has a module os.path that provides ready-made utility functions for manipulating file paths. These include joining, splitting, and normalizing path strings, as well as querying information about a path, such as whether it exists.

We will use this module to get the file extension in Python.

os.path has a function splitext() that splits a file path into its root and its extension. The function returns a tuple containing the root string and the extension string; the extension includes the leading dot.

Let’s provide an example file path with a docx extension.

/Users/user/Documents/sampledoc.docx

The expected output is the extension .docx.

Declare two separate variables named root and extension, in that order, to catch the result of splitext().

import os

path = '/Users/user/Documents/sampledoc.docx'
root, extension = os.path.splitext(path)

print('Root:', root)
print('Extension:', extension)

Output:

Root: /Users/user/Documents/sampledoc
Extension: .docx

The extension has now been successfully separated from the root of the file path.
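It can help to see how splitext() treats less typical names. The following is a minimal sketch; the sample filenames are hypothetical and chosen only to illustrate the behavior. Note that only the last suffix is split off, and a name that starts with a dot and has no other dot is treated as having no extension.

```python
import os

# Hypothetical sample names, chosen to show edge cases of splitext().
samples = ['notes.txt', 'archive.tar.gz', '.bashrc', 'README']

# Map each name to the (root, extension) tuple returned by splitext().
results = {name: os.path.splitext(name) for name in samples}

for name, (root, extension) in results.items():
    print(f'{name!r}: root={root!r}, extension={extension!r}')

# 'notes.txt'      -> ('notes', '.txt')
# 'archive.tar.gz' -> ('archive.tar', '.gz')   only the last suffix is split
# '.bashrc'        -> ('.bashrc', '')          leading dot is not an extension
# 'README'         -> ('README', '')           no extension at all
```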

Use the pathlib Module to Extract Extension From File in Python

pathlib is a Python module that contains classes representing file paths and implements utility functions and constants for these classes.

pathlib.Path() accepts a path string as an argument and returns a new Path object.

A pathlib.Path object has the attribute suffix, which returns the file extension, including the leading dot.

import pathlib

path = pathlib.Path('/Users/user/Documents/sampledoc.docx')

print('Parent:', path.parent)
print('Filename:', path.name)
print('Extension:', path.suffix)

Besides the extension, we can also get the parent directory and the actual file name of the given file path by reading the attributes parent and name of the Path object.

Output:

Parent: /Users/user/Documents
Filename: sampledoc.docx
Extension: .docx
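pathlib can also produce the counterpart of the root returned by os.path.splitext(). The sketch below uses the stem attribute and the with_suffix() method; it uses PurePosixPath instead of Path only so that the printed separators are the same on every operating system, which is a choice made for this example rather than a requirement.

```python
import pathlib

# PurePosixPath keeps forward slashes on every OS (a choice for this sketch).
path = pathlib.PurePosixPath('/Users/user/Documents/sampledoc.docx')

stem = path.stem                    # file name without its last suffix
root = path.with_suffix('')         # whole path without its last suffix
renamed = path.with_suffix('.pdf')  # same path with a different extension

print('Stem:', stem)        # Stem: sampledoc
print('Root:', root)        # Root: /Users/user/Documents/sampledoc
print('Renamed:', renamed)  # Renamed: /Users/user/Documents/sampledoc.pdf
```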

What if we have a file extension like .tar.gz or .tar.bz2?

pathlib also provides an attribute for files whose extension consists of multiple suffixes. The attribute suffixes of the Path object is a list containing all of the suffixes of the given file. If we reuse the .docx example above and print out the suffixes attribute:

import pathlib

path = pathlib.Path('/Users/user/Documents/sampledoc.docx')

print('Suffix(es):', path.suffixes)

Output:

Suffix(es): ['.docx']

So even if there is only one suffix, the output will result in a singleton list.

Now try an example with a .tar.gz extension. To convert the list into a single string, call join() on an empty string and pass the suffixes attribute as its argument.

import pathlib

path = pathlib.Path('/Users/user/Documents/app_sample.tar.gz')

print('Parent:', path.parent)
print('Filename:', path.name)
print('Extension:', ''.join(path.suffixes))

Output:

Parent: /Users/user/Documents
Filename: app_sample.tar.gz
Extension: .tar.gz

Now the actual extension is displayed instead of a list.
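One caveat: suffixes contains every dot-separated part after the first, so joining all of them can over-capture for names that contain dots for other reasons, such as a version number. The following is a rough sketch of one way to handle this, not a standard recipe: the helper full_extension and the set KNOWN_COMPOUND are hypothetical names introduced here, and the list of compound extensions is an assumption you would adapt to your own files.

```python
import pathlib

# Assumption: the compound extensions we choose to recognize.
KNOWN_COMPOUND = {'.tar.gz', '.tar.bz2', '.tar.xz'}


def full_extension(filename):
    """Return a known compound extension if present, else the last suffix."""
    path = pathlib.PurePosixPath(filename)
    # Join at most the last two suffixes, e.g. ['.tar', '.gz'] -> '.tar.gz'.
    last_two = ''.join(path.suffixes[-2:])
    if last_two in KNOWN_COMPOUND:
        return last_two
    # Fall back to the single last suffix ('' when there is none).
    return path.suffix


# Joining every suffix treats the version part as an extension.
naive = ''.join(pathlib.PurePosixPath('report.v2.docx').suffixes)
print(naive)                                # .v2.docx

print(full_extension('report.v2.docx'))     # .docx
print(full_extension('app_sample.tar.gz'))  # .tar.gz
print(repr(full_extension('README')))       # ''
```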

In summary, the two modules os.path and pathlib provide convenient ways to get the file extension from a file path in Python.

The os.path module has the function splitext(), which splits a path into its root and its extension. pathlib creates a Path object and exposes the extension through the attributes suffix and suffixes.

If you anticipate files with more than one extension, it is best to use pathlib: its suffixes attribute lists every suffix, whereas os.path.splitext() splits off only the last one.
