How to List All Files in Directory and Subdirectories in Python

Fariba Laiq Feb 02, 2024
  1. Use the os.listdir() Function to List All Files in the Directory and Subdirectories in Python
  2. Use the os.scandir() Function to List All Files in the Directory and Subdirectories in Python
  3. Use the os.walk() Function to List All Files in the Directory and Subdirectories in Python
  4. Use the glob.glob() Function to List All Files in the Directory and Subdirectories in Python
  5. Use the pathlib.Path() Function to List All Files in the Directory and Subdirectories in Python
  6. Conclusion

In programming, managing and manipulating files is a fundamental task. Whether we’re organizing our files, analyzing data, or building applications that require file processing, efficiently listing all files in a directory and its subdirectories is a crucial skill.

With its simplicity and versatility, Python provides an array of tools and libraries to tackle this challenge.

The problem is traversing through a directory structure and gathering a comprehensive list of all its files, including those in its subdirectories. This may seem like a straightforward task, but as the depth and complexity of the directory structure increase, it becomes increasingly cumbersome to locate and enumerate each file manually.

Imagine we are data scientists working on a project that involves analyzing a massive dataset spread across multiple folders and subfolders. Instead of manually searching for all the relevant files and writing custom code for each folder level, we can utilize Python’s file listing capabilities to efficiently traverse through the entire directory structure, compiling a complete list of files for further analysis or processing.

In this tutorial, we will explore various approaches using Python to list all files in a directory and its subdirectories.

Use the os.listdir() Function to List All Files in the Directory and Subdirectories in Python

The os module in Python allows interaction with the Operating System. It has many built-in functions that deal with the file system.

The os.listdir() function in Python returns a list of the names of all entries (files and directories) in the specified directory. It does not descend into subdirectories on its own, so listing files recursively means calling it again for every subdirectory it finds.

Syntax:

for file in os.listdir(directory_path):
    # Code to process files

When using os.listdir(), we pass the directory_path parameter, representing the directory path from which we want to list files. It can be either a relative or an absolute path.

The function returns a list of entries within the specified directory. However, this list includes files and directories, so we must distinguish between them during processing.
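As a minimal sketch of that distinction, the helper below sorts the entries of a single directory into files and directories. The name split_entries is hypothetical and chosen only for illustration. It does not recurse.

```python
import os


def split_entries(directory_path):
    """Return (files, directories) found directly inside directory_path."""
    files, directories = [], []
    for name in os.listdir(directory_path):
        # os.listdir() returns bare names, so rebuild the full path before testing it
        full_path = os.path.join(directory_path, name)
        if os.path.isdir(full_path):
            directories.append(full_path)
        elif os.path.isfile(full_path):
            files.append(full_path)
    return sorted(files), sorted(directories)


# Usage: "." is the current working directory; replace it with any existing path
files, directories = split_entries(".")
print("Files:", files)
print("Directories:", directories)
```

Rebuilding the full path matters because os.path.isfile() and os.path.isdir() resolve a bare name against the current working directory, not against directory_path.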

Our Directory Structure

The following image shows our current directory structure used throughout this tutorial.

Directory structure

The following code uses the os.listdir() function to list all files in a directory and its subdirectories in Python.

import os


def list_files(directory_path):
    files = []
    for file_name in os.listdir(directory_path):
        file_path = os.path.join(directory_path, file_name)
        if os.path.isfile(file_path):
            files.append(file_path)
        elif os.path.isdir(file_path):
            files.extend(list_files(file_path))
    return files


# Usage
directory_path = "MyFolder"
all_files = list_files(directory_path)
for file_path in all_files:
    print(file_path)

Output:

using os.listdir method

In the above code, the function list_files is defined to list all files inside a directory and its subdirectories using os.listdir(). Within the function, it initializes an empty list, files, to store the file paths. It then iterates over each file_name returned by os.listdir(directory_path).

For each file_name, it constructs the file_path by joining it with the directory_path using os.path.join().

If file_path represents a file (os.path.isfile(file_path)), the file path is appended to the files list. If file_path represents a directory (os.path.isdir(file_path)), the function is called recursively to get the files within that subdirectory, and the resulting files are added to the files list.

After defining the function, we can specify the directory_path and call the list_files function. It will return a list of all file paths within the directory and its subdirectories.

As shown in the example, we can then iterate over this list to perform any desired operations, such as printing the file paths.

Use the os.scandir() Function to List All Files in the Directory and Subdirectories in Python

The os.scandir() function in Python provides an efficient way to iterate over the entries (files and directories) within a directory. It returns an iterator that yields DirEntry objects representing each entry. Like os.listdir(), it covers only a single directory, so subdirectories must be visited recursively.

Syntax:

for entry in os.scandir(directory_path):
    # Code to process files

When using os.scandir(), we provide the directory_path parameter, which is the path of the directory we want to list files from.

The function returns an iterator that allows us to iterate over each entry. Each DirEntry object has methods like is_file() and is_dir() to determine if it is a file or a directory.

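Compared with the previously discussed os.listdir() function, os.scandir() provides more functionality by returning DirEntry objects instead of plain names. These objects expose attributes such as name and path, along with methods such as is_file(), is_dir(), and stat() for retrieving metadata. Because the operating system often supplies the file type during the directory scan, these checks can also be cheaper than separate os.path calls.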

This makes it more versatile when dealing with complex file operations. In the following code, we have displayed the files in the directory and the subdirectories of MyFolder using the os.scandir() method in Python.

import os


def list_files(directory_path):
    files = []
    for entry in os.scandir(directory_path):
        if entry.is_file():
            files.append(entry.path)
        elif entry.is_dir():
            files.extend(list_files(entry.path))
    return files


# Usage
directory_path = "MyFolder"
all_files = list_files(directory_path)
print("\n".join(all_files))

Output:

using os.scandir method

The above code defines a function list_files that takes a directory_path as input and returns a list of all files inside the directory and its subdirectories. Within the function, it initializes an empty list, files, to store the file paths.

It uses os.scandir() to iterate over the entries (files and directories) in the specified directory. For each entry, if it is a file, the path is appended to the files list.

If it is a directory, the function is called recursively to get the files within that subdirectory, and the resulting files are added to the files list.

After defining the function, we specify the directory_path and call list_files. It returns a list of all file paths within the directory and its subdirectories.

In this example, we print the paths one per line by joining the list with "\n".join().
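The metadata access that DirEntry objects offer can be sketched as follows. This is an illustrative variant, not part of the original example: the function name list_files_with_sizes is hypothetical, and the choice to skip symlinked directories is an assumption made to avoid circular traversal.

```python
import os


def list_files_with_sizes(directory_path):
    """Return (path, size_in_bytes) pairs for every file under directory_path."""
    results = []
    # The with statement closes the scandir iterator as soon as the loop ends
    with os.scandir(directory_path) as entries:
        for entry in entries:
            if entry.is_file():
                # entry.stat() exposes metadata such as the size in bytes
                results.append((entry.path, entry.stat().st_size))
            elif entry.is_dir(follow_symlinks=False):
                # Symlinked directories are skipped to avoid walking in circles
                results.extend(list_files_with_sizes(entry.path))
    return results


# Usage: runs only if the tutorial's MyFolder directory exists
if os.path.isdir("MyFolder"):
    for path, size in list_files_with_sizes("MyFolder"):
        print(f"{path}: {size} bytes")
```

Using os.scandir() as a context manager releases the underlying directory handle promptly, which is worth doing when many directories are scanned.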

Use the os.walk() Function to List All Files in the Directory and Subdirectories in Python

The os module also lets us fetch, create, remove, and change directories. Its os.walk() function generates the file names in a directory tree by walking the tree either top-down or bottom-up, so we do not need to write the recursion ourselves.

Syntax:

for root, dirs, files in os.walk(directory_path):
    # Code to process files

When using os.walk(), we provide the directory_path parameter, representing the path of the directory we want to traverse. This can be either a relative or absolute path.

The function returns a generator that yields a tuple at each iteration. This tuple contains three values: root, dirs, and files.

  1. The root represents the current directory being traversed. It can be useful for constructing the full path of the files or performing specific operations based on the directory.
  2. The dirs is a list of directories in the current directory. It allows us to access and manipulate the subdirectories if needed.
  3. The files is a list of files in the current directory. This is where we can access and process each file individually.

The os.walk() function handles the recursion for us: at every level of the tree, it yields the current directory (root) together with the directories and files it contains. By contrast, os.listdir() returns only the entry names of a single directory, leaving us to tell files from directories and to recurse ourselves, and os.scandir() returns DirEntry objects with additional functionality but likewise covers a single directory.

In the following code, we have displayed the files in the directory and the subdirectories of MyFolder using the os.walk() method in Python.

import os

root = "MyFolder"
for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

Output:

using os.walk method

The provided code uses the os.walk() function to traverse the directory tree starting from the MyFolder directory. At each level, the generator yields the current directory path, the list of its subdirectories, and the list of its files.

It prints the full path for each file encountered by joining the current path with the file name using os.path.join().
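Since os.walk() hands us the list of subdirectories at each level, we can also control which ones it visits. The sketch below assumes the default top-down traversal and uses a hypothetical helper, list_files_skipping, to ignore directories with given names.

```python
import os


def list_files_skipping(directory_path, skip_names):
    """Return file paths under directory_path, ignoring directories named in skip_names."""
    found = []
    for root, dirs, files in os.walk(directory_path):
        # Editing dirs in place (top-down mode, the default) tells os.walk()
        # not to descend into the directories that were removed
        dirs[:] = [d for d in dirs if d not in skip_names]
        for name in files:
            found.append(os.path.join(root, name))
    return found


# Usage: runs only if the tutorial's MyFolder directory exists
if os.path.isdir("MyFolder"):
    for file_path in list_files_skipping("MyFolder", {".git", "__pycache__"}):
        print(file_path)
```

The slice assignment dirs[:] is deliberate: rebinding the name with dirs = [...] would create a new list and leave the one os.walk() consults unchanged.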

Use the glob.glob() Function to List All Files in the Directory and Subdirectories in Python

glob is a module in Python's standard library whose name is short for global. Its functions return all paths whose names match a specified pattern written with shell-style wildcards.

Syntax:

import glob

file_paths = glob.glob(directory_path + "/**/*", recursive=True)

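When using the glob.glob() function, we append the pattern '/**/*' to directory_path. The recursive=True argument makes ** match subdirectories at any depth. Without it, ** behaves like a single *.

The function returns a list of paths that match the specified pattern within the directory and its subdirectories. This list contains directories as well as files, so we need to filter it if we want files only. By default, the wildcards do not match names that begin with a dot.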

Compared with the previous methods, the glob.glob() function provides a more concise way to list files using pattern-matching rules. By utilizing glob.glob(), we can efficiently list all files in a directory and its subdirectories in Python.

When recursive is set to True, the pattern ** matches any files and zero or more directories and subdirectories. The following code lists the files in MyFolder and its subdirectories using the glob.glob() function in Python.

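import glob
import os

# Build the pattern with os.path.join() so it works on any operating system
pattern = os.path.join("MyFolder", "**", "*")
for path in glob.glob(pattern, recursive=True):
    # The pattern also matches directories, so keep only regular files
    if os.path.isfile(path):
        print(path)

Output:

using glob.glob method

The above code utilizes the glob.glob() function to list all files in a directory and its subdirectories in Python based on a specified pattern.

The pattern variable holds the pattern used for matching. Built with os.path.join(), it is equivalent to MyFolder/**/* (MyFolder\**\* on Windows): MyFolder is the starting directory, ** stands for that directory and all of its subdirectories, and * matches any name, including file names without an extension that *.* would miss. Because * also matches directory names, os.path.isfile() filters the results down to files. Writing the pattern as a plain string with backslashes would tie the code to Windows, and \* is not a valid escape sequence in a regular Python string.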

The code uses a for loop to iterate over the file paths returned by glob.glob(). For each matching file, it prints the file path.
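Pattern matching becomes more useful when we want only certain files. As a rough sketch, the hypothetical helper find_by_extension below restricts the recursive search to one extension. It uses glob.escape() on the directory name so that any special characters in it are treated literally.

```python
import glob
import os


def find_by_extension(directory_path, extension):
    """Return files under directory_path whose names end with extension, e.g. ".txt"."""
    # "**" covers the directory itself and every subdirectory when recursive=True
    pattern = os.path.join(glob.escape(directory_path), "**", "*" + extension)
    # A directory could carry the same suffix, so keep regular files only
    return [p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)]


# Usage: runs only if the tutorial's MyFolder directory exists
if os.path.isdir("MyFolder"):
    for file_path in find_by_extension("MyFolder", ".txt"):
        print(file_path)
```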

Use the pathlib.Path() Function to List All Files in the Directory and Subdirectories in Python

pathlib.Path is a class in Python's pathlib module that provides an object-oriented approach for working with file paths. Calling pathlib.Path() creates a path object whose methods can list all files in a directory and its subdirectories.

Syntax:

import pathlib

path = pathlib.Path(directory_path)
for file in path.glob('**/*'):
    # Code to process files

When using pathlib.Path(), we pass the directory_path parameter to create a Path object representing the specified directory.

We then utilize the glob() method on the Path object with the pattern '**/*' to traverse the directory and subdirectories recursively. This method returns a generator that yields every matching path, which includes directories as well as files.

Compared with the previously discussed methods, pathlib.Path() provides a more object-oriented and intuitive approach to file path manipulation.

By using pathlib.Path() and its glob() method, we can efficiently list all files in a directory and its subdirectories in Python while benefiting from the object-oriented features of the pathlib module.

The following code lists the files in MyFolder and its subdirectories using pathlib.Path() in Python.

import pathlib


def list_files(directory_path):
    path = pathlib.Path(directory_path)
    for file in path.glob("**/*"):
        if file.is_file():
            print(file)


# Usage
directory_path = "MyFolder"
list_files(directory_path)

Output:

using pathlib.Path method

In the above code, we define the list_files function that takes a directory_path as input. We create a pathlib.Path object using the provided directory path.

We then use the glob method with the pattern '**/*' to traverse the directory and its subdirectories recursively. For each item returned by the glob method, we check if it is a file using the is_file() method.

If it is a file, we print its path. We can specify the directory_path variable to the desired directory, and the code will print all the files in that directory and its subdirectories.
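A variant that returns the files instead of printing them might look like the sketch below. It relies on Path.rglob("*"), which behaves like glob("**/*"). Redefining list_files to return a sorted list is an assumption made here for illustration, not the approach of the example above.

```python
from pathlib import Path


def list_files(directory_path):
    """Return a sorted list of Path objects for every file under directory_path."""
    # rglob("*") walks the tree recursively and yields files and directories alike
    return sorted(p for p in Path(directory_path).rglob("*") if p.is_file())


# Usage: runs only if the tutorial's MyFolder directory exists
if Path("MyFolder").is_dir():
    for file in list_files("MyFolder"):
        # Path objects expose parts of the path as attributes
        print(file.name, file.suffix, file.parent)
```

Returning Path objects keeps attributes such as name, suffix, and parent available to the caller, which is the main convenience of pathlib over plain strings.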

Conclusion

We have explored various methods for listing all files in a directory and its subdirectories in Python.

  1. os.listdir() provides a basic approach but is not recursive by itself, so we must call it again for each subdirectory.
  2. os.scandir() offers enhanced functionality with DirEntry objects but, like os.listdir(), requires us to handle the recursion ourselves.
  3. os.walk() is a generator-based approach that yields tuples, providing comprehensive directory traversal capabilities.
  4. glob.glob() allows pattern matching for a more specific file selection.
  5. pathlib.Path() offers an object-oriented approach with a versatile glob() method and intuitive file path manipulation.

Each method has advantages and disadvantages, providing different levels of functionality and convenience. Choosing the most suitable method depends on the specific requirements of the task at hand.

Author: Fariba Laiq

I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.
