How to Import Multiple CSV Files Into Pandas and Concatenate Into One DataFrame

Preet Sanghavi Feb 02, 2024
  1. What is Pandas
  2. How to Read Single .csv File Using Pandas
  3. Read Multiple CSV Files in Python
  4. Concatenate Multiple DataFrames in Python

This tutorial shows how to read multiple .csv files with Pandas and concatenate the resulting DataFrames into one.

What is Pandas

Pandas is an open-source Python package for data analysis and manipulation. It provides a wide range of functions for reading many kinds of data files and for manipulating the data they contain.

To install the pandas package, open the Command Prompt or terminal and run pip install pandas.

How to Read Single .csv File Using Pandas

The pandas package provides the read_csv() function to read a .csv file.

>>> import pandas as pd
>>> df = pd.read_csv(filepath_or_buffer)

Given a file path, the pandas function read_csv() reads the data file and returns a DataFrame object.

>>> type(df)
<class 'pandas.core.frame.DataFrame'>
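
As a minimal sketch of the call above, the snippet below reads CSV text from an in-memory buffer instead of a file on disk, so it runs without any data file. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; read_csv accepts a path or a file-like buffer
csv_text = "name,score\nalice,90\nbob,85\n"

df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
print(df.shape)           # (2, 2)
print(list(df.columns))   # ['name', 'score']
```

The first row of the text is used as the header by default, which is why the frame has two rows and two named columns.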

Read Multiple CSV Files in Python

Pandas has no single function that reads several files at once. However, we can combine read_csv() with a few standard-library modules to do it.

First, we need the paths of all the data files. This is easiest when all the files sit in one folder.

We create a list that stores the path of every file.

>>> import pandas as pd
>>> import glob
>>> import os
>>> # This is a raw string containing the path of files
>>> path = r'D:\csv files'
>>> all_files = glob.glob(os.path.join(path, '*.csv'))
>>> all_files
['D:\\csv files\\FILE_1.csv', 'D:\\csv files\\FILE_2.csv']

The code above creates a list containing the path of each .csv file in the folder.

glob Module

Use the glob module to find files or pathnames matching a pattern. It matches patterns according to the rules used by the Unix shell, with wildcards such as *, ?, and character ranges in [].

There’s no need to install this module because it is part of the Python standard library.

To retrieve paths, we can use the glob module’s functions glob.glob() and glob.iglob(). Both can also search subdirectories recursively when the pattern contains ** and recursive=True is passed.

Syntax:

glob.glob(pathname, *, recursive=False)
glob.iglob(pathname, *, recursive=False)

glob.glob() returns a list containing the paths of all matching files, while glob.iglob() returns an iterator that yields the same paths one at a time.

For example, to retrieve every file name in a given folder, put the asterisk wildcard * at the end of the path and pass the pattern as a string to glob.glob().

>>> for files in glob.glob(r'D:\csv files\*'):
...     print(files)

D:\csv files\FILE_1.csv
D:\csv files\FILE_2.csv
D:\csv files\textFile1.txt
D:\csv files\textFile2.txt

To narrow the search, add the file extension after the asterisk.

>>> for files in glob.glob(r'D:\csv files\*.csv'):
...     print(files)

D:\csv files\FILE_1.csv
D:\csv files\FILE_2.csv
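
The recursive search mentioned earlier can be sketched as follows. This example builds a temporary folder with made-up file names (an assumption for illustration, not the D:\csv files folder used above) and compares a top-level pattern with a recursive one.

```python
import glob
import os
import tempfile

# Build a throwaway directory tree with hypothetical file names
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "archive"))
    for rel in ("FILE_1.csv", "notes.txt", os.path.join("archive", "FILE_2.csv")):
        open(os.path.join(root, rel), "w").close()

    # "*.csv" matches only files directly inside root
    top_level = sorted(glob.glob(os.path.join(root, "*.csv")))

    # "**" together with recursive=True also descends into subfolders
    recursive = sorted(glob.glob(os.path.join(root, "**", "*.csv"), recursive=True))

    # iglob yields the same matches lazily instead of building a list
    lazy = sorted(glob.iglob(os.path.join(root, "**", "*.csv"), recursive=True))

    top_names = [os.path.basename(p) for p in top_level]
    all_names = [os.path.basename(p) for p in recursive]

print(top_names)  # ['FILE_1.csv']
print(all_names)  # ['FILE_1.csv', 'FILE_2.csv']
```

glob returns matches in arbitrary order, so the sketch sorts them to keep the printed result stable.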

What are Raw Strings

In Python, a raw string is created by prefixing a string literal with r or R. In a raw string, the backslash (\) is treated as a literal character.

This is useful when we want a string that contains backslashes, such as a Windows path, without having them interpreted as escape characters.

In a normal string, the backslash (\) marks the start of an escape sequence, which is how special characters such as tabs and newlines are written. For instance:

>>> print("This\tis\nnormal\tstring")
This	is
normal	string

However, raw strings treat the backslash (\) as a literal character. For example:

>>> print(r"This\tis\nnormal\tstring")
This\tis\nnormal\tstring
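
A short sketch of why this matters for Windows paths: the two literals below spell the same text, once with escaped backslashes and once as a raw string. The path itself is only an illustrative value.

```python
# A raw string keeps each backslash as an ordinary character
normal = "D:\\csv files\\new"  # backslashes escaped by hand
raw = r"D:\csv files\new"      # the same text written as a raw string

print(normal == raw)  # True
print(len("\n"))      # 1: a single newline character
print(len(r"\n"))     # 2: a backslash followed by the letter n
```

One limitation to keep in mind is that a raw string cannot end in a single backslash, so a trailing separator has to be added another way, for example with os.path.join().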

os Module

Python’s os module contains functions for interacting with the operating system. It is part of Python’s standard library.

This module offers a portable method of using functionality dependent on the operating system. Python’s os.path module, a sub-module of the os module, is used to manipulate common pathnames.

Python’s os.path.join() function joins one or more path components. It concatenates the components, placing exactly one directory separator (os.sep, which is "\" on Windows and "/" on Unix-like systems) after each non-empty component except the last.

If the last path component is empty, the result ends with a directory separator.

If a component is an absolute path, all previous components are discarded and joining continues from the absolute component.
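
These rules can be checked with a small sketch. To make the results independent of the machine it runs on, it calls the ntpath and posixpath modules directly; these are the standard-library implementations that os.path resolves to on Windows and on Unix-like systems respectively. The path components are made-up values.

```python
import ntpath     # implementation behind os.path on Windows
import posixpath  # implementation behind os.path on Unix-like systems

# The separator depends on the platform flavour
win_path = ntpath.join("Users", "Desktop", "data.csv")
unix_path = posixpath.join("Users", "Desktop", "data.csv")

# An empty last component leaves a trailing separator
trailing = posixpath.join("Users", "Desktop", "")

# An absolute component discards everything before it
restarted = posixpath.join("Users", "/home", "data.csv")

print(win_path)   # Users\Desktop\data.csv
print(unix_path)  # Users/Desktop/data.csv
print(trailing)   # Users/Desktop/
print(restarted)  # /home/data.csv
```

In everyday code, calling os.path.join() is still the better choice, because it picks the right implementation for the current platform automatically.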

Syntax:

os.path.join(path, *paths)

To merge different path components, use the os.path.join() function.

import os

path = "Users"
os.path.join(path, "Desktop", "data.csv")

Output (on Windows):

"Users\\Desktop\\data.csv"

Concatenate Multiple DataFrames in Python

Next, use the paths returned by the glob.glob() function to read the data and create DataFrames. We append each DataFrame object to a list.

Code:

dataframes = []

# read each file and collect the resulting DataFrame
for file_path in all_files:
    data = pd.read_csv(file_path)
    dataframes.append(data)

A list of dataframes is created.

>>> dataframes
[dataframe1, dataframe2]

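Next, concatenate the DataFrames.

Note: The DataFrames should share the same columns. By default, pd.concat() keeps the union of all columns and fills the missing values with NaN, so mismatched columns do not raise an error but leave gaps in the result.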

pd.concat(dataframes, ignore_index=True)

The pandas.concat() function concatenates pandas objects along a particular axis, with optional set logic (union or intersection) applied to the indexes on the other axis. Passing ignore_index=True discards the original row labels and numbers the rows of the result from 0.
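
To illustrate how pd.concat() treats columns and row labels, here is a minimal sketch with two small, made-up DataFrames, one of which has an extra column. The join parameter selects between the union ("outer", the default) and the intersection ("inner") of the columns.

```python
import pandas as pd

# Two small hypothetical frames; the second has an extra column
first = pd.DataFrame({"id": [1, 2], "city": ["Pune", "Delhi"]})
second = pd.DataFrame({"id": [3], "city": ["Goa"], "zip": ["403001"]})

# Default join="outer": union of columns, gaps filled with NaN
combined = pd.concat([first, second], ignore_index=True)

# join="inner": keep only the columns every frame shares
shared = pd.concat([first, second], ignore_index=True, join="inner")

print(list(combined.columns))         # ['id', 'city', 'zip']
print(list(combined.index))           # [0, 1, 2]
print(int(combined["zip"].isna().sum()))  # 2
print(list(shared.columns))           # ['id', 'city']
```

Without ignore_index=True, the result would keep each frame's own row labels (0, 1, 0), which often causes confusion later when selecting rows by label.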

Full code:

# importing the required modules
import pandas as pd
import os
import glob

# Path of the files
path = r"D:\csv files"

# joining the path and creating list of paths
all_files = glob.glob(os.path.join(path, "*.csv"))

dataframes = []

# reading the data and appending the dataframe
for file_path in all_files:
    data = pd.read_csv(file_path)
    dataframes.append(data)

# Concatenating the dataframes
df = pd.concat(dataframes, ignore_index=True)
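
To try the whole workflow without a D:\csv files folder, the following sketch writes two small CSV files into a temporary directory first. The file contents are invented for illustration, and sorted() is an addition here to make the file order predictable, since glob returns matches in arbitrary order.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as path:
    # Write two small hypothetical CSV files with the same columns
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
        os.path.join(path, "FILE_1.csv"), index=False
    )
    pd.DataFrame({"id": [3], "value": [30]}).to_csv(
        os.path.join(path, "FILE_2.csv"), index=False
    )

    # Collect the paths in a predictable order
    all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

    # Read every file, then concatenate once
    df = pd.concat([pd.read_csv(f) for f in all_files], ignore_index=True)

print(df.shape)           # (3, 2)
print(df["id"].tolist())  # [1, 2, 3]
```

Reading all files into a list and calling pd.concat() once is generally preferable to concatenating inside the loop, because each concatenation copies the data.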