Python Split CSV Into Multiple Files

  1. Create a CSV File in Python Using Pandas
  2. Split a CSV File Into Multiple Files in Python
  3. Conclusion

In this article, we will learn how to split a CSV file into multiple files in Python. We will use Pandas to create a CSV file and split it into multiple other files.

Create a CSV File in Python Using Pandas

To create a CSV file with Pandas, first install the library from the command line.

pip install pandas

This command downloads and installs Pandas on your machine. You can then load it into any Python program with the import keyword.

Let’s verify that Pandas is installed.

Code Example:

import pandas as pd
print("The Version of Pandas is:", pd.__version__)

Output:

The Version of Pandas is: 1.3.5

Now, let’s create a CSV file.

Code Example:

import pandas as pd

# create a data set
data_dict = {'Roll no':     [1, 2, 3, 4, 5, 6, 7, 8],
             'Gender':      ["Male", "Female", "Female", "Male",
                             "Male", "Female", "Male", "Female"],
             'CGPA':        [3.5, 3.3, 2.7, 3.8, 2.4, 2.1, 2.9, 3.9],
             'English':     [76, 77, 85, 91, 49, 86, 66, 98],
             'Mathematics': [78, 87, 54, 65, 90, 59, 63, 89],
             'Programming': [99, 45, 68, 85, 60, 39, 55, 88]}

# create a data frame
data = pd.DataFrame(data_dict)

# write the data frame to a CSV file; index=False omits the row index column
data.to_csv("students.csv", index=False)

# Print the output
print(data)

Output:

   Roll no  Gender  CGPA  English  Mathematics  Programming
0        1    Male   3.5       76           78           99
1        2  Female   3.3       77           87           45
2        3  Female   2.7       85           54           68
3        4    Male   3.8       91           65           85
4        5    Male   2.4       49           90           60
5        6  Female   2.1       86           59           39
6        7    Male   2.9       66           63           55
7        8  Female   3.9       98           89           88
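
One detail worth knowing about to_csv(): by default it also writes the row index as an unnamed first column, which comes back as an extra "Unnamed: 0" column when the file is read again. The sketch below shows the difference; the two-row frame and the in-memory buffers are stand-ins chosen for illustration, not part of the student data set.

```python
import io

import pandas as pd

# A tiny frame standing in for the student data (illustrative only).
df = pd.DataFrame({"Roll no": [1, 2], "Gender": ["Male", "Female"]})

# Default behaviour: the row index is written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# With index=False only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # ,Roll no,Gender
print(without_index.getvalue().splitlines()[0])  # Roll no,Gender

# Reading the first variant back produces an extra "Unnamed: 0" column.
with_index.seek(0)
round_trip = pd.read_csv(with_index)
print(list(round_trip.columns))
```

Passing index=False when writing keeps the file's columns identical to the data frame's columns, which matters when the file is read back and split later.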

Split a CSV File Into Multiple Files in Python

We have successfully created a CSV file. Now let’s split it into multiple files. A CSV file can be split using different criteria: by row position or by the values in a column.

Split a CSV File Based on Rows

Let’s split a CSV file on the basis of rows in Python.

Code Example:

import pandas as pd

# read the CSV file into a DataFrame
data = pd.read_csv("students.csv")

# number of output files and number of rows per file
k = 2
size = 4

for i in range(k):
    # take rows size*i up to (but not including) size*(i+1)
    df = data.iloc[size*i:size*(i+1)]
    df.to_csv(f'students{i+1}.csv', index=False)

file1 = pd.read_csv("students1.csv")
print(file1)
print("\n")
file2 = pd.read_csv("students2.csv")
print(file2)

Output:

   Roll no  Gender  CGPA  English  Mathematics  Programming
0        1    Male   3.5       76           78           99
1        2  Female   3.3       77           87           45
2        3  Female   2.7       85           54           68
3        4    Male   3.8       91           65           85


   Roll no  Gender  CGPA  English  Mathematics  Programming
0        5    Male   2.4       49           90           60
1        6  Female   2.1       86           59           39
2        7    Male   2.9       66           63           55
3        8  Female   3.9       98           89           88

The code above splits students.csv into two files, students1.csv and students2.csv. The split is row-wise: rows 0 to 3 of the original data are stored in students1.csv, and rows 4 to 7 are stored in students2.csv. Because each file is written with index=False, the index restarts at 0 when a file is read back.
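
The loop above hard-codes the number of output files. As a sketch of an alternative for when the row count is not known in advance, read_csv() accepts a chunksize argument and then yields the file piece by piece. The helper name split_csv_by_rows, its parameters, and the throwaway 8-row demo file are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def split_csv_by_rows(src_path, out_dir, rows_per_file, prefix="students"):
    """Write consecutive chunks of at most rows_per_file rows; return the new paths."""
    paths = []
    # With chunksize set, read_csv returns an iterator of DataFrames.
    with pd.read_csv(src_path, chunksize=rows_per_file) as reader:
        for i, chunk in enumerate(reader, start=1):
            path = os.path.join(out_dir, f"{prefix}{i}.csv")
            chunk.to_csv(path, index=False)
            paths.append(path)
    return paths


# Demo on a temporary 8-row file (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "students.csv")
    pd.DataFrame({"Roll no": range(1, 9)}).to_csv(src, index=False)

    parts = split_csv_by_rows(src, tmp, rows_per_file=3)
    sizes = [len(pd.read_csv(p)) for p in parts]
    print(sizes)  # [3, 3, 2]
```

The last file simply receives whatever rows remain, so the row count does not have to divide evenly by the chunk size.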

Split a CSV File Based on Columns

We can split any CSV file based on the values in a column with the help of the groupby() function. The groupby() function belongs to the Pandas library and is used to group rows that share the same value.

In this case, we are grouping the student data based on Gender.

Code Example:

import pandas as pd

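# read the CSV file into a DataFrame
data = pd.read_csv("students.csv")

# group by a single column name so that each key is a plain string;
# a one-item list such as ['Gender'] yields 1-tuple keys in pandas 2.0+
for gender, group in data.groupby('Gender'):
    group.to_csv(f'{gender} students.csv', index=False)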

print(pd.read_csv("Male students.csv"))
print("\n")
print(pd.read_csv("Female students.csv"))

Output:

   Roll no  Gender  CGPA  English  Mathematics  Programming
0        1    Male   3.5       76           78           99
1        4    Male   3.8       91           65           85
2        5    Male   2.4       49           90           60
3        7    Male   2.9       66           63           55


   Roll no  Gender  CGPA  English  Mathematics  Programming
0        2  Female   3.3       77           87           45
1        3  Female   2.7       85           54           68
2        6  Female   2.1       86           59           39
3        8  Female   3.9       98           89           88
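
To see what groupby() hands to the loop, here is a small sketch on a four-row frame made up for illustration. Iterating over the grouped object yields one (key, sub-DataFrame) pair per distinct value, with keys in sorted order by default.

```python
import pandas as pd

# Hypothetical four-row sample, not the full student data set.
data = pd.DataFrame({
    "Roll no": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# Each iteration gives the group key and the rows that belong to it.
groups = {key: group["Roll no"].tolist() for key, group in data.groupby("Gender")}
print(groups)  # {'Female': [2, 3], 'Male': [1, 4]}
```

Each sub-DataFrame keeps all the original columns, which is why writing it with to_csv() produces a complete file per group.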

Conclusion

Splitting data into smaller files makes large data sets easier to inspect, share, and process.

In this article, we’ve discussed how to create a CSV file using the Pandas library. We have also covered two common ways to split a CSV file: by rows and by the values in a column.

Zeeshan Afridi

Zeeshan is a detail-oriented software engineer who helps companies and individuals make their lives easier with software solutions.
