How to Read CSV to Array in Python

Lakshay Kapoor Feb 12, 2024
  1. Use numpy.loadtxt() to Read a CSV File Into an Array in Python
  2. Use the list() Method to Read a CSV File Into an Array in Python
  3. Use the pd.read_csv Method to Read a CSV File Into an Array in Python
  4. Use the np.genfromtxt Method to Read a CSV File Into an Array in Python
  5. Conclusion
CSV files are widely used in data analysis and data science in Python. CSV stands for Comma-Separated Values. These files store tabular data as plain text, with one record per line.

Within each record, the fields are separated by commas. A common task when working with CSV files is importing them as data arrays.

This tutorial will introduce different methods to import CSV files in the form of data arrays.

Use numpy.loadtxt() to Read a CSV File Into an Array in Python

NumPy’s loadtxt() function loads data from a text file into an array. It works best when every row has the same number of values and none of them are missing.

Two parameters matter most here. The first is the file name (or a variable holding the file name), which is required. The second is delimiter, an optional parameter that specifies the string used to separate the values.

The default delimiter is whitespace, so delimiter="," must be passed for comma-separated files.

Example:

with open("example.csv", "w") as file:
    file.write("1,2,3\n4,5,6\n7,8,9")

import numpy as np

# Reading the CSV file into an array
data_array = np.loadtxt("example.csv", delimiter=",")

# Displaying the result
print(data_array)

In this example, we begin by creating a CSV file named example.csv with three rows and three columns of numbers. We employ a straightforward file write operation for this task.

Next, we import NumPy and utilize np.loadtxt() to read the contents of example.csv. We specify the delimiter as , since our data is comma-separated.

The function reads the data and transforms it into a 2D NumPy array. We then employ the print() function to showcase the contents of the array.
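Real CSV files often start with a header row or contain columns that are not needed. The following is a minimal sketch of how loadtxt() could handle that case using its skiprows, usecols, and dtype parameters. The sample data and the variable names csv_text and selected are assumptions made for illustration, and io.StringIO stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical sample data with a header row (not from the example above).
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"

# skiprows=1 skips the header line, usecols keeps the first and third columns,
# and dtype=int requests integers instead of the default floats.
selected = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skiprows=1,
    usecols=(0, 2),
    dtype=int,
)

print(selected.shape)
print(selected)
```

Passing a file name instead of the StringIO object works the same way.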

Use the list() Method to Read a CSV File Into an Array in Python

Here, we use Python’s built-in csv module, which parses CSV files row by row. More precisely, the module’s reader() function returns an iterator that yields each row of the file as a list of strings.

Finally, the built-in list() function consumes that iterator and collects all the rows into a list of lists.

Example:

with open("example.csv", "w") as file:
    file.write("Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor")

import csv

# Reading the CSV file into an array
with open("example.csv", "r") as file:
    csv_reader = csv.reader(file)
    data_array = list(csv_reader)

# Displaying the result
print(data_array)

We start by creating a CSV file named example.csv with a header and two data rows.

We then read this file using csv.reader. When we open the file with open('example.csv', 'r'), we are creating a file object that csv.reader can iterate over.

csv.reader reads each line in the file and returns a list of strings representing the fields in that row.

We then convert this reader object into a list using list(csv_reader). This operation effectively loads all rows from the CSV file into a list of lists, where each inner list is a row in the CSV.

Finally, we use print() to display the contents of the array.
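Because csv.reader returns every field as a string, numeric columns usually need an explicit conversion. The sketch below shows one possible way to separate the header from the data rows and convert the Age column to integers. The names csv_text, header, records, and ages are assumptions for illustration, and io.StringIO stands in for the file.

```python
import csv
import io

# Same content as example.csv, held in memory for this sketch.
csv_text = "Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor\n"

# For a real file, the csv documentation recommends open(path, newline="").
with io.StringIO(csv_text, newline="") as file:
    rows = list(csv.reader(file))

# The first row holds the column names; the rest are data rows.
header, *records = rows

# Look up the Age column by name and convert its string values to int.
age_index = header.index("Age")
ages = [int(record[age_index]) for record in records]

print(header)
print(ages)
```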

Use the pd.read_csv Method to Read a CSV File Into an Array in Python

Pandas offers extensive functionality for reading, processing, and writing data in various formats, including CSV (Comma-Separated Values). The pandas.read_csv() function is a versatile tool for reading CSV files into Pandas DataFrames, which can then be converted to NumPy arrays.

Example:

with open("example.csv", "w") as file:
    file.write("Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor")

import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("example.csv")

# Converting the DataFrame to a numpy array
data_array = df.values

# Displaying the result
print(data_array)

In this code, we first create a simple CSV file named example.csv using standard file I/O.

We then use Pandas to read the file.

The pd.read_csv('example.csv') function reads the CSV file into a DataFrame. By default, the first line is used as the column names. The DataFrame df is a 2D labeled data structure whose columns can have different types.

To convert this DataFrame into a NumPy array, we use the .values attribute. The pandas documentation recommends the equivalent df.to_numpy() method for new code. Because the columns here mix strings and integers, the resulting array has the object dtype; an array meant for numerical computation should be built from the numeric columns only.

Finally, we use the print() function to display the contents of the NumPy array.
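As a rough illustration of the conversion step, the sketch below uses to_numpy() on the whole DataFrame and on a single numeric column. The variable names csv_text, mixed, and ages are assumptions for illustration, and io.StringIO replaces the file on disk.

```python
import io

import pandas as pd

# Same content as example.csv, held in memory for this sketch.
csv_text = "Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor\n"
df = pd.read_csv(io.StringIO(csv_text))

# Converting the whole DataFrame: mixed column types give an object array.
mixed = df.to_numpy()

# Converting only the numeric column gives an integer array.
ages = df["Age"].to_numpy()

print(mixed.dtype, mixed.shape)
print(ages.dtype, ages)
```

Selecting the numeric columns before converting keeps the array usable for arithmetic, which an object array generally is not.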

Use the np.genfromtxt Method to Read a CSV File Into an Array in Python

NumPy, a fundamental package for numerical computing in Python, provides the np.genfromtxt function. It is designed to read CSV (Comma-Separated Values) and other delimited text files, especially those with missing or heterogeneous data.

Example:

with open("example.csv", "w") as file:
    file.write("1,2,3\n4,,6\n7,8,")

import numpy as np

# Reading the CSV file into an array, handling missing values
data_array = np.genfromtxt("example.csv", delimiter=",", filling_values=np.nan)

# Displaying the result
print(data_array)

In this example, we create a CSV file named example.csv, intentionally including missing values represented by empty fields.

We proceed by importing NumPy and employing np.genfromtxt to read the file. We use the delimiter=',' parameter to indicate that the fields are comma-separated.

To handle missing values, we specify filling_values=np.nan, which substitutes NaN (Not a Number) for the empty fields.

The function reads the data and returns a 2D NumPy array of floats in which the two missing entries appear as nan.

Finally, we use the print() function to display the array.
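Once the array is loaded, the NaN entries usually need to be located or replaced before further computation. The sketch below shows one possible follow-up: counting the missing values with np.isnan and replacing each with the mean of its column. The column-mean strategy and the names csv_text, missing, col_means, and filled are assumptions for illustration, not part of the example above.

```python
import io

import numpy as np

# Same content as example.csv, held in memory for this sketch.
csv_text = "1,2,3\n4,,6\n7,8,\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Boolean mask marking the positions of missing values.
missing = np.isnan(data)
missing_count = int(missing.sum())

# Column means computed while ignoring NaN entries.
col_means = np.nanmean(data, axis=0)

# Replace each missing entry with the mean of its column.
filled = np.where(missing, col_means, data)

print(missing_count)
print(filled)
```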

Conclusion

In conclusion, this article has demonstrated several methods for importing CSV files into data arrays in Python, a fundamental skill in data analysis and data science. NumPy’s loadtxt() suits clean numerical data, genfromtxt() additionally handles missing values, and Python’s built-in csv module reads any CSV file into a list of lists of strings.

Additionally, Pandas’ read_csv() function provides a flexible way to read CSV files into DataFrames, which can be converted into NumPy arrays for further analysis. Each method has its own advantages and suits different scenarios.

Lakshay Kapoor is a final year B.Tech Computer Science student at Amity University Noida. He is familiar with programming languages and their real-world applications (Python/R/C++) and is deeply interested in data science and machine learning.
