How to Create a Sparse Matrix in Python

Aditya Raj Feb 02, 2024
  1. What Is a Sparse Matrix in Python
  2. How to Create Sparse Matrices in Python
  3. Convert Normal Matrix to Sparse Matrix Using the SciPy Module in Python
  4. Compressed Sparse Row Matrix in Python
  5. Compressed Sparse Column Matrix in Python
  6. Sparse Matrix in Coordinate Format in Python
  7. Dictionary of Keys Based Sparse Matrix in Python
  8. Conclusion

While implementing machine learning algorithms in Python, we often need to represent the input data in a format that requires less memory.

Normally, the input data given to machine learning algorithms is represented in matrix form. This article discusses how to use sparse matrices to store such data in Python.

To do this, we will learn different representations of sparse matrices in Python. We will also see how to convert a normal matrix to a sparse representation using the functions defined in the scipy module.

What Is a Sparse Matrix in Python

A sparse matrix is a matrix in which most of the elements are 0. In other words, the matrix contains non-zero data at only a few locations.

An example of the sparse matrix is as follows.

[[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]

Here, you can see that most of the elements in the matrix are 0.
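One way to put a number on "most of the elements" is the sparsity of the matrix, the fraction of elements that are 0. The following minimal sketch computes it for the example above with NumPy's count_nonzero(); the names nonzero, total, and sparsity are illustrative choices, not part of any API.

```python
import numpy as np

# The 4x4 example matrix from above
matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])

nonzero = np.count_nonzero(matrix)   # number of non-zero elements
total = matrix.size                  # total number of elements (4 * 4)
sparsity = 1 - nonzero / total       # fraction of elements that are 0

print(nonzero, total)   # 2 16
print(sparsity)         # 0.875
```

Here 14 of the 16 elements are 0, so 87.5% of the matrix is zeros.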

Sparse matrices are extensively used in natural language processing and data encoding. If most of the elements in a matrix are 0, storing every element is wasteful in terms of storage.

This is because there are only a few meaningful data points, and most of the storage is occupied by redundant zeros.

How to Create Sparse Matrices in Python

To avoid spending memory on the redundant zeros of a matrix, we can convert the normal matrix to a sparse matrix.

You can think of a sparse matrix as a list of inner lists, each containing three elements. Each inner list stores the row number, column number, and value of one non-zero element of the input matrix. Together, these inner lists represent the sparse matrix.

For instance, consider the following input matrix.

[[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]

This matrix has non-zero elements at only two locations: (0,0) and (2,3).

To convert this matrix to a sparse matrix, we will create a list representing the sparse matrix. This list will contain one inner list for each non-zero element, holding its row number, column number, and value.

So, we have two inner lists in the sparse matrix: [0,0,16] and [2,3,5]. The final sparse matrix will be as follows.

[[0, 0, 16], [2, 3, 5]]

Here,

  • The first element of the inner lists represents the row number of the non-zero elements of the input matrix.
  • The second element of the inner lists represents the column number of the non-zero elements of the input matrix.
  • Finally, the third element of the inner lists contains the actual value of the non-zero elements.

To create the sparse matrix from a given matrix, we will first create an empty list sparse_matrix. After that, we will traverse the input matrix using nested for loops, one over the rows and one over the columns.

While traversing, if we find a non-zero element in the matrix, we will create a list containing the triplet of row number, column number, and element value. After that, we will add the list to sparse_matrix using the append() method.

After the loops finish, sparse_matrix holds the sparse matrix. You can observe this in the following example.

import numpy as np

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)

# Each entry of sparse_matrix is a [row, column, value] triplet
sparse_matrix = []
rows, cols = input_matrix.shape
for i in range(rows):
    for j in range(cols):
        if input_matrix[i, j] != 0:
            # int() turns the NumPy scalar into a plain Python int,
            # so the list prints as [0, 0, 16] on every NumPy version
            triplet = [i, j, int(input_matrix[i, j])]
            sparse_matrix.append(triplet)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
[[0, 0, 16], [2, 3, 5]]

You can observe that the sparse matrix has very few elements compared to the input matrix.
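Note that the triplet list does not record the shape of the matrix. In the example, the last row is all zeros, so the shape cannot be recovered from the triplets alone and has to be stored separately. The sketch below rebuilds the input matrix from sparse_matrix under that assumption; triplets_to_dense() is a hypothetical helper written for illustration.

```python
import numpy as np


def triplets_to_dense(triplets, shape):
    """Rebuild a dense matrix from a list of [row, column, value] triplets.

    Hypothetical helper: the triplet list does not record the matrix shape,
    so the caller must pass it in.
    """
    dense = np.zeros(shape, dtype=int)
    for row, col, value in triplets:
        dense[row, col] = value
    return dense


sparse_matrix = [[0, 0, 16], [2, 3, 5]]
restored = triplets_to_dense(sparse_matrix, (4, 4))
print(restored)
```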

The sparse representation is most useful for large input matrices, such as the 1024x1024 or larger matrices common in real-world machine learning applications. When most elements of such a matrix are 0, the sparse matrix is significantly smaller than the input matrix.

Remember that if the number of non-zero elements in a matrix is greater than one-third of the total elements in the matrix, this triplet-based sparse matrix takes more space than the original matrix. This is because a matrix with n non-zero elements needs 3*n values in sparse form (a row number, a column number, and a value for each), while an m x k matrix stores m*k values. The sparse form is therefore larger once 3*n > m*k, that is, once n > m*k/3.
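The one-third rule can be checked with a little arithmetic. The sketch below counts stored values, not bytes, and assumes that a row number, a column number, and a matrix value each take the same amount of space; real libraries such as SciPy use different index and value types, so their break-even point differs. The helper names are hypothetical.

```python
def dense_size(num_rows, num_cols):
    """Number of values a dense num_rows x num_cols matrix stores."""
    return num_rows * num_cols


def triplet_size(num_nonzero):
    """Number of values the [row, column, value] list stores: 3 per non-zero."""
    return 3 * num_nonzero


def triplets_are_smaller(num_rows, num_cols, num_nonzero):
    """True if the triplet list stores fewer values than the dense matrix."""
    return triplet_size(num_nonzero) < dense_size(num_rows, num_cols)


# The 4x4 example: 2 non-zero elements -> 6 stored values instead of 16
print(dense_size(4, 4), triplet_size(2))           # 16 6

# Break-even for a 4x4 matrix is 16 / 3, about 5.33 non-zero elements
print(triplets_are_smaller(4, 4, 5))               # True  (15 < 16)
print(triplets_are_smaller(4, 4, 6))               # False (18 > 16)

# A 1024x1024 matrix with 1,000 non-zero elements
print(dense_size(1024, 1024), triplet_size(1000))  # 1048576 3000
```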

Convert Normal Matrix to Sparse Matrix Using the SciPy Module in Python

We can also convert a normal matrix into a sparse matrix using the scipy module. Its scipy.sparse submodule provides several sparse matrix formats, each with a constructor that accepts a normal matrix.

Let us discuss the most common formats one by one.

Compressed Sparse Row Matrix in Python

Compressed sparse row (CSR) matrices are sparse matrices that we can use in arithmetic operations.

CSR matrices support addition, subtraction, multiplication, division, and matrix power. You can convert a normal matrix to a compressed sparse row matrix using the csr_matrix() constructor defined in the scipy.sparse module.

As shown below, csr_matrix() takes a normal matrix as input and returns a sparse matrix in CSR format.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.csr_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5

Compressed sparse row matrices allow efficient row slicing and fast matrix-vector products. However, column slicing is slow for CSR matrices.
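The following minimal sketch, reusing the example matrix, shows the operations CSR is suited for: slicing a block of rows, multiplying by a dense vector with the @ operator, and adding two sparse matrices. The variable names are illustrative.

```python
import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
csr = sparse.csr_matrix(input_matrix)

# Row slicing: rows 2 and 3 are stored next to each other in CSR, so this is cheap
row_block = csr[2:4, :]
print(row_block.toarray())
# [[0 0 0 5]
#  [0 0 0 0]]

# Matrix-vector product with a dense NumPy vector
vector = np.array([1, 2, 3, 4])
product = csr @ vector
print(product)        # [16  0 20  0]

# Element-wise addition of two sparse matrices gives another sparse matrix
doubled = csr + csr
print(doubled.toarray()[0, 0], doubled.toarray()[2, 3])   # 32 10
```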

Compressed Sparse Column Matrix in Python

In programs that need column slicing, we can use a compressed sparse column (CSC) matrix instead of a CSR matrix.

You can create a CSC matrix in Python using the csc_matrix() constructor defined in the scipy.sparse module. As shown below, csc_matrix() accepts a normal matrix as an input argument and returns a sparse matrix in CSC format.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.csc_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5

Compressed sparse column matrices provide faster column slicing but slower row slicing compared to compressed sparse row matrices.
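To see why column access is cheap, the sketch below prints the three arrays a CSC matrix stores (indptr, indices, and data), then slices out a column and converts to CSR with tocsr() for row access. The expected values in the comments are for the example matrix used throughout this article.

```python
import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
csc = sparse.csc_matrix(input_matrix)

# CSC keeps one pointer per column: column j's values sit in
# data[indptr[j]:indptr[j + 1]], and their row numbers in the same slice of indices
print(csc.indptr)    # [0 1 1 1 2]
print(csc.indices)   # [0 2]
print(csc.data)      # [16  5]

# Column slicing: take column 3 as a dense 1-D array
last_column = csc[:, 3].toarray().ravel()
print(last_column)   # [0 0 5 0]

# Switch to CSR when the program mostly needs rows instead
first_row = csc.tocsr()[0, :].toarray().ravel()
print(first_row)     # [16  0  0  0]
```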

Sparse Matrix in Coordinate Format in Python

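The coordinate (COO) format, also known as the triplet format, is a fast format for constructing sparse matrices. It stores the same row number, column number, and value triplets described earlier. You can create a sparse matrix in the coordinate format using the coo_matrix() constructor defined in the scipy.sparse module.

The coo_matrix() constructor accepts a normal matrix as an input argument and returns a sparse matrix in the coordinate format, as shown below.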

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.coo_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5
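Because the coordinate format stores exactly the row number, column number, and value triplets described at the start of this article, a COO matrix can also be built directly from those triplets without creating the dense matrix first. This is a minimal sketch; as with the plain triplet list, the shape has to be passed explicitly.

```python
import numpy as np
from scipy import sparse

# The [row, column, value] triplets built in the first section of this article
triplets = np.array([[0, 0, 16], [2, 3, 5]])
rows, cols, values = triplets.T   # split the triplets into three 1-D arrays

# Build the COO matrix directly from the triplets; no dense 4x4 array is created
coo = sparse.coo_matrix((values, (rows, cols)), shape=(4, 4))
print(coo.nnz)         # 2
print(coo.toarray())
```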

If you need a CSR or CSC matrix, a common approach is to first build the matrix in coordinate format and then convert it to the desired format using the tocsr() or tocsc() method.

A sparse matrix in coordinate format is mostly used as an intermediate step when converting matrices from one format to another. It doesn't directly support arithmetic operations or slicing.
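A minimal sketch of the conversion step: build a COO matrix, then call tocsr() and tocsc() to get the formats suited to row and column access. The format attribute reports which format each object uses.

```python
import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
coo = sparse.coo_matrix(input_matrix)

# Convert the coordinate-format matrix into the format the program needs
csr = coo.tocsr()   # for row slicing and matrix-vector products
csc = coo.tocsc()   # for column slicing
print(coo.format, csr.format, csc.format)   # coo csr csc

# All three still describe the same matrix
print((csr.toarray() == csc.toarray()).all())   # True
```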

Dictionary of Keys Based Sparse Matrix in Python

A dictionary of keys (DOK) based sparse matrix provides O(1) access to individual elements of the matrix.

Also, DOK-based matrices do not allow duplicate entries: each (row, column) position is stored at most once. You can create a dictionary of keys based sparse matrix using the dok_matrix() constructor defined in the scipy.sparse module.

As shown below, dok_matrix() takes a normal matrix and returns a sparse matrix.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.dok_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5
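Because DOK matrices allow cheap access to individual elements, they are convenient for building a matrix one element at a time. The sketch below starts from an empty dok_matrix with a given shape and dtype, sets elements by index, and then converts the result to CSR with tocsr(); converting before heavy arithmetic is a common pattern rather than a requirement.

```python
from scipy import sparse

# Start with an empty 4x4 DOK matrix and set elements one at a time
dok = sparse.dok_matrix((4, 4), dtype=int)
dok[0, 0] = 16
dok[2, 3] = 5

# Reading an element is a dictionary lookup; positions never set read as 0
print(dok[0, 0], dok[1, 1])   # 16 0

# Setting a position that is already stored overwrites it, so nnz stays 2
dok[2, 3] = 7
print(dok.nnz)                # 2

# Convert to CSR once construction is finished, for fast arithmetic and slicing
csr = dok.tocsr()
print(csr.toarray())
```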

Conclusion

In this article, we have discussed sparse matrices and their implementation in Python. We also saw different ways to convert a normal matrix into a sparse matrix in Python.

While creating a sparse matrix, you should know the intended use of the matrix. If there are many column slicing operations, you should create a CSC matrix.

For row slicing operations, you should create a CSR matrix. If the matrix is large, you can first build it in coordinate format and then convert it to the desired sparse format.

Author: Aditya Raj

Aditya Raj is a highly skilled technical professional with a background in IT and business, holding an Integrated B.Tech (IT) and MBA (IT) from the Indian Institute of Information Technology Allahabad. With a solid foundation in data analytics, programming languages (C, Java, Python), and software environments, Aditya has excelled in various roles. He has significant experience as a Technical Content Writer for Python on multiple platforms and has interned in data analytics at Apollo Clinics. His projects demonstrate a keen interest in cutting-edge technology and problem-solving, showcasing his proficiency in areas like data mining and software development. Aditya's achievements include securing a top position in a project demonstration competition and gaining certifications in Python, SQL, and digital marketing fundamentals.
