Sparse Matrices in Python

Aditya Raj | January 30, 2023 | Python, Python Matrix

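What Is a Sparse Matrix in Python
How to Create a Sparse Matrix in Python
Convert a Dense Matrix to a Sparse Matrix Using the SciPy Module in Python
Compressed Sparse Row Matrix in Python
Compressed Sparse Column Matrix in Python
Sparse Matrix in Coordinate Format in Python
Dictionary of Keys Sparse Matrix in Python
Conclusion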

When implementing machine learning algorithms in Python, we often need to represent the input data in a format that uses less memory.

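The input data for a machine learning algorithm is usually represented as a matrix. This article discusses how to store such data in Python using sparse matrices.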

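To that end, we will look at different sparse matrix representations in Python. We will also see how to convert an ordinary (dense) matrix into a sparse representation using functions from Python's scipy module.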

What Is a Sparse Matrix in Python

A sparse matrix is a matrix in which most of the elements are 0. In other words, the matrix holds data at only a few positions.

Here is an example of a sparse matrix.

[[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]

Here, you can see that most of the elements in the matrix are 0.
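To put a number on "most of the elements", one common measure is sparsity: the fraction of entries that are zero. The following is a minimal sketch that computes it for the example matrix with NumPy's count_nonzero(); the variable names are illustrative only.

```python
import numpy as np

# The example matrix from the text
matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])

# np.count_nonzero counts the entries that are not 0
nonzero_count = np.count_nonzero(matrix)

# Sparsity = fraction of entries that are zero (matrix.size is rows * columns)
sparsity = 1 - nonzero_count / matrix.size

print("Non-zero elements:", nonzero_count)
print("Sparsity:", sparsity)
```

For this 4x4 matrix, 2 of the 16 entries are non-zero, so the sparsity is 0.875.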

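Sparse matrices are widely used in natural language processing and data encoding. If most of the elements in a matrix are 0, storing every element becomes expensive in terms of storage.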

This is because there are only a few actual data points, while most of the storage space is taken up by redundant zeros.

How to Create a Sparse Matrix in Python

To avoid spending memory on the redundant zeros of a matrix, we can convert the dense matrix into a sparse matrix.

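You can think of a sparse matrix as a list of lists, where each inner list has three elements. Each inner list stores the row number, the column number, and the value of one non-zero element of the input matrix. Together, these inner lists represent the sparse matrix.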

For example, consider the following input matrix.

[[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]

This matrix has non-zero elements at only two positions: (0,0) and (2,3).

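To convert this matrix into a sparse matrix, we create a list that represents the sparse matrix. For each non-zero element, this list contains an inner list holding the element's row number, column number, and value.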

Therefore, the sparse matrix has two inner lists: [0,0,16] and [2,3,5]. The final sparse matrix is as follows.

[[0, 0, 16], [2, 3, 5]]

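Here,

The first element of each inner list is the row number of a non-zero element in the input matrix.
The second element of each inner list is the column number of that non-zero element.
Finally, the third element of each inner list holds the actual value of the non-zero element.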
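Because each triplet records exactly where a value goes, the dense matrix can be rebuilt from the triplet list. The sketch below uses a hypothetical helper, triplets_to_dense(), written only for illustration. Note that the number of rows and columns has to be passed in separately: the triplet list by itself does not record the shape, and the all-zero last row of the example cannot be inferred from it.

```python
def triplets_to_dense(triplets, n_rows, n_cols):
    """Hypothetical helper: rebuild a dense list-of-lists matrix from triplets."""
    # Start with a matrix of zeros of the requested shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    # Place each stored value at its (row, column) position
    for row, col, value in triplets:
        dense[row][col] = value
    return dense


sparse_matrix = [[0, 0, 16], [2, 3, 5]]
print(triplets_to_dense(sparse_matrix, 4, 4))
```

This prints the original matrix [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]].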

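To create a sparse matrix from a given matrix, we first create a list named sparse_matrix to represent the sparse matrix. Then we traverse the input matrix using nested for loops.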

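While traversing, whenever we find a non-zero element, we create a list containing the triplet of row number, column number, and element value. We then add this list to sparse_matrix using the append() method.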

Once the loops finish, the list sparse_matrix holds the sparse matrix. You can observe this in the following example.

import numpy as np

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = []
rows, cols = input_matrix.shape
for i in range(rows):
    for j in range(cols):
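        if input_matrix[i, j] != 0:
            # int() turns the NumPy scalar into a plain Python int, so the list
            # prints as [0, 0, 16] instead of [0, 0, np.int64(16)] on NumPy 2
            triplet = [i, j, int(input_matrix[i, j])]
            sparse_matrix.append(triplet)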
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
[[0, 0, 16], [2, 3, 5]]

You can observe that the sparse matrix has far fewer elements than the input matrix.
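The nested loops visit every cell of the matrix in Python code. As an alternative sketch (not the method used above), NumPy's nonzero() can return the row and column indices of all non-zero elements at once, and the triplets can then be assembled from those indices.

```python
import numpy as np

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])

# np.nonzero returns one index array per dimension: row indices, then column indices,
# listed in row-major order
rows, cols = np.nonzero(input_matrix)

# Pair each (row, col) with its value; int() converts NumPy scalars to plain Python ints
sparse_matrix = [[int(i), int(j), int(input_matrix[i, j])] for i, j in zip(rows, cols)]
print(sparse_matrix)
```

This produces the same list, [[0, 0, 16], [2, 3, 5]], while letting NumPy do the scanning.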

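Sparse matrices become really useful when the input matrix is 1024x1024 or larger, as is common in real-world machine learning applications. Because the size of the sparse representation grows with the number of non-zero elements rather than with the total number of elements, it can be much smaller than the input matrix.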
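To make the size difference concrete, the sketch below builds a 1024x1024 int64 matrix containing only the two non-zero values from the example and compares its memory footprint with that of a SciPy CSR matrix (a format covered later in this article). The exact CSR byte count depends on the integer type SciPy picks for its index arrays, so treat the printed numbers as indicative.

```python
import numpy as np
from scipy import sparse

# A 1024x1024 int64 matrix with only two non-zero entries
dense = np.zeros((1024, 1024), dtype=np.int64)
dense[0, 0] = 16
dense[2, 3] = 5

csr = sparse.csr_matrix(dense)

# Dense storage: 1024 * 1024 entries of 8 bytes each
dense_bytes = dense.nbytes

# A CSR matrix keeps three arrays: values, column indices, and row pointers
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print("Dense storage (bytes):", dense_bytes)
print("CSR storage (bytes):", sparse_bytes)
```

The dense array needs 8,388,608 bytes, while the CSR arrays need only a few kilobytes, most of which is the row-pointer array with one entry per row plus one.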

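Keep in mind that if the number of non-zero elements is greater than one third of the total number of elements, creating and using this triplet-based sparse matrix costs more than using the original matrix. If the matrix has n non-zero elements, the sparse matrix contains 3*n elements, so for an m x k matrix it stores more numbers than the original as soon as 3*n > m*k, that is, when n > m*k/3.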
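The break-even rule can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every stored number (row, column, or value) takes the same amount of space; triplets_cost_more() is a hypothetical helper for illustration, and real formats such as CSR have a different overhead.

```python
def triplets_cost_more(n_nonzero, n_rows, n_cols):
    """Hypothetical helper: True if the triplet list stores more numbers than the dense matrix.

    Assumes each number takes the same space: the triplet list stores 3 numbers
    per non-zero element, the dense matrix stores n_rows * n_cols numbers.
    """
    return 3 * n_nonzero > n_rows * n_cols


# A 4x4 matrix stores 16 numbers, so the break-even point is 16 / 3, about 5.33 non-zeros
for n in range(4, 8):
    print(n, "non-zeros ->", 3 * n, "numbers; costs more:", triplets_cost_more(n, 4, 4))
```

For a 4x4 matrix, 5 non-zero elements need 15 numbers (still cheaper than 16), while 6 non-zero elements need 18 numbers (more expensive).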

Convert a Dense Matrix to a Sparse Matrix Using the SciPy Module in Python

We can also use the scipy module to convert a dense matrix into a sparse matrix. The scipy module provides several ways to do this.

Let's discuss several of them one by one.

Compressed Sparse Row Matrix in Python

A compressed sparse row (CSR) matrix is a sparse matrix format that we can use in arithmetic operations.

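CSR matrices support addition, subtraction, multiplication, division, and matrix power computations. You can convert a dense matrix into a compressed sparse row matrix using csr_matrix(), defined in the sparse submodule of Python's scipy module.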

As shown below, csr_matrix() takes a dense matrix as input and returns a sparse matrix.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.csr_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5
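As a sketch of the arithmetic support mentioned above, the example below applies a few operators to the CSR matrix from the example. One design detail worth knowing: for SciPy's matrix classes such as csr_matrix, `*` between two matrices means matrix multiplication, so this sketch uses the explicit `@` operator for the matrix product and computes the matrix square as A @ A.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]))

total = A + A      # element-wise addition; the result is still a sparse matrix
scaled = A * 3     # multiplication by a scalar
halved = A / 2     # division by a scalar; the result has a float dtype
squared = A @ A    # matrix product A @ A, i.e. the matrix power A^2

print(total.toarray())
print(halved.toarray())
print(squared.toarray())
```

Only position (0, 0) survives in A @ A (16 * 16 = 256), because the value at (2, 3) would have to be multiplied by an entry in row 3, which is all zeros.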

Compressed sparse row matrices allow efficient row slicing and fast matrix-vector products. However, column slicing on a CSR matrix is slow.
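The reason row slicing is cheap lies in how CSR is stored. The sketch below prints the three arrays that a SciPy CSR matrix keeps (data, indices, indptr) and then performs a row slice and a matrix-vector product; the vector v is an arbitrary value chosen for illustration.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]))

# CSR keeps three arrays:
#   data    - the non-zero values, stored row by row
#   indices - the column index of each value in data
#   indptr  - row i's values are data[indptr[i]:indptr[i + 1]]
print("data:", A.data)
print("indices:", A.indices)
print("indptr:", A.indptr)

# Row slicing only needs the indptr range for the requested row
row_2 = A[2].toarray()
print("row 2:", row_2)

# Matrix-vector product; the result is a regular 1-D NumPy array
v = np.array([1, 2, 3, 4])
print("A @ v:", A @ v)
```

Here indptr is [0 1 1 2 2]: rows 1 and 3 have empty ranges, so they cost nothing to store. Extracting a column, by contrast, requires searching the indices of every row, which is why column slicing is slow in CSR.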

Compressed Sparse Column Matrix in Python

In programs that need column slicing, we can use a compressed sparse column (CSC) matrix instead of a CSR matrix.

You can create a CSC matrix in Python using csc_matrix(), defined in the scipy module. As shown below, csc_matrix() takes a dense matrix as its input argument and returns a sparse matrix.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.csc_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5

Compared with compressed sparse row matrices, compressed sparse column matrices allow faster column slicing but slower row slicing.
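CSC is the column-wise mirror of CSR, which explains the trade-off. In the sketch below, the same three SciPy attributes now group values by column: indices holds row numbers, and indptr marks where each column's values start and end. The column slice then reads a single contiguous range.

```python
import numpy as np
from scipy import sparse

B = sparse.csc_matrix(np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]]))

# CSC mirrors CSR with rows and columns swapped:
#   data    - the non-zero values, stored column by column
#   indices - the row index of each value in data
#   indptr  - column j's values are data[indptr[j]:indptr[j + 1]]
print("data:", B.data)
print("indices:", B.indices)
print("indptr:", B.indptr)

# Column slicing reads one contiguous range of data
col_3 = B[:, 3].toarray()
print("column 3:", col_3.ravel())
```

Here indptr is [0 1 1 1 2]: column 0 holds 16, columns 1 and 2 are empty, and column 3 holds 5 in row 2.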

Sparse Matrix in Coordinate Format in Python

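The coordinate (COO) format is a fast way to construct sparse matrices. You can create a sparse matrix in coordinate format using coo_matrix(), defined in the scipy module.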

coo_matrix() takes a dense matrix as its input argument and returns a sparse matrix in coordinate format, as shown below.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.coo_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5
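The coordinate format is essentially the row, column, value triplet idea from the beginning of this article stored as three parallel arrays. Besides a dense matrix, coo_matrix() also accepts a (data, (row, col)) tuple, so the manual triplet list can be turned into a COO matrix without ever building the dense matrix. This is a sketch; the shape argument is required here because the all-zero last row cannot be inferred from the triplets.

```python
from scipy import sparse

# The manual triplet list built earlier in this article
triplets = [[0, 0, 16], [2, 3, 5]]
rows = [t[0] for t in triplets]
cols = [t[1] for t in triplets]
values = [t[2] for t in triplets]

# coo_matrix accepts (data, (row, col)); shape must be given explicitly
C = sparse.coo_matrix((values, (rows, cols)), shape=(4, 4))

# COO keeps the three parallel arrays as attributes
print("row:", C.row)
print("col:", C.col)
print("data:", C.data)
print(C.toarray())
```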

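If you need to build a large CSR or CSC matrix, a common approach is to first assemble it as a sparse matrix in coordinate format. After that, you can convert it to the format you need.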
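A sketch of the conversion step, using the tocsr() and tocsc() methods of a SciPy COO matrix. It also shows one behavior that makes COO convenient for assembly: the same position may be listed more than once, and the duplicate entries are summed when the matrix is converted. The split of the value 5 into 2 + 3 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import sparse

# Position (2, 3) is listed twice; COO keeps both entries as given
rows = np.array([0, 2, 2])
cols = np.array([0, 3, 3])
values = np.array([16, 2, 3])
C = sparse.coo_matrix((values, (rows, cols)), shape=(4, 4))
print("stored entries in COO:", C.nnz)

# Converting to CSR or CSC sums the duplicate entries
csr = C.tocsr()
csc = C.tocsc()
print("stored entries in CSR:", csr.nnz)
print("value at (2, 3):", csr[2, 3])
print(csc.toarray())
```

The COO matrix stores 3 entries, while the converted CSR matrix stores 2, with 2 + 3 = 5 at position (2, 3).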

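Sparse matrices in coordinate format are mainly used to convert matrices from one format to another. The coordinate format does not directly support arithmetic operations or slicing.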

Dictionary of Keys Sparse Matrix in Python

A dictionary of keys (DOK) sparse matrix provides O(1) access to individual elements of the matrix.

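In addition, a DOK matrix cannot contain duplicate entries: each (row, column) position maps to a single value. You can create a dictionary of keys sparse matrix using dok_matrix(), defined in the scipy module.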

As shown below, dok_matrix() takes a dense matrix and returns a sparse matrix.

import numpy as np
from scipy import sparse

input_matrix = np.array([[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0]])
print("The input matrix is:")
print(input_matrix)
sparse_matrix = sparse.dok_matrix(input_matrix)
print("The sparse matrix is:")
print(sparse_matrix)

Output:

The input matrix is:
[[16  0  0  0]
 [ 0  0  0  0]
 [ 0  0  0  5]
 [ 0  0  0  0]]
The sparse matrix is:
  (0, 0)	16
  (2, 3)	5
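Because DOK behaves like a dictionary keyed by (row, column), it is convenient for filling in a matrix one element at a time. The sketch below starts from an empty SciPy dok_matrix of a given shape, assigns entries by index, and shows that assigning to an existing position overwrites the value rather than adding a duplicate. The value 7 is arbitrary and only used to demonstrate the overwrite.

```python
import numpy as np
from scipy import sparse

# Start from an empty 4x4 DOK matrix and assign entries one at a time
D = sparse.dok_matrix((4, 4), dtype=np.int64)
D[0, 0] = 16
D[2, 3] = 5

# Each (row, col) key holds a single value, so assigning again overwrites it
D[2, 3] = 7
print("stored entries:", D.nnz)
print("value at (2, 3):", D[2, 3])

# Once the matrix is built, convert it to CSR for fast arithmetic and row slicing
print(D.tocsr().toarray())
```

The matrix still stores only 2 entries after the second assignment to (2, 3).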

Conclusion

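In this article, we discussed sparse matrices and their implementation in Python. We also looked at different ways to convert a dense matrix into a sparse matrix in Python.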

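When creating a sparse matrix, you should know how the matrix will be used. If there will be many column slicing operations, you should create a CSC matrix.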

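For row slicing operations, you should create a CSR matrix. If you are building a large matrix, you can first assemble it as a sparse matrix in coordinate format and then convert it to the sparse format you need.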


Author: Aditya Raj

Aditya Raj is a highly skilled technical professional with a background in IT and business, holding an Integrated B.Tech (IT) and MBA (IT) from the Indian Institute of Information Technology Allahabad. With a solid foundation in data analytics, programming languages (C, Java, Python), and software environments, Aditya has excelled in various roles. He has significant experience as a Technical Content Writer for Python on multiple platforms and has interned in data analytics at Apollo Clinics. His projects demonstrate a keen interest in cutting-edge technology and problem-solving, showcasing his proficiency in areas like data mining and software development. Aditya's achievements include securing a top position in a project demonstration competition and gaining certifications in Python, SQL, and digital marketing fundamentals.
