- Multiplication of Matrices
- Check if DataFrames Are Aligned in Pandas
- dot Function to Carry Out Matrix Multiplication in Pandas
Matrix multiplication is widely used for understanding relationships in networks, transforming coordinate systems, numerical modeling, and inventory calculations, among other things. With row-column based numerical data, we can perform matrix multiplication and use the result in whatever areas apply.
Pandas and Numpy have tools and functions that enable matrix usage and operations such as multiplication, inversion, etc. Matrix multiplication in Pandas can be a little confusing (and lead to errors) if you don’t know the underlying mathematics that powers it.
In this article, we will discuss how to do matrix multiplication in pandas and how to avoid errors.
Multiplication of Matrices
To carry out the multiplication of matrices, we must ensure that the two matrices involved are aligned (or appropriate) for the operation. A matrix has rows and columns; when we want to multiply 2 matrices, the number of columns and rows matters for it to be possible.
We describe matrices by their rows and columns, e.g., a 2 x 4 matrix has 2 rows and 4 columns. With this in mind, the number of columns in the first (left) matrix must equal the number of rows in the second (right) matrix for matrix multiplication to be possible.
For example, a 2 x 3 matrix can be multiplied by a 3 x 2 matrix because there are 3 columns in the first matrix and 3 rows in the second matrix. Also, a 3 x 4 matrix can be multiplied by a 4 x 23 matrix because the number of columns in the first matrix equals the number of rows in the second matrix: 4.
However, if we change (or reverse) which matrix is first, the matrix multiplication might not be possible. Using the same examples as earlier, the 3 x 2 matrix can be multiplied by the 2 x 3 matrix because the number of columns of the first matrix (2) equals the number of rows of the second matrix (2).
For the second example, the 4 x 23 matrix cannot be multiplied by the 3 x 4 matrix because the number of columns of the first matrix (23) is not equal to the number of rows of the second matrix (3).
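The rule can be tried out directly on the shapes used in the examples above. This is a minimal sketch using NumPy arrays filled with ones; the variable names a, b, c, and d are illustrative assumptions and only the shapes matter.

```python
import numpy as np

# Placeholder matrices; only their shapes are relevant here.
a = np.ones((2, 3))
b = np.ones((3, 2))
c = np.ones((3, 4))
d = np.ones((4, 23))

print((a @ b).shape)  # 2 x 3 times 3 x 2 gives 2 x 2
print((b @ a).shape)  # 3 x 2 times 2 x 3 gives 3 x 3
print((c @ d).shape)  # 3 x 4 times 4 x 23 gives 3 x 23

# Reversing the second pair breaks the rule: 23 columns vs. 3 rows.
try:
    d @ c
    reversed_ok = True
except ValueError as err:
    reversed_ok = False
    print("4 x 23 times 3 x 4 is not possible:", err)
```

The shape of each product is (rows of the left matrix, columns of the right matrix), which is why reversing the order can change the result's shape or make the product impossible.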
Check if DataFrames Are Aligned in Pandas
We can check whether two dataframes can be multiplied as matrices by checking whether their shapes fit the stated rule for matrix multiplication. To achieve this, we access the shape property (a tuple with two elements) of each dataframe and compare the column count (the second value in the tuple) of the first dataframe with the row count (the first value in the tuple) of the second dataframe.
Let’s create two dataframes, df and other, check their shapes, and compare them.
import pandas as pd
import numpy as np

df = pd.DataFrame([[23, 33], [33, 41]])
other = pd.DataFrame([[31, 0], [20, 1]])

print(df)
print(other)
    0   1
0  23  33
1  33  41
    0  1
0  31  0
1  20  1
Now, let’s check the shape and compare to see if the dataframes can carry out matrix multiplication calculations.
print(df.shape)
print(other.shape)

# The columns of the first dataframe must equal the rows of the second.
if df.shape[1] == other.shape[0]:
    print("DataFrames (matrices) align and therefore matrix multiplication is possible.")
else:
    print("DataFrames (matrices) don't align and therefore matrix multiplication is not possible.")
(2, 2)
(2, 2)
DataFrames (matrices) align and therefore matrix multiplication is possible.
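The column-versus-row comparison can also be wrapped in a small helper. The sketch below assumes a hypothetical function name, can_multiply, and two illustrative non-square dataframes, wide and tall, none of which come from the original example. Non-square shapes make it clear that only the inner dimensions need to match, not the full shapes.

```python
import pandas as pd


def can_multiply(left, right):
    """Return True if left (m x n) can be multiplied by right (n x p)."""
    # Second element of left's shape is its column count;
    # first element of right's shape is its row count.
    return left.shape[1] == right.shape[0]


wide = pd.DataFrame([[1, 2, 3], [4, 5, 6]])    # 2 x 3
tall = pd.DataFrame([[1, 2], [3, 4], [5, 6]])  # 3 x 2

print(can_multiply(wide, tall))  # 3 columns vs. 3 rows
print(can_multiply(wide, wide))  # 3 columns vs. 2 rows
```

Note that wide and tall have different shapes yet can still be multiplied, while wide cannot be multiplied by itself even though the shapes are identical.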
As you can see, the dataframes align because the number of columns in df is equal to the number of rows in other. Now we can use the function designed for matrix multiplication, dot().
dot Function to Carry Out Matrix Multiplication in Pandas
Pandas and Numpy have a
dot() function that we can use for matrix multiplication. We will use both to showcase how to carry out matrix multiplication.
Using the dataframes we created in the previous section, we can illustrate how to use the dot() function. Let’s get cracking on the matrix multiplication of df and other.
With the pandas dot() function, the function is called on the first matrix, df, and the second matrix, other, is passed as an argument to the dot() function, as below.
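A minimal sketch of the call just described, with df and other redefined so the snippet runs on its own; the variable name result is an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame([[23, 33], [33, 41]])
other = pd.DataFrame([[31, 0], [20, 1]])

# Call dot() on the left matrix and pass the right matrix as the argument.
result = df.dot(other)
print(result)
```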
      0   1
0  1373  33
1  1843  41
If we use the numpy dot() function instead, we pass both matrices as arguments, with the first (left) matrix passed first.
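A minimal sketch of the numpy version, again with df and other redefined so the snippet is self-contained; the variable name result is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[23, 33], [33, 41]])
other = pd.DataFrame([[31, 0], [20, 1]])

# Left matrix first, right matrix second; the result is a plain array.
result = np.dot(df, other)
print(result)
```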
[[1373   33]
 [1843   41]]
Let’s work with another two dataframes, df1 and df2, created randomly using the numpy library, and carry out the matrix multiplication using the two dot() functions.
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

print(np.dot(df1, df2))
print(df1.dot(df2))
[[ 1.28220783 -1.36789201  0.16335459]
 [-0.8039172   0.87851003 -0.32282877]
 [ 1.09767978 -0.71870817 -0.23485835]]
-----
...
ValueError: matrices are not aligned
The first dot() function, from the numpy library, worked without errors, but the second dot() function, from the pandas library, failed with a ValueError: matrices are not aligned error message.
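The reason for this error message is that the pandas dot() function aligns the two dataframes by label when it executes: it re-indexes them so that the column labels of df1 line up with the row (index) labels of df2. Here the columns of df1 are A, B, C and the index of df2 is 1, 2, 3, so the labels don’t match, resulting in a misalignment of the matrices. The Numpy dot() function ignores labels and works only on the underlying values, so it raises no error.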
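The mismatch can be seen by printing the labels that pandas compares. This is a minimal sketch that rebuilds df1 and df2 as above; the names labels_match and raised are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# pandas compares the column labels of the left dataframe
# with the index labels of the right dataframe.
print(list(df1.columns))
print(list(df2.index))
labels_match = df1.columns.equals(df2.index)
print(labels_match)

# The shapes are compatible (3 x 3 and 3 x 3), but the labels are not.
try:
    df1.dot(df2)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```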
To deal with this error, we need to align the two dataframes by setting the index of the second dataframe, df2, to the columns of the first dataframe, df1.
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

print(np.dot(df1, df2))

# Make the row labels of df2 match the column labels of df1.
df2.index = df1.columns
print(df1.dot(df2))
[[ 1.28220783 -1.36789201  0.16335459]
 [-0.8039172   0.87851003 -0.32282877]
 [ 1.09767978 -0.71870817 -0.23485835]]
          A         B         C
1  1.282208 -1.367892  0.163355
2 -0.803917  0.878510 -0.322829
3  1.097680 -0.718708 -0.234858
Now there are no errors, and both matrix multiplication computations work.
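If changing the index of df2 is not desirable, one possible alternative (not part of the approach above, and shown only as a sketch) is to pass the raw values of df2 to the pandas dot() function, which then checks shapes only instead of labels. The name result is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# to_numpy() drops the labels of df2, so no label alignment takes place.
result = df1.dot(df2.to_numpy())
print(result)
```

The trade-off is that the result keeps the index of df1 but not the column labels of df2, so the labels may need to be restored afterwards.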