Calculate the Mean Squared Error in Python

  1. Calculate the Mean Squared Error With the Help of an Algorithm in Python
  2. Calculate the Mean Squared Error With the Help of the Numpy Module in Python
  3. Calculate the Mean Squared Error With the Help of Scikit-Learn in Python

This tutorial shows how to calculate the mean squared error (MSE) in Python in three ways: step by step, in a single line with NumPy, and with scikit-learn.

Calculate the Mean Squared Error With the Help of an Algorithm in Python

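The MSE tells us how close a regression line is to a set of points. For each point, it takes the distance between the observed value and the value predicted by the regression line. These distances are called the errors, and each error is squared so that positive and negative errors do not cancel out. The MSE is the mean of those squared errors.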
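Written as a formula, with $y_i$ the observed values, $\hat{y}_i$ the values predicted by the line, and $n$ the number of points (the symbols are our own notation, chosen for illustration):

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```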

The Mean Squared Error is an important function in machine learning, especially in linear regression. We will calculate the MSE with NumPy using two approaches. In the first approach, we calculate the MSE step by step.

In the second approach, we calculate the MSE in a single line.

First, we need to import NumPy. To demonstrate, we will calculate the Mean Squared Error for two arrays: the first is original_marks, and the second is estimated_marks. We will create both arrays.

import numpy as np

original_marks=np.array([87,64,77,91])
estimated_marks=np.array([67,55,71,80])

To display original_marks:

original_marks

Output:

array([87, 64, 77, 91])

To display estimated_marks:

estimated_marks

Output:

array([67, 55, 71, 80])

Now we will proceed according to the formula of MSE. First, we subtract estimated_marks from original_marks; then we square the differences; finally, we calculate the mean.

So first, we calculate the difference between original_marks and estimated_marks using the subtract() function.

diff_marks=np.subtract(original_marks,estimated_marks)
diff_marks

Output:

array([20,  9,  6, 11])

Now we need to square diff_marks. To do this, we use the square() function and pass it the differences we calculated.

sqr_marks=np.square(diff_marks)
sqr_marks

Output:

array([400,  81,  36, 121], dtype=int32)
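Because each difference is squared, the order of subtraction does not change the squared errors. A quick check of this, reusing the same two arrays:

```python
import numpy as np

original_marks = np.array([87, 64, 77, 91])
estimated_marks = np.array([67, 55, 71, 80])

# Subtracting in either order gives differences with opposite signs...
forward = np.subtract(original_marks, estimated_marks)
backward = np.subtract(estimated_marks, original_marks)

# ...but squaring removes the sign, so the squared errors are identical.
same_squares = np.array_equal(np.square(forward), np.square(backward))
print(same_squares)  # True
```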

Finally, we take the mean of this array, which gives the Mean Squared Error (MSE) of the marks. We use the mean() method.

mse_marks=sqr_marks.mean()
mse_marks

Output:

159.5

Complete Python Code:

import numpy as np

original_marks=np.array([87,64,77,91])
estimated_marks=np.array([67,55,71,80])

diff_marks=np.subtract(original_marks,estimated_marks)

sqr_marks=np.square(diff_marks)

mse_marks=sqr_marks.mean()
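The same three steps (difference, square, mean) can also be written without NumPy. The sketch below is one possible plain-Python version; the function name mse_plain is our own and is not part of any library.

```python
def mse_plain(actual, predicted):
    """Return the mean squared error of two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("both sequences must have the same length")
    if len(actual) == 0:
        raise ValueError("sequences must not be empty")
    # Step 1: difference, step 2: square, step 3: mean.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


result = mse_plain([87, 64, 77, 91], [67, 55, 71, 80])
print(result)  # 159.5
```

This version is handy for small inputs or for checking a result by hand; for large arrays, the NumPy version is usually the better choice.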

Calculate the Mean Squared Error With the Help of the Numpy Module in Python

Now we will calculate the Mean Squared Error in a single line, again using the square() function and the mean() method.

mse_marks=np.square(original_marks-estimated_marks).mean()
mse_marks

We can see the output is the same:

159.5

Complete Python Code:

import numpy as np

original_marks=np.array([87,64,77,91])
estimated_marks=np.array([67,55,71,80])

#using numpy

mse_marks=np.square(original_marks-estimated_marks).mean()
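If the calculation is needed in more than one place, the one-liner can be wrapped in a small helper. This is a sketch under our own assumptions: the name mse_numpy is hypothetical, and the inputs are converted to float so that small or unsigned integer dtypes cannot overflow or wrap around when they are subtracted and squared.

```python
import numpy as np


def mse_numpy(actual, predicted):
    """Mean squared error of two array-likes with the same shape."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("inputs must have the same shape")
    return np.mean((actual - predicted) ** 2)


original_marks = np.array([87, 64, 77, 91])
estimated_marks = np.array([67, 55, 71, 80])

mse_marks = mse_numpy(original_marks, estimated_marks)
print(mse_marks)  # 159.5
```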

Calculate the Mean Squared Error With the Help of Scikit-Learn in Python

Now, we will obtain the Mean Squared Error using the scikit-learn library. Let's import numpy and prepare the data with ndmin set to 2, which makes the array two-dimensional, and then reshape it so that we have five rows and one column.

In the next line, we define an array that holds the y-values for the training data. We then import the LogisticRegression class from the linear_model module of sklearn and create an instance of this class.

import numpy as np
from sklearn.linear_model import LogisticRegression

x_training_data=np.array([166,151,194,140,139],ndmin=2)
x_training_data=x_training_data.reshape((5,1))
y_training_data=np.array([62,71,67,44,91])
MD=LogisticRegression()

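Now, we will see how well the model fits the training data. The model must be trained before it can predict, so we first call MD.fit() with x_training_data and y_training_data. Then we declare a variable called y_pr_data, set it to the result of MD.predict(), and feed it the x_training_data.

MD.fit(x_training_data,y_training_data)
y_pr_data=MD.predict(x_training_data)
y_pr_data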

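Output (from the original run; your values will differ, because LogisticRegression only predicts labels that appear in y_training_data):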

array([76, 76, 77, 83, 76])

Now, we will find the Mean Squared Error. We know the formula of the Mean Squared Error, so we will apply it to calculate the error between the predicted value and the actual value.

mse=np.mean(((y_training_data-y_pr_data)**2))
mse

Output (from the original run; the exact value depends on the predictions):

3.2

There is a much simpler way to calculate the Mean Squared Error: the mean_squared_error() function. We import it from the sklearn.metrics module and then feed it the actual and predicted data, as we did above.

from sklearn.metrics import mean_squared_error
mean_squared_error(y_training_data,y_pr_data)

When we run this cell, we get the same result as above.

3.2
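As a cross-check, mean_squared_error() can be applied to the two marks arrays from the first section. It takes the true values first and the predicted values second, and it should agree with the NumPy result:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

original_marks = np.array([87, 64, 77, 91])
estimated_marks = np.array([67, 55, 71, 80])

mse_sklearn = mean_squared_error(original_marks, estimated_marks)
mse_numpy = np.square(original_marks - estimated_marks).mean()

print(mse_sklearn)  # 159.5
print(mse_numpy)    # 159.5
```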

Complete Python Code:

import numpy as np
from sklearn.linear_model import LogisticRegression

x_training_data=np.array([166,151,194,140,139],ndmin=2)
x_training_data=x_training_data.reshape((5,1))
y_training_data=np.array([62,71,67,44,91])
MD=LogisticRegression()

# train the model before predicting
MD.fit(x_training_data,y_training_data)
y_pr_data=MD.predict(x_training_data)
# y_pr_data
mse=np.mean(((y_training_data-y_pr_data)**2))
# mse

from sklearn.metrics import mean_squared_error
mean_squared_error(y_training_data,y_pr_data)
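
LogisticRegression is a classifier, so it treats each target value as a class label. For a continuous target, a regression model is usually the more natural fit. The sketch below swaps in LinearRegression, which is our substitution and not the model used above, and compares mean_squared_error() with the manual formula. No output is shown because the exact value depends on the fitted line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same data as above: five samples, one feature each.
x_training_data = np.array([166, 151, 194, 140, 139]).reshape((5, 1))
y_training_data = np.array([62, 71, 67, 44, 91])

# Assumption: LinearRegression replaces LogisticRegression here.
model = LinearRegression()
model.fit(x_training_data, y_training_data)
y_pr_data = model.predict(x_training_data)

mse_manual = np.mean((y_training_data - y_pr_data) ** 2)
mse_sklearn = mean_squared_error(y_training_data, y_pr_data)
rmse = np.sqrt(mse_sklearn)  # root mean squared error, in the units of y

print(mse_manual, mse_sklearn, rmse)
```

Taking the square root gives the root mean squared error, which is often easier to interpret because it is expressed in the same units as the target.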

Author: Salman Mehmood
