# Smooth Data in Python

Shivam Arora Oct 10, 2023

Python is widely used for data analysis and visualization. When we analyze large datasets containing many observations, we often need to smooth the curves on a graph so that the underlying trend is easier to study. This article discusses several ways to do this in Python.

## Use `scipy.signal.savgol_filter()` Method to Smooth Data in Python

The Savitzky-Golay filter is a digital filter that smooths data using its neighboring points. It takes a small window of data, fits a polynomial to the points in that window by the method of least squares, and uses that polynomial to estimate the value at the center point of the window. The window is then shifted by one data point, and the process repeats until every point has been adjusted relative to its neighbors.

We can use the `scipy.signal.savgol_filter()` function to implement this in Python.

See the following example.

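``````python
import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

# Noisy sine wave sampled at 100 points
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# Smooth with a window of 51 samples and a polynomial of order 3
yhat = savgol_filter(y, 51, 3)

plt.plot(x, y)  # original data
plt.plot(x, yhat, color="green")  # smoothed data
plt.show()
``````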

The plot shows the original noisy data together with the smoothed curve in green, so you can observe the difference. In the above example, the filter was applied to the y-values with a window of 51 samples and a polynomial of order 3.
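To make the windowed least-squares fit described earlier more concrete, the sketch below reproduces the filter by hand for a single interior point and compares it with `savgol_filter()`. The window length of 11, the index 50, and the seeded random generator are arbitrary choices for illustration. It only mirrors what the filter does away from the ends of the array, where a full window is available.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)  # seeded so the run is repeatable
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2

window, order = 11, 3  # illustrative values, chosen only for this sketch
half = window // 2
i = 50  # an interior index, far from both ends

# Fit a cubic to the 11 samples centred on index i (offsets -5..5)
offsets = np.arange(-half, half + 1)
coeffs = np.polyfit(offsets, y[i - half : i + half + 1], order)

# The smoothed value is the fitted polynomial evaluated at the centre (offset 0)
manual = np.polyval(coeffs, 0)
library = savgol_filter(y, window, order)[i]

print(manual, library)  # the two values should agree up to rounding
```

Using integer offsets instead of the actual x-values keeps the fit well conditioned and works because the samples are evenly spaced.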

## Use the `numpy.convolve` Method to Smooth Data in Python

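The `numpy.convolve()` function returns the discrete, linear convolution of two one-dimensional sequences. Convolving the data with a normalized box kernel gives a moving average, which filters and smooths the data.

This is a crude method. Every point in the window gets equal weight, so sharp features are flattened, and with `mode="same"` the array is treated as if it were surrounded by zeros, which distorts the first and last few values.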

For example,

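``````python
import numpy as np
import matplotlib.pyplot as plt

# Noisy sine wave sampled at 100 points
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.8


def smooth(y, box_pts):
    # Box kernel whose weights sum to 1, so the convolution is a moving average
    box = np.ones(box_pts) / box_pts
    y_smooth = np.convolve(y, box, mode="same")
    return y_smooth


plt.plot(x, y)  # original data
plt.plot(x, smooth(y, 3))  # 3-point moving average
plt.plot(x, smooth(y, 19))  # 19-point moving average
plt.show()
``````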

In the above example, we plotted the original data along with two moving averages, using window sizes of 3 and 19 points. The wider window gives the smoother curve.
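One detail worth checking when using `mode="same"` is how the ends of the array behave. The small sketch below, using a constant signal chosen purely for illustration, shows that the first and last values are pulled toward zero because the input is implicitly zero-padded, whereas `mode="valid"` keeps only the positions where the window fully overlaps the data.

```python
import numpy as np

flat = np.ones(5)  # a constant signal: ideal smoothing would leave it unchanged
box = np.ones(3) / 3  # 3-point moving-average kernel

same = np.convolve(flat, box, mode="same")  # same length, zero-padded at the ends
valid = np.convolve(flat, box, mode="valid")  # only full windows, so shorter output

print(same)  # first and last values drop to about 0.667
print(valid)  # three values, all 1.0 up to rounding
```

With a wider window, more points at each end are affected, which is worth keeping in mind when reading the 19-point curve near the edges of the plot.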

We can also use other methods to calculate moving averages.
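As one possible alternative, the sketch below uses `scipy.ndimage.uniform_filter1d()`, which computes the same kind of equal-weight moving average but lets us choose how the ends are handled. The choice of `mode="nearest"`, the window size of 19, and the seeded random generator are assumptions made for this illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

rng = np.random.default_rng(0)  # seeded so the run is repeatable
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.8

# 19-point moving average; mode="nearest" repeats the end values
# instead of padding the array with zeros
y_smooth = uniform_filter1d(y, size=19, mode="nearest")

print(y_smooth.shape)
```

Away from the ends, this gives the same values as the box convolution; only the treatment of the first and last few points differs.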

## Use `statsmodels.nonparametric.kernel_regression` to Smooth Data in Python

Kernel regression estimates the conditional mean `E[y|X]` under the model `y = g(X) + e`, where `g` is an unknown smooth function and `e` is noise. The fitted estimate of `g` can be used to smooth the data as a function of the control variable `X`.

To do this, we can use the `KernelReg` class from the `statsmodels` package.
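As a rough illustration of what a kernel estimate of `E[y|X]` means, the sketch below computes a kernel-weighted local average by hand (the Nadaraya-Watson form). The helper name `kernel_smooth`, the Gaussian kernel, and the fixed bandwidth of 0.3 are assumptions for this example. It is not how `statsmodels` implements its estimator, and it does no bandwidth selection.

```python
import numpy as np


def kernel_smooth(x, y, bandwidth):
    """Gaussian-kernel weighted average of y at each x (hypothetical helper)."""
    # Scaled distances between evaluation points (rows) and samples (columns)
    u = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    # Weighted mean of y for each row: nearby samples count the most
    return (weights @ y) / weights.sum(axis=1)


rng = np.random.default_rng(0)  # seeded so the run is repeatable
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2

y_hat = kernel_smooth(x, y, bandwidth=0.3)  # 0.3 is an arbitrary choice
print(y_hat[:5])
```

A smaller bandwidth follows the noise more closely, while a larger one gives a flatter curve, so in practice the bandwidth is usually chosen from the data rather than fixed by hand.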

For example,

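``````python
from statsmodels.nonparametric.kernel_regression import KernelReg
import numpy as np
import matplotlib.pyplot as plt

# Noisy sine wave sampled at 100 points
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# y is the dependent variable, x the regressor; "c" marks x as continuous
kr = KernelReg(y, x, "c")
plt.plot(x, y, "+")

# fit() returns the estimated conditional mean and the marginal effects
y_pred, y_mfx = kr.fit(x)

plt.plot(x, y_pred)
plt.show()
``````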

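The plot shows the original points as `+` markers with the fitted curve drawn through them.

Note that this method produces a good result but is considered very slow, largely because `KernelReg` selects the bandwidth by cross-validation by default. We can also use the Fourier transform, but it works well only when the data is approximately periodic.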
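For completeness, here is a minimal sketch of Fourier-based smoothing with `numpy.fft`: transform the data, zero out the high-frequency components, and transform back. The cutoff of 5 components, the use of `endpoint=False`, and the seeded random generator are assumptions for this illustration, and the data here covers exactly one period of a sine wave, which is the favorable case for this approach.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable
n = 100
# endpoint=False so the samples cover one period without repeating the first point
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
y = np.sin(x) + rng.random(n) * 0.2

keep = 5  # number of lowest-frequency components to keep (an arbitrary cutoff)
spectrum = np.fft.rfft(y)
spectrum[keep:] = 0  # discard everything above the cutoff
y_smooth = np.fft.irfft(spectrum, n=n)

print(y_smooth.shape)
```

If the data does not wrap around smoothly from its last point to its first, the result tends to show ripples near the ends.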