scipy.signal.savgol_filter() Method to Smooth Data in Python
numpy.convolve Method to Smooth Data in Python
statsmodels.kernel_regression to Smooth Data in Python
Python is widely used for data analysis and visualization. When we analyze large datasets with many observations, we may need to smooth the curves on a graph so that the underlying trend is easier to study. This article discusses several ways to do this in Python.
scipy.signal.savgol_filter() Method to Smooth Data in Python
The Savitzky-Golay filter is a digital filter that smooths data by fitting polynomials locally. It takes a small window of data points, fits a low-degree polynomial to them using the method of least squares, and uses the value of that polynomial at the center of the window as the smoothed value of the center point. The window is then shifted by one data point, and the process is repeated until every point has been processed.
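To make the sliding-window description concrete, here is a minimal NumPy sketch of the procedure. savgol_sketch is a hypothetical helper written for illustration, not part of SciPy. It refits the polynomial with np.polyfit in every window, which is much slower than savgol_filter (SciPy precomputes the filter coefficients and applies them as a convolution). It also assumes an odd window length, and its edge handling, which fits one polynomial to the first and last full windows, is modeled loosely on savgol_filter's default mode="interp".

```python
import numpy as np

def savgol_sketch(y, window_length, polyorder):
    """Sliding-window least-squares polynomial smoothing (Savitzky-Golay idea).

    Hypothetical illustration: assumes an odd window_length no longer than y.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if window_length % 2 == 0 or window_length > n:
        raise ValueError("window_length must be odd and no longer than y")
    if polyorder >= window_length:
        raise ValueError("polyorder must be less than window_length")

    half = window_length // 2
    offsets = np.arange(-half, half + 1)  # positions relative to the window center
    out = np.empty(n)

    for i in range(half, n - half):
        # Least-squares polynomial fit to the points in the current window...
        coeffs = np.polyfit(offsets, y[i - half:i + half + 1], polyorder)
        # ...evaluated at the center (offset 0) gives the smoothed value.
        out[i] = np.polyval(coeffs, 0)

    # Edges: fit one polynomial to the first and last full windows and evaluate it
    # at the positions the sliding window could not be centered on.
    local = np.arange(window_length)
    head = np.polyfit(local, y[:window_length], polyorder)
    out[:half] = np.polyval(head, np.arange(half))
    tail = np.polyfit(local, y[-window_length:], polyorder)
    out[n - half:] = np.polyval(tail, np.arange(window_length - half, window_length))
    return out

# Usage on a noisy sine wave, with the same window and order as the example below
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.default_rng(0).normal(0, 0.1, 100)
yhat = savgol_sketch(y, 51, 3)
print(yhat[:5])
```

A useful sanity check: a polynomial of degree polyorder or lower passes through this filter unchanged, because the least-squares fit in each window reproduces it exactly.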
We can use the scipy.signal.savgol_filter() function to implement this in Python.
See the following example.
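import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

# Noisy sine wave: 100 samples plus uniform noise in [0, 0.2)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# Savitzky-Golay filter: 51-point window, third-order polynomial
yhat = savgol_filter(y, 51, 3)

plt.plot(x, y)                    # original data
plt.plot(x, yhat, color="green")  # smoothed data
plt.show()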
In the above example, savgol_filter() smooths the y values using a window of 51 points and a third-order polynomial. We plotted both the original and the smoothed data so you can observe the difference.
numpy.convolve Method to Smooth Data in Python
numpy.convolve() returns the discrete, linear convolution of two one-dimensional sequences. We will use it to compute moving averages, which filter out noise and smooth the data.
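This is a fairly crude method: a simple moving average weights every point in the window equally, and with mode="same" the values near the edges are pulled toward zero because points beyond the ends of the data are treated as zeros.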
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.8

def smooth(y, box_pts):
    # Box kernel: box_pts equal weights that sum to 1
    box = np.ones(box_pts) / box_pts
    # Convolving with the box gives a moving average; mode="same" keeps len(y)
    y_smooth = np.convolve(y, box, mode="same")
    return y_smooth

plt.plot(x, y)
plt.plot(x, smooth(y, 3))
plt.plot(x, smooth(y, 19))
plt.show()
In the above example, we plotted two moving averages, computed over windows of 3 and 19 points, together with the original data. The wider window produces the smoother curve.
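One drawback of this approach appears at the ends of the data. Smoothing a constant signal, which an ideal smoother would return unchanged, makes it visible. The sketch below uses the same box-kernel idea as smooth() (the helper name moving_average is ours, for illustration) and compares np.convolve's three modes: "same" keeps the input length but lowers the end values, "valid" returns only the positions where the whole box fits inside the data, and "full" returns every partial overlap.

```python
import numpy as np

def moving_average(y, box_pts, mode="same"):
    """Moving average of y over box_pts points, using np.convolve with a box kernel."""
    box = np.ones(box_pts) / box_pts  # equal weights that sum to 1
    return np.convolve(y, box, mode=mode)

ones = np.ones(10)  # constant signal: a perfect smoother would leave it unchanged

same = moving_average(ones, 3, mode="same")    # length 10, ends pulled down to 2/3
valid = moving_average(ones, 3, mode="valid")  # length 8, only full windows
full = moving_average(ones, 3, mode="full")    # length 12, every partial overlap

print(len(same), len(valid), len(full))  # 10 8 12
print(same[0], same[5], same[-1])
```

Choosing mode="valid" avoids the distorted ends at the cost of a shorter output, which then has to be plotted against the matching slice of x.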
Moving averages can also be calculated in other ways.
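One alternative, sketched below under the assumption that only full windows are wanted, computes the moving average from a cumulative sum: each window's total is the difference of two running sums, so the cost per point does not grow with the window size. moving_average_cumsum is a hypothetical helper name; its output should match np.convolve with a box kernel and mode="valid".

```python
import numpy as np

def moving_average_cumsum(y, window):
    """Moving average over full windows only (like np.convolve with mode="valid")."""
    y = np.asarray(y, dtype=float)
    # csum[i] holds the sum of y[:i]; the leading 0 covers the empty prefix
    csum = np.cumsum(np.insert(y, 0, 0.0))
    # sum(y[i:i + window]) == csum[i + window] - csum[i]
    return (csum[window:] - csum[:-window]) / window

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(moving_average_cumsum(y, 2))  # [1.5 2.5 3.5 4.5]
print(np.convolve(y, np.ones(2) / 2, mode="valid"))
```

Over very long series, rounding errors accumulate in the running sum, so np.convolve may be the safer choice there.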
statsmodels.kernel_regression to Smooth Data in Python
Kernel regression is a nonparametric technique that estimates the conditional mean E[y | X] in the model y = g(X) + e, where g is an unknown function and e is an error term. It can be used to smooth data as a function of an explanatory variable.
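To show what estimating the conditional mean looks like in practice, here is a minimal sketch of the simplest kernel regression estimator, the Nadaraya-Watson (local constant) estimator with a Gaussian kernel and a hand-picked bandwidth. It predicts g(x) as a weighted average of the observed y values, with weights that fall off with distance from x. This is an illustration under stated assumptions, not what statsmodels runs by default: KernelReg defaults to a local linear estimator (reg_type="ll") with a bandwidth chosen by cross-validation, whereas the hypothetical helper nadaraya_watson below uses a fixed bandwidth.

```python
import numpy as np

def nadaraya_watson(x, y, x_eval, bandwidth):
    """Local-constant kernel regression estimate of E[y | X = x_eval] (Gaussian kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    # Scaled distance from every evaluation point (rows) to every observation (columns)
    u = (x_eval[:, None] - x[None, :]) / bandwidth
    # Gaussian kernel weights; the normalizing constant cancels in the ratio below
    weights = np.exp(-0.5 * u**2)
    # Each estimate is a weighted average of the observed y values
    return weights @ y / weights.sum(axis=1)

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2
y_pred = nadaraya_watson(x, y, x, bandwidth=0.3)
```

The bandwidth controls the amount of smoothing: larger values average over more neighbors and give a smoother curve. Choosing it automatically, as KernelReg does by default, is one reason kernel regression can be slow.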
To perform this, we can use the KernelReg class from the statsmodels.nonparametric.kernel_regression module.
from statsmodels.nonparametric.kernel_regression import KernelReg
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# One continuous ("c") explanatory variable
kr = KernelReg(y, x, "c")
plt.plot(x, y, "+")

# fit() returns the estimated conditional mean and the marginal effects
y_pred, y_mfx = kr.fit(x)
plt.plot(x, y_pred)
plt.show()
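Note that this method produces good results but can be very slow, in part because KernelReg selects the bandwidth by least-squares cross-validation by default. We can also smooth data with the Fourier transform, but this works well only when the data are periodic over the sampled range, because the discrete Fourier transform treats the signal as if it repeats.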
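A minimal sketch of Fourier-based smoothing, assuming the noise lives in the higher frequencies: transform the data with np.fft.rfft, zero every component above a chosen cutoff, and transform back with np.fft.irfft. fft_lowpass and its keep parameter are hypothetical names for illustration. The example also shows the periodicity caveat: the samples are treated as one period of a repeating signal, so a signal that does not wrap around smoothly is distorted near its ends.

```python
import numpy as np

def fft_lowpass(y, keep):
    """Smooth y by keeping only its `keep` lowest-frequency Fourier components."""
    y = np.asarray(y, dtype=float)
    spectrum = np.fft.rfft(y)                # one-sided spectrum of the real signal
    spectrum[keep:] = 0                      # discard higher frequencies (assumed noise)
    return np.fft.irfft(spectrum, n=len(y))  # back to the original length

n = 128
t = np.arange(n)
slow = np.sin(2 * np.pi * 2 * t / n)         # exactly 2 cycles: periodic over the window
fast = 0.3 * np.sin(2 * np.pi * 40 * t / n)  # high-frequency "noise"
smoothed = fft_lowpass(slow + fast, keep=10)
print(np.max(np.abs(smoothed - slow)))       # tiny: the slow wave is recovered

ramp = np.linspace(0, 1, n)                  # not periodic: jumps from 1 back to 0 when repeated
ramp_smoothed = fft_lowpass(ramp, keep=10)
print(ramp_smoothed[0], ramp[0])             # the first value is pulled far away from 0
```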