Smooth Data in Python

Shivam Arora Mar-21, 2022 Jul-02, 2021
  1. Use scipy.signal.savgol_filter() Method to Smooth Data in Python
  2. Use the numpy.convolve Method to Smooth Data in Python
  3. Use the statsmodels.kernel_regression to Smooth Data in Python

Python is widely used for data analysis and visualization. When a dataset contains many noisy observations, smoothing the curve on a graph makes the underlying trend easier to study. This article discusses three ways to smooth data in Python.

Use scipy.signal.savgol_filter() Method to Smooth Data in Python

The Savitzky-Golay filter is a digital filter that smooths data by fitting local polynomials. For each data point, it takes a small window of neighboring points, fits a polynomial to them by the method of least squares, and uses the value of that polynomial at the center of the window as the smoothed value. The window is then shifted by one data point, and the process repeats until every point has been processed.

We can use the scipy.signal.savgol_filter() function to implement this in Python.

See the following example.

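import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

# Noisy sine wave: 100 samples between 0 and 2*pi
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# Smooth with a 51-point window and a polynomial of order 3
yhat = savgol_filter(y, 51, 3)

# Plot the original data and the smoothed curve (green)
plt.plot(x, y)
plt.plot(x, yhat, color='green')
plt.show()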

Output:

[Figure: noisy sine data with the Savitzky-Golay smoothed curve in green]

In the above example, savgol_filter() smooths the y values using a window of 51 points and a polynomial of order 3. Both the original and the smoothed data are plotted so you can observe the difference.
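To see what the filter computes at a single point, the following sketch fits a cubic to one 51-point window with np.polyfit and compares the fitted value at the window center with the output of savgol_filter(). It is only an illustration of the idea, not SciPy's internal implementation; the seed, the index i, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from scipy.signal import savgol_filter

# Reproducible noisy sine wave (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2

window_length, polyorder = 51, 3
half = window_length // 2
i = 50  # an interior point, so the whole window fits inside the data

# Least-squares cubic fit over the window centered on point i,
# using positions relative to the center (-25 ... 25)
offsets = np.arange(-half, half + 1)
coeffs = np.polyfit(offsets, y[i - half : i + half + 1], polyorder)
manual_value = np.polyval(coeffs, 0.0)  # polynomial value at the center

filtered = savgol_filter(y, window_length, polyorder)
print(manual_value, filtered[i])  # the two values should agree closely
```

Points near the ends of the data do not have a full window on both sides, so savgol_filter() handles them according to its mode argument; the comparison above only applies to interior points.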

Use the numpy.convolve Method to Smooth Data in Python

The numpy.convolve() function returns the discrete, linear convolution of two one-dimensional sequences. Convolving the data with a box of equal weights that sum to one gives a moving average, which filters and smooths the data.

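This is a crude smoother and is not considered a good method: a wider window flattens genuine peaks along with the noise, and with mode='same' the values near both ends are pulled toward zero because the input is implicitly padded with zeros.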

For example,

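import numpy as np
import matplotlib.pyplot as plt

# Noisy sine wave: 100 samples between 0 and 2*pi
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.8

def smooth(y, box_pts):
    # Moving average: convolve with a box of box_pts equal weights
    box = np.ones(box_pts) / box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

# Plot the original data and two moving averages
plt.plot(x, y)
plt.plot(x, smooth(y, 3))
plt.plot(x, smooth(y, 19))
plt.show()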

Output:

[Figure: noisy sine data with 3-point and 19-point moving averages]

In the above example, we plotted two moving averages with window sizes of 3 and 19 points, together with the original data. The larger window produces the smoother curve.
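The mode argument of numpy.convolve() decides what happens at the ends of the data. A small sketch with a constant input, chosen only for illustration, makes the effect visible: the moving average of a constant signal should stay constant, but mode='same' lowers the values at both ends.

```python
import numpy as np

y = np.ones(10)        # constant signal: the average should be 1 everywhere
box = np.ones(5) / 5   # 5-point moving-average weights

# 'same' keeps the input length but treats values outside the data as zeros
same = np.convolve(y, box, mode="same")
# 'valid' keeps only positions where the window lies fully inside the data
valid = np.convolve(y, box, mode="valid")

print(same)   # the first two and last two values fall below 1
print(valid)  # shorter output, with no edge distortion
```

The trade-off is length: mode='valid' returns fewer points than the input, so the x values have to be trimmed to match before plotting.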

There are other ways to calculate moving averages as well.
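One such alternative, sketched here, uses a cumulative sum instead of a convolution. The function name moving_average_cumsum is hypothetical, and the sketch assumes that only windows lying fully inside the data are wanted, so the result is shorter than the input.

```python
import numpy as np

def moving_average_cumsum(y, n):
    """Moving average over n points, computed from a cumulative sum."""
    # Prepend a zero so that c[k] holds the sum of the first k values
    c = np.cumsum(np.insert(y, 0, 0.0))
    # Sum of each n-point window = difference of two cumulative sums
    return (c[n:] - c[:-n]) / n

y = np.arange(10.0)
result = moving_average_cumsum(y, 3)
reference = np.convolve(y, np.ones(3) / 3, mode="valid")
print(result)
print(np.allclose(result, reference))
```

For long floating-point series, the cumulative sum can accumulate rounding error, so the convolution form may be the safer choice when precision matters.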

Use the statsmodels.kernel_regression to Smooth Data in Python

Kernel regression estimates the conditional mean E[y|X] in the model y = g(X) + e, where g is an unknown smooth function and e is noise. It can be used to smooth data as a function of the control variable X.
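To make the conditional-mean idea concrete, here is a minimal sketch of the simplest kernel regression estimator, the local-constant (Nadaraya-Watson) form with a Gaussian kernel. It is written in plain NumPy for illustration and is not how statsmodels implements it; the function name nadaraya_watson, the seed, and the fixed bandwidth of 0.3 are assumptions made for the example.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Estimate E[y|x] at x_eval as a kernel-weighted average of y_train."""
    # Scaled distance between every evaluation point and every training point
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    # Gaussian kernel weights: nearby points count more than distant ones
    weights = np.exp(-0.5 * u**2)
    # Weighted average of the observed y values for each evaluation point
    return (weights @ y_train) / weights.sum(axis=1)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.random(100) * 0.2

y_smooth = nadaraya_watson(x, y, x, bandwidth=0.3)  # bandwidth is a guess
print(y_smooth[:5])
```

A larger bandwidth averages over more neighbors and gives a smoother curve, while a smaller one follows the data more closely. Choosing it well is the hard part, which is why library implementations usually select it from the data.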

To perform kernel regression, we can use the KernelReg class from the statsmodels module.

For example,

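from statsmodels.nonparametric.kernel_regression import KernelReg
import numpy as np
import matplotlib.pyplot as plt

# Noisy sine wave: 100 samples between 0 and 2*pi
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.random(100) * 0.2

# endog=y, exog=x; var_type 'c' marks x as a continuous variable
kr = KernelReg(y, x, 'c')
# fit() returns the estimated mean and the marginal effects
y_pred, marginal_effects = kr.fit(x)

# Plot the original points and the smoothed curve
plt.plot(x, y, '+')
plt.plot(x, y_pred)
plt.show()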

Output:

[Figure: noisy sine data points with the kernel regression curve]

Note that KernelReg produces a good result but can be very slow, largely because by default it selects the bandwidth by cross-validation. We can also smooth data with the Fourier transform, but that approach works well only with periodic data.
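A Fourier-based smoother can be sketched as a simple low-pass filter: transform the data, discard the high-frequency coefficients, and transform back. This is a minimal illustration, assuming evenly spaced samples that cover exactly one period; the cutoff of 5 coefficients and the seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
# endpoint=False so the samples cover exactly one period of the sine
x = np.linspace(0, 2 * np.pi, 100, endpoint=False)
y = np.sin(x) + rng.random(100) * 0.2

# Real-input FFT: coefficient 0 is the mean, higher indices are higher frequencies
spectrum = np.fft.rfft(y)

keep = 5             # hypothetical cutoff: keep only the 5 lowest frequencies
spectrum[keep:] = 0  # zero out everything above the cutoff

# Transform back to get the smoothed signal with the original length
y_smooth = np.fft.irfft(spectrum, n=len(y))
print(y_smooth[:5])
```

Because the discrete Fourier transform treats the data as one period of a repeating signal, a series whose first and last values differ will show artifacts near the ends after this kind of filtering.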
