NumPy Autocorrelation

In data science, the variables of a dataset are often related to one another. The relationship may be directly proportional or inversely proportional, and a small change in one variable may change another slightly or drastically. This relationship between variables is known as correlation.

Autocorrelation is the correlation between a time signal and a delayed (lagged) copy of itself. The two copies are offset from each other by some time difference, called the lag.
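To make the idea of a lagged copy concrete, here is a minimal sketch. The names signal and lag are illustrative, and the value computed is the plain, unnormalized sum of products between the signal and its delayed copy.

```python
import numpy

signal = numpy.array([1, 2, 3, 4, 5, 6])
lag = 2  # illustrative delay, in samples

# Align the signal with a copy of itself delayed by `lag` samples
current = signal[lag:]    # [3 4 5 6]
delayed = signal[:-lag]   # [1 2 3 4]

# Unnormalized autocorrelation at this lag: sum of element-wise products
value = int(numpy.dot(current, delayed))
print(value)  # 3*1 + 4*2 + 5*3 + 6*4 = 50
```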

Calculate Autocorrelation in NumPy

NumPy has a built-in function, correlate(), that computes the cross-correlation of two 1D sequences. It accepts two 1D arrays and a mode.

The mode can be valid, same, or full. This parameter is optional, and its default value is valid.
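The following sketch shows how the mode changes the output length when an array is correlated with itself. The variable names are illustrative. Note that the default valid mode returns a single value here (the lag-0 term), which is why full is the useful mode for autocorrelation.

```python
import numpy

x = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Output length of correlate(x, x) for each mode
sizes = {}
for mode in ("valid", "same", "full"):
    sizes[mode] = numpy.correlate(x, x, mode=mode).size

print(sizes)  # {'valid': 1, 'same': 10, 'full': 19}

# With the default mode (valid), only the fully overlapping position is kept
default_result = numpy.correlate(x, x)
print(default_result)  # [385]
```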

To learn more about this function, refer to the official NumPy documentation for numpy.correlate().

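import numpy

# Define the data as a list, then convert it to a 1D NumPy array
myArray = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
myArray = numpy.array(myArray)
# Correlate the array with itself at every possible lag
result = numpy.correlate(myArray, myArray, mode="full")
# Keep only the second half: lags 0 through 9
result = result[result.size // 2 :]
print(result)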


Output:

[385 330 276 224 175 130  90  56  29  10]

In the above code, we first define a list of numbers and convert it to a NumPy array using NumPy's array() function. Then we call correlate() with the same array as both arguments, which computes the autocorrelation of our data. We use the full mode so that every lag is included.

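The result is stored in the result variable and then sliced. The slicing is crucial because, in full mode, correlate() returns an array of 2 * N - 1 values for an input of length N (19 values here), covering both negative and positive lags. The element at index result.size // 2 is the lag-0 value, and the elements after it are the positive lags, so the values of interest lie in the range [(result.size // 2), result.size).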
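As a sanity check on the slicing, the sketch below compares the second half of the full output against a direct sum of products at each lag. The names full, center, and manual are illustrative.

```python
import numpy

x = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
full = numpy.correlate(x, x, mode="full")
center = full.size // 2  # index of the lag-0 value

# For a real-valued input, the full autocorrelation is symmetric
# around the lag-0 value, so the first half mirrors the second
is_symmetric = numpy.array_equal(full, full[::-1])

# Direct computation: sum of x[i] * x[i + k] for each lag k
manual = numpy.array([numpy.dot(x[k:], x[: x.size - k]) for k in range(x.size)])

print(full.size, center)                          # 19 9
print(is_symmetric)                               # True
print(numpy.array_equal(manual, full[center:]))   # True
```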
