# SciPy stats.zscore Function

Lakshay Kapoor Jan 30, 2023 Sep 25, 2021

The `z-score` is a statistical measure that tells how many standard deviations a particular value lies from the mean value. The `z-score` is calculated with the following formula.

``````z = (X - μ) / σ
``````

In which,

• X is a particular value from the data
• μ is the mean value
• σ is the standard deviation
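As a minimal sketch of the formula above, the `z-score` can be computed by hand with `NumPy`. The sample values below are hypothetical and chosen only so that the mean (5) and standard deviation (2) are easy to check.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = values.mean()  # μ = 5.0
std = values.std()    # σ = 2.0 (population standard deviation, ddof=0)

# z = (X - μ) / σ, applied to every value at once
z_scores = (values - mean) / std
print(z_scores)  # -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
```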

This tutorial will show how to calculate the `z-score` value of any data in Python using the `SciPy` library.

## The `scipy.stats.zscore` Function

The `scipy.stats.zscore` function of the `SciPy` library calculates the `z-score` of each value in the given raw input data, relative to the data's mean and standard deviation. Its signature is `scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')`.

Following are the parameters of the `scipy.stats.zscore` function.

`a (array)` An array-like object of the raw input data.
`axis (int or None)` It defines the axis along which the function computes the `z-score` values. The default value is `0`, i.e., the function works along the first axis (column by column for a two-dimensional array). If it is `None`, the function computes over the whole array.
`ddof (int)` It defines the degrees of freedom correction used in the computation of the standard deviation. The default value is `0`.
`nan_policy (str)` This parameter decides how the function handles NaN values in the input data. There are three options: `'propagate'` (the default), `'raise'`, and `'omit'`. `'propagate'` returns NaN, `'raise'` throws an error, and `'omit'` ignores the NaN values when computing the mean and standard deviation. With `'omit'`, the NaN values still appear as NaN in the output, but they do not affect the `z-score` values calculated for the other values in the input data.

All the parameters except the `a (array)` parameter are optional. That means it is not necessary to define them every time while using the `scipy.stats.zscore` function.
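The effect of the optional `ddof` and `nan_policy` parameters can be sketched as follows. The input values are hypothetical and serve only to make the differences visible.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical values, used only to illustrate the optional parameters.
sample = np.array([1.0, 3.0, 5.0])

population = stats.zscore(sample)         # ddof=0: standard deviation divides by N
corrected = stats.zscore(sample, ddof=1)  # ddof=1: standard deviation divides by N - 1

with_nan = np.array([1.0, 2.0, np.nan, 3.0])
propagated = stats.zscore(with_nan)                  # default 'propagate': all results are NaN
omitted = stats.zscore(with_nan, nan_policy='omit')  # NaN skipped in mean/std, kept in output

raised = False
try:
    stats.zscore(with_nan, nan_policy='raise')
except ValueError as err:
    raised = True
    print("raise:", err)

print(population)
print(corrected)
print(propagated)
print(omitted)
```

With `ddof=1` the standard deviation is larger, so the `z-score` values are slightly smaller in magnitude than with the default.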

Now, let us use the `scipy.stats.zscore` function on a `one-dimensional array`, a `multi-dimensional array`, and a `Pandas DataFrame`.

## Calculating the `z-score` for a `One-dimensional` Array in Python

``````import numpy as np
import scipy.stats as stats

input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])

stats.zscore(input_data)
``````

Output:

``````array([-1.3916106 , -1.09379511, -0.49816411,  0.39528239, -0.20034861,
-0.37903791, -0.55772721, -0.55772721,  1.28872889,  0.99091339,
2.00348608])
``````

Note that each `z-score` value tells how many standard deviations its corresponding value lies from the mean value. Here, the `negative` sign means that the value is that many standard deviations `below` the mean value, and the positive sign means that the value is that many standard deviations `above` the mean value. If a `z-score` value comes out to be `0`, then that value is equal to the mean value.
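One way to check this interpretation is to rebuild the original values from their `z-score` values. The sketch below reuses the same input array and assumes the default `ddof=0`.

```python
import numpy as np
import scipy.stats as stats

input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])
z = stats.zscore(input_data)

mean = input_data.mean()
std = input_data.std()  # ddof=0, matching the zscore default

# Each original value equals the mean plus z standard deviations.
reconstructed = mean + z * std
print(np.allclose(reconstructed, input_data))  # True

# Values with a negative z-score lie below the mean, positive ones above it.
print(input_data[z < 0])
print(input_data[z > 0])
```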

## Calculating the `z-score` for a Multi-Dimensional Array in Python

``````import numpy as np
import scipy.stats as stats

data = np.array([[5, 10, 20, 35],
[25, 22, 19, 19],
[50, 45, 62, 28],
[24, 45, 15, 30]])

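# By default (axis=0), each column is standardized separately.
# The result is rounded to two decimals to keep the output short.
np.round(stats.zscore(data), 2)
``````

Output:

``````array([[-1.31, -1.36, -0.47,  1.21],
       [-0.06, -0.56, -0.52, -1.55],
       [ 1.5 ,  0.96,  1.72,  0.  ],
       [-0.13,  0.96, -0.73,  0.35]])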
``````
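For a multi-dimensional array, the `axis` parameter decides which values share a mean and standard deviation. The following sketch compares the default with `axis=1` and `axis=None` on the same 4x4 array.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 10, 20, 35],
                 [25, 22, 19, 19],
                 [50, 45, 62, 28],
                 [24, 45, 15, 30]])

by_column = stats.zscore(data)           # axis=0 (default): one mean/std per column
by_row = stats.zscore(data, axis=1)      # one mean/std per row
overall = stats.zscore(data, axis=None)  # a single mean/std for all 16 values

# After standardizing, the mean along the chosen axis is (numerically) zero.
print(np.allclose(by_column.mean(axis=0), 0))  # True
print(np.allclose(by_row.mean(axis=1), 0))     # True
print(np.isclose(overall.mean(), 0))           # True
```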

## Calculating the `z-score` for a `Pandas Dataframe` in Python

In this example, we will use the `numpy.random.randint()` function of the `NumPy` library. This function generates random integers and returns them in the form of a `NumPy` array. After creating the `NumPy` array, we will wrap it in a `Pandas DataFrame`. Because the values are random, they will differ on every run.

``````import pandas as pd
import numpy as np
import scipy.stats as stats

input_data = pd.DataFrame(np.random.randint(0, 30, size=(4, 4)), columns=['W', 'X', 'Y', 'Z'])
print(input_data)
``````
``````    W   X   Y   Z
0   7   9   2  15
1  11  23  15  28
2  28  11  25   2
3  11  19  14  15
``````
``````input_data.apply(stats.zscore)
``````

Output:

``````          W         X         Y         Z
0 -0.894534 -1.135815 -1.471534  0.000000
1 -0.400998  1.310556  0.122628  1.414214
2  1.696529 -0.786334  1.348907 -1.414214
3 -0.400998  0.611593  0.000000  0.000000
``````

Note that the `apply()` function of the `Pandas` library is used here to calculate the `z-score` values for the given dataframe. By default, `apply()` passes each column of the dataframe to the function given as its argument, so the `z-score` values are computed column by column.
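As a rough cross-check, the column-wise result of `apply()` can be compared with the formula written directly in `Pandas`. The sketch below hard-codes the values of the dataframe printed above so that it is reproducible. Note that `DataFrame.std()` defaults to `ddof=1`, so `ddof=0` is passed to match `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Same values as the DataFrame printed above, fixed here for reproducibility.
input_data = pd.DataFrame({'W': [7, 11, 28, 11],
                           'X': [9, 23, 11, 19],
                           'Y': [2, 15, 25, 14],
                           'Z': [15, 28, 2, 15]})

# apply() hands each column to stats.zscore in turn.
via_apply = input_data.apply(stats.zscore)

# The same calculation written out: (X - μ) / σ per column, with ddof=0.
manual = (input_data - input_data.mean()) / input_data.std(ddof=0)

print(np.allclose(via_apply.to_numpy(), manual.to_numpy()))  # True
```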
