# SciPy stats.zscore Function

The `z-score` is a statistical measure that tells how many standard deviations a particular value lies from the mean. It is calculated with the following formula.

``````
z = (X - μ) / σ
``````

In which,

• X is a particular value from the data
• μ is the mean value
• σ is the standard deviation
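As a quick illustration, the formula can be applied by hand with `NumPy`. This is only a minimal sketch: the sample values are made up for this example, and σ is taken as the population standard deviation (dividing by `N`), which is what `NumPy` computes by default.

```python
import numpy as np

# Illustrative data (not from the article): mean is 5.0, standard deviation is 2.0.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = data.mean()     # μ, the mean value
sigma = data.std()   # σ, the population standard deviation (divides by N)

# z = (X - μ) / σ, applied to every value at once
z = (data - mu) / sigma

print(mu, sigma)
print(z)  # z-scores: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
```

For instance, the value `9` lies two standard deviations above the mean, so its `z-score` is `2`.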

This tutorial will show how to calculate the `z-score` value of any data in Python using the `SciPy` library.

## The `scipy.stats.zscore` Function

The `scipy.stats.zscore` function of the `SciPy` library computes the `z-score` of each value in the input data, relative to the mean and standard deviation of that same data. Its signature is `scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')`.

Following are the parameters of the `scipy.stats.zscore` function.

`a (array)` An array-like object containing the raw input data.
`axis (int)` The axis along which the function computes the `z-score` values. The default value is `0`, i.e., the function computes along the first axis (column by column for a two-dimensional array). Pass `None` to compute over the whole array.
`ddof (int)` The degrees-of-freedom correction used when computing the standard deviation: the divisor is `N - ddof`, where `N` is the number of values. The default value is `0`.
`nan_policy` Decides how the function handles NaN values in the input data. It accepts one of three string values: `'propagate'`, `'raise'`, or `'omit'`. `'propagate'` (the default) lets the NaN flow through the computation, so the result along that axis is NaN; `'raise'` raises an error; and `'omit'` ignores the NaN values when computing the mean and standard deviation. With `'omit'`, NaN inputs still appear as NaN in the output, but they do not affect the `z-score` values calculated for the other values.

All the parameters except the `a (array)` parameter are optional. That means it is not necessary to define them every time while using the `scipy.stats.zscore` function.
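The optional parameters can be explored with a small experiment. The sketch below uses made-up values and only shows the behavior described above for `nan_policy` and `ddof`; it assumes a `SciPy` version in which `zscore` accepts `nan_policy`.

```python
import numpy as np
from scipy import stats

# Illustrative values only (not from the article).
values = np.array([1.0, 2.0, np.nan, 4.0])

# Default policy ('propagate'): the NaN contaminates the mean and standard
# deviation, so every z-score comes back as NaN.
propagated = stats.zscore(values)

# 'omit': the NaN is skipped when computing the mean and standard deviation;
# its own position in the output is still NaN.
omitted = stats.zscore(values, nan_policy='omit')

# 'raise': a ValueError is raised when a NaN is found.
try:
    stats.zscore(values, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

# ddof=1 divides by N - 1 instead of N when computing the standard deviation.
clean = np.array([1.0, 2.0, 4.0])
sample_z = stats.zscore(clean, ddof=1)
manual_z = (clean - clean.mean()) / clean.std(ddof=1)

print(propagated)
print(omitted)
print(raised)
print(np.allclose(sample_z, manual_z))
```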

Now, let us use the `scipy.stats.zscore` function on a `one-dimensional array`, a `multi-dimensional array`, and a `Pandas Dataframe`.

## Calculating the `z-score` for a `One-dimensional` Array in Python

``````
import numpy as np
import scipy.stats as stats

input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])

stats.zscore(input_data)
``````

Output:

``````
array([-1.3916106 , -1.09379511, -0.49816411,  0.39528239, -0.20034861,
       -0.37903791, -0.55772721, -0.55772721,  1.28872889,  0.99091339,
        2.00348608])
``````

Note that each `z-score` tells how many standard deviations its corresponding value lies from the mean. A negative sign means the value is that many standard deviations `below` the mean, and a positive sign means it is that many standard deviations `above` the mean. A `z-score` of `0` means the value is equal to the mean.
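One way to check this interpretation is to invert the formula: multiplying each `z-score` by the standard deviation and adding the mean should give back the original value. The following is a minimal sketch of that check on the same one-dimensional array; the names `reconstructed`, `below_mean`, and `above_mean` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])
z = stats.zscore(input_data)

mean = input_data.mean()
std = input_data.std()  # population standard deviation, matching ddof=0

# Inverting z = (X - mean) / std recovers the original values.
reconstructed = mean + z * std

# Negative z-scores belong to values below the mean, positive ones to values above it.
below_mean = input_data[z < 0]
above_mean = input_data[z > 0]

print(np.allclose(reconstructed, input_data))
print(below_mean)
print(above_mean)
```

The standardized values themselves have a mean of `0` and a standard deviation of `1`, up to floating-point error.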

## Calculating the `z-score` for a Multi-Dimensional Array in Python

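``````
import numpy as np
import scipy.stats as stats

data = np.array([[5, 10, 20, 35],
                 [25, 22, 19, 19],
                 [50, 45, 62, 28],
                 [24, 45, 15, 30]])

# axis defaults to 0, so each column is standardized separately
print(np.round(stats.zscore(data), 4))
``````

Output:

``````
[[-1.3138 -1.3569 -0.4701  1.2094]
 [-0.0626 -0.5626 -0.5224 -1.555 ]
 [ 1.5015  0.9598  1.7238  0.    ]
 [-0.1251  0.9598 -0.7313  0.3455]]
``````

Because `axis` defaults to `0`, the function standardizes each column using that column's own mean and standard deviation. The result is rounded to four decimal places here to keep the output compact.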
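The `axis` parameter matters most for multi-dimensional input. The sketch below, which reuses the two-dimensional array from this section, compares the default column-wise computation with a row-wise one (`axis=1`) and with a computation over the whole array (`axis=None`). The variable names are chosen for this example only.

```python
import numpy as np
from scipy import stats

data = np.array([[5, 10, 20, 35],
                 [25, 22, 19, 19],
                 [50, 45, 62, 28],
                 [24, 45, 15, 30]])

col_z = stats.zscore(data)             # axis=0: each column has mean 0
row_z = stats.zscore(data, axis=1)     # axis=1: each row has mean 0
all_z = stats.zscore(data, axis=None)  # whole array treated as one sample

print(np.allclose(col_z.mean(axis=0), 0))
print(np.allclose(row_z.mean(axis=1), 0))
print(np.isclose(all_z.mean(), 0))
```

In all three cases the output has the same shape as the input; only the mean and standard deviation used for each value change.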

## Calculating the `z-score` for a `Pandas Dataframe` in Python

In this example, we use the `randint()` function of the `NumPy` library to generate an array of random integers, and then wrap that array in a `Pandas Dataframe`. Because the data is random, the values you get will differ from the ones shown here.

``````
import pandas as pd
import numpy as np
import scipy.stats as stats

input_data = pd.DataFrame(np.random.randint(0, 30, size=(4, 4)), columns=['W', 'X', 'Y', 'Z'])
print(input_data)
``````

Output:

``````
    W   X   Y   Z
0   7   9   2  15
1  11  23  15  28
2  28  11  25   2
3  11  19  14  15
``````

``````
input_data.apply(stats.zscore)
``````

Output:

``````
          W         X         Y         Z
0 -0.894534 -1.135815 -1.471534  0.000000
1 -0.400998  1.310556  0.122628  1.414214
2  1.696529 -0.786334  1.348907 -1.414214
3 -0.400998  0.611593  0.000000  0.000000
``````

Note that the `apply()` method of the `Pandas` library passes each column of the dataframe to `stats.zscore`, so the `z-score` values are calculated column by column. In general, `apply()` calls the function given as its argument on each column (or on each row, with `axis=1`) of a dataframe.
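To make the column-wise behavior concrete, the following sketch rebuilds the dataframe from the printed sample above (with fixed values instead of random ones, so the result is reproducible) and compares `apply(stats.zscore)` with a manual computation. Note that `Pandas` uses `ddof=1` by default in `std()`, so `ddof=0` is passed explicitly to match the default of `stats.zscore`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Fixed copy of the sample dataframe printed above.
df = pd.DataFrame({'W': [7, 11, 28, 11],
                   'X': [9, 23, 11, 19],
                   'Y': [2, 15, 25, 14],
                   'Z': [15, 28, 2, 15]})

# Column-by-column z-scores via apply().
by_apply = df.apply(stats.zscore)

# Same computation written out by hand; ddof=0 matches stats.zscore's default.
manual = (df - df.mean()) / df.std(ddof=0)

print(np.allclose(by_apply.to_numpy(), manual.to_numpy()))
```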
