# SciPy scipy.stats.pearsonr Method

The SciPy `scipy.stats.pearsonr()` function computes the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between two variables. It also returns the `p-value` for testing non-correlation.

The Pearson correlation coefficient ranges from `-1` to `+1`. A value near `-1` indicates a strong negative linear relationship between the variables, a value of `0` indicates no linear relationship, and a value near `+1` indicates a strong positive linear relationship.

A positive relationship means that as one variable's value increases, the other's tends to increase as well. A negative relationship means that as one increases, the other tends to decrease.
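The three reference points of the range can be illustrated with a minimal sketch. The arrays below are made-up values chosen only so that the relationships are exactly linear (or exactly non-linear); they are not taken from any real dataset.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# y rises in exact proportion to x: perfect positive linear relationship
r_pos, _ = stats.pearsonr(x, 2 * x + 1)

# y falls in exact proportion to x: perfect negative linear relationship
r_neg, _ = stats.pearsonr(x, -3 * x + 10)

# y depends on x, but symmetrically around zero, so there is no linear trend
x_sym = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
r_zero, _ = stats.pearsonr(x_sym, x_sym ** 2)

print("positive:", r_pos)
print("negative:", r_neg)
print("no linear relation:", r_zero)
```

The last case is worth noting: `y` is fully determined by `x`, yet the coefficient is essentially zero, because Pearson correlation only captures linear relationships.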

## Syntax of `scipy.stats.pearsonr()`

``````
scipy.stats.pearsonr(x, y)
``````

### Parameters

- `x` : Array-like input holding the values of the first variable or attribute.
- `y` : Array-like input holding the values of the second variable or attribute. It must have the same length as `x`.

## Return

It returns two values, which can be unpacked like a tuple:

1. `r` : The Pearson correlation coefficient. It measures the strength and direction of the linear relationship between `x` and `y`.
2. `p-value` : The probability of observing a correlation at least as extreme as `r` if the two variables were actually uncorrelated. It is used to decide whether to reject the null hypothesis.

The null hypothesis states that there is no linear relationship between the variables under consideration.
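To make the meaning of `r` concrete, the following sketch computes the coefficient from the textbook definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations) and compares it with the value from `pearsonr`. The sample values are illustrative, and the manual formula is only the standard definition, not necessarily how SciPy computes the result internally.

```python
import numpy as np
from scipy import stats

# Illustrative sample values (assumed for this sketch)
x = np.array([3.0, 6.0, 9.0, 12.0])
y = np.array([12.0, 10.0, 11.0, 11.0])

# Deviations of each value from its own mean
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r_scipy, p_value = stats.pearsonr(x, y)

print("manual r:", r_manual)
print("scipy r: ", r_scipy)
```

Both values should agree up to floating-point rounding.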

## Example Codes : `scipy.stats.pearsonr()` Method to Find Correlation Coefficient

``````
from scipy import stats

# Two equal-length samples, one per variable
arr1 = [3, 6, 9, 12]
arr2 = [12, 10, 11, 11]

# pearsonr returns the correlation coefficient and the p-value
r, p = stats.pearsonr(arr1, arr2)

print("The pearson correlation coefficient is:", r)
print("The p-value is:", p)
``````

Output:

``````
The pearson correlation coefficient is: -0.31622776601683794
The p-value is: 0.683772233983162
``````

Here, two arrays of equal length are passed as arguments to the `pearsonr` function. The coefficient is negative because the values in the second array tend to decrease slightly as the values in the first array increase, but at about `-0.32` the linear relationship is weak.

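Since the `p-value` (`0.683772233983162`) is greater than the commonly used significance level of `0.05`, we fail to reject the null hypothesis. This does not prove the null hypothesis is true; it only means that the data provide no significant evidence of a linear relationship.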
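The comparison against a significance level can be written out as a small sketch. The helper `interpret_p_value` and its default threshold `alpha=0.05` are hypothetical names introduced here for illustration; they are not part of SciPy.

```python
from scipy import stats


def interpret_p_value(p, alpha=0.05):
    """Hypothetical helper: turn a p-value into a test decision."""
    if p < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


arr1 = [3, 6, 9, 12]
arr2 = [12, 10, 11, 11]
r, p = stats.pearsonr(arr1, arr2)

# p is well above 0.05 for these arrays
decision = interpret_p_value(p)
print(decision)
```

Note that the wording is "fail to reject" rather than "accept": a large p-value means the test found no evidence against the null hypothesis, not that the null hypothesis has been confirmed.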

## Example Codes : Using `scipy.stats.pearsonr()` Method to Find Correlation Between Variables Within a CSV File

``````
import pandas as pd
from scipy import stats

# Load the car data from the CSV file into a DataFrame
data = pd.read_csv("dataset.csv")

# Keep the two columns of interest and drop rows with missing values
newdata = data[["price", "mileage"]].dropna()

r, p = stats.pearsonr(newdata["price"], newdata["mileage"])
print("The pearson correlation coefficient between price and mileage is:", r)
print("The p-value is:", p)
``````

Output:

``````
The pearson correlation coefficient between price and mileage is: -0.4008381863293672
The p-value is: 4.251481046096957e-97
``````

Here, we use the pandas library to load the `dataset.csv` file as a pandas data frame. The file contains car data with the columns `name`, `price`, `mileage`, `brand`, and `year of manufacture`. We then keep only the `price` and `mileage` columns and drop any rows with missing values, so that we can check the strength of the relationship between these two variables.

The Pearson correlation coefficient is negative, and at about `-0.40` it indicates a moderate negative linear relationship between price and mileage. In this dataset, cheaper cars tend to have higher mileage, and mileage tends to decrease as the price increases.

Since `p` is extremely small (approximately `0`), we reject the null hypothesis: the negative correlation between price and mileage is statistically significant.
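Because `dataset.csv` is not included with this article, the same workflow can be tried on a small in-memory data frame. The sketch below uses invented values as a stand-in for the car data, so its numbers are not comparable to the output above; it only shows how dropping incomplete rows fits together with `pearsonr`. As a cross-check, it also calls the pandas `Series.corr()` method, which computes the Pearson coefficient by default and skips missing pairs.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the car data; the values are invented
data = pd.DataFrame({
    "price":   [5000, 7000, np.nan, 12000, 15000, 20000],
    "mileage": [22.0, 20.5, 19.0, 18.0, np.nan, 14.5],
})

# Keep only rows where both columns are present
newdata = data[["price", "mileage"]].dropna()

r, p = stats.pearsonr(newdata["price"], newdata["mileage"])

# pandas computes Pearson correlation by default and ignores missing pairs
r_pandas = data["price"].corr(data["mileage"])

print("rows used:", len(newdata))
print("scipy r:", r)
print("pandas r:", r_pandas)
```

The pandas method is convenient for a quick look at the coefficient, while `pearsonr` is the one to use when the p-value is also needed.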
