F-Test in Python

This tutorial covers the F statistic, the F distribution, and how to perform F-tests on your data using Python.

An F statistic is a value obtained from an ANOVA test or a regression analysis to determine whether the means of two populations differ significantly. It is comparable to a T statistic from a T-test: a T-test tells you whether a single variable is statistically significant, whereas an F test tells you whether a set of variables is jointly statistically significant.

What does statistically significant mean?

If a result is statistically significant, it is unlikely to have occurred by chance. If your test results are not statistically significant, you cannot reject the null hypothesis.

F-Statistics and P-Values

Consider both the F statistic and the p-value when determining whether your overall results are significant.

Why? A significant overall result does not necessarily imply that each of your variables is individually significant. Simply put, the F statistic compares the combined effect of all the variables.
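As a rough illustration of the point above, the sketch below computes the overall F statistic of a small linear regression using only NumPy and SciPy. The data are simulated and purely hypothetical: the first predictor affects the response and the second does not, yet the overall F-test can still come out significant. The variable names (X_design, ss_res, and so on) are assumptions made for this example.

```python
import numpy as np
from scipy.stats import f

# Simulated, hypothetical data: y depends on the first predictor only.
rng = np.random.default_rng(0)
n, k = 50, 2  # n observations, k predictors
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

# Fit ordinary least squares with an intercept column.
X_design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Split the total variation into explained and residual parts.
residuals = y - X_design @ coef
ss_res = residuals @ residuals
ss_tot = ((y - y.mean()) ** 2).sum()
ss_reg = ss_tot - ss_res

# Overall F statistic: explained variation per predictor divided by
# residual variation per residual degree of freedom.
f_stat = (ss_reg / k) / (ss_res / (n - k - 1))
p_value = f.sf(f_stat, k, n - k - 1)  # upper-tail probability

print("F statistic:", f_stat)
print("p-value:", p_value)
```

A small p-value here says only that the predictors are jointly useful; it does not say which individual predictor carries the effect.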

F Value in ANOVA

An ANOVA test is a statistical method that uses variances to check whether there is a statistically significant difference between the means of two or more categorical groups.

ANOVA also splits the independent variable into two or more groups. For instance, one or more groups might be expected to affect the dependent variable, whereas another group might serve as a control and not be expected to have an effect.

In ANOVA, the F value helps answer whether the group means differ significantly, by comparing the variability between groups with the variability within groups. The F value also determines the P value, which is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true.
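To make the ANOVA F value concrete, here is a minimal sketch using scipy.stats.f_oneway, which performs a one-way ANOVA. The three groups hold made-up measurements chosen only for illustration. The manual calculation alongside it assumes the standard one-way ANOVA decomposition (between-group mean square divided by within-group mean square) and should agree with the library result.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical measurements for a control group and two treatment groups.
control = np.array([4.1, 3.9, 4.3, 4.0, 4.2])
treatment_a = np.array([4.8, 5.1, 4.9, 5.3, 5.0])
treatment_b = np.array([4.2, 4.4, 4.1, 4.5, 4.3])
groups = [control, treatment_a, treatment_b]

# One-way ANOVA: returns the F statistic and its p-value.
f_stat, p_value = f_oneway(*groups)

# The same quantities computed by hand.
k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = f.sf(f_manual, k - 1, n_total - k)  # upper-tail probability

print("f_oneway:", f_stat, p_value)
print("manual:  ", f_manual, p_manual)
```

Both lines should print matching values, which shows how the p-value follows directly from the F value and the two degrees of freedom.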

For comparing the variances of two samples X and Y, the test statistic known as the F ratio can be calculated as follows:

$$F = \frac{Var(X)}{Var(Y)}$$

To perform this test in Python, we can use the SciPy library.

SciPy offers algorithms for many problem types, including optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many others.

To install scipy, run this command:

pip install scipy


You can use the f distribution object from the scipy.stats module. The scipy.stats module contains the functions and distributions needed to perform statistical operations.

>>> from scipy.stats import f


scipy.stats.f has a cdf (cumulative distribution function) method, which can be used to calculate the p-value for a given statistic.

Hence, you can determine whether to reject or fail to reject the null hypothesis at the given alpha level.
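The decision rule can be sketched in two equivalent ways: compare the p-value with alpha, or compare the statistic with a critical value. The snippet below uses the cdf, sf (survival function, equal to 1 - cdf), and ppf (inverse cdf) methods of scipy.stats.f. The observed statistic and the degrees of freedom are hypothetical values chosen only for illustration.

```python
from scipy.stats import f

alpha = 0.05
dfn, dfd = 9, 9      # hypothetical numerator and denominator degrees of freedom
f_observed = 3.5     # hypothetical F statistic

# Upper-tail p-value: probability of an F value at least this large under H0.
upper_p = f.sf(f_observed, dfn, dfd)

# Critical value: the F value that cuts off the upper alpha tail.
critical = f.ppf(1 - alpha, dfn, dfd)

# Both comparisons lead to the same decision.
reject_by_p = upper_p < alpha
reject_by_critical = f_observed > critical

print("p-value:", upper_p)
print("critical value:", critical)
print("reject H0:", reject_by_p, reject_by_critical)
```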

Consider the example below:

Importing the modules and creating variables.

from scipy.stats import f
import numpy as np
a = np.array([1,2,1,2,1,2,1,2,1,2])
b = np.array([1,3,-1,2,1,5,-1,6,-1,2])
alpha = 0.05  # significance level; adjust as needed


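The F value is the ratio of the two sample variances, Var(X)/Var(Y). Pass ddof=1 so that NumPy computes the unbiased sample variance rather than the population variance.

# Calculating the F value from the sample variances.
F = a.var(ddof=1) / b.var(ddof=1)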


The F distribution is parameterized by two degrees of freedom, one for each sample:

df1 = len(a) - 1
df2 = len(b) - 1


The scipy.stats.f object provides methods to calculate the p-value and the critical values for the given statistic.

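# Lower-tail p-value: probability of an F value this small or smaller under H0.
p_value = f.cdf(F, df1, df2)
print(p_value > alpha)  # False here, so the null hypothesis is rejected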


The code above computes a lower-tail p-value from the F statistic. Because the p-value is smaller than alpha, we reject the null hypothesis that the variance of a is equal to the variance of b, in favor of the alternative that the variance of a is smaller.
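If the direction of the difference is not known in advance, a two-sided p-value is a common choice. The sketch below wraps the steps in a hypothetical helper, variance_f_test (not a SciPy function), and doubles the smaller of the two tail probabilities, which is one widely used convention for a two-sided F-test.

```python
import numpy as np
from scipy.stats import f

def variance_f_test(x, y):
    """Hypothetical helper: two-sided F-test for equality of two variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ratio of unbiased sample variances.
    f_stat = x.var(ddof=1) / y.var(ddof=1)
    dfn, dfd = len(x) - 1, len(y) - 1
    lower = f.cdf(f_stat, dfn, dfd)  # probability of a smaller ratio
    upper = f.sf(f_stat, dfn, dfd)   # probability of a larger ratio
    # Double the smaller tail and cap the result at 1.
    p_two_sided = min(1.0, 2 * min(lower, upper))
    return f_stat, p_two_sided

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]
f_stat, p = variance_f_test(a, b)
print("F statistic:", f_stat)
print("two-sided p-value:", p)
```

With this convention, swapping the two samples inverts the F statistic but leaves the p-value unchanged.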

Note: The F-test is quite sensitive to non-normality of the data.

If you are not confident that your data are normally distributed, consider Bartlett’s test or Levene’s test instead. Bartlett’s test also assumes normality, so Levene’s test is the more robust choice when the data deviate noticeably from a normal distribution.

SciPy provides functions for both tests.

Bartlett’s test:

>>> from scipy.stats import bartlett
>>> x = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
>>> y = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]
>>> z = [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98]
>>> stat, p = bartlett(x, y, z)
>>> p
1.1254782518834628e-05


The p-value is very small, so we can conclude that the given populations do not have equal variances.

The sample variances confirm the difference:

>>> [np.var(x, ddof=1) for x in [x, y, z]]
[0.007054444444444413, 0.13073888888888888, 0.008890000000000002]


Levene’s test:

>>> from scipy.stats import levene
>>> x = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
>>> y = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]
>>> z = [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98]
>>> stat, p = levene(x, y, z)
>>> p
0.002431505967249681


The p-value is again below 0.05, so Levene’s test also indicates that the given populations do not have equal variances.

>>> [np.var(x, ddof=1) for x in [x, y, z]]
[0.007054444444444413, 0.13073888888888888, 0.008890000000000002]
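Levene’s test in SciPy accepts a center argument that controls which measure of location is used inside the test; the default is "median". The sketch below reruns the test on the same data with two of the documented options. Which setting suits a given dataset is a judgment call, so treat this as an illustration rather than a recommendation.

```python
from scipy.stats import levene

x = [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99]
y = [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05]
z = [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98]

# Run Levene's test with each centering option and collect the p-values.
results = {}
for center in ("median", "mean"):
    stat, p = levene(x, y, z, center=center)
    results[center] = p
    print(f"center={center!r}: statistic={stat:.4f}, p-value={p:.6f}")
```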


Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.