# KFold in Python

This article will introduce what `KFold`

does in Python.

## KFold in Python

When working on machine learning, sometimes we get into the dilemma of which machine learning model we should use to solve our problems. For instance, suppose we want to classify the iris flowers, we can use machine learning models such as `SVM`

, `random forest`

, `logistic regression`

, and `KFold`

.

We use models for cross-validation, training, and testing our AI’s. Cross-validation is a technique that allows us to evaluate the model’s performance.

We are looking at machine learning models such as classifying emails as spam or not. Our typical procedure is first to train the model using the label datasets.

After the model is built, we must test the model by using the different datasets. When the model returns the results, we can compare the results with the actual values and measure the model’s accuracy.

There are several ways to train and test the model, `KFold`

is one of them. In this technique, we divide our samples into folds.

If we have 100 samples, we can make them into 5 folds, each containing 20 samples. Then we run multiple iterations in such a way that in our first iteration, we assign our first fold for testing the model and the rest four for training the model.

We will use the second fold for the test and the remaining folds for the training for the next iteration. And once we have gone through all folds as a testing fold, we can now get the average results from our model, giving us the model’s accuracy.

The `KFold`

technique is handy when we provide a variety of samples to our model. We get the average results that will become the accuracy of our model.

Now, let’s go over an example in which we will create a list of items and use `KFold`

to make the folds for testing and training our models.

First of all, we will install the `sklearn`

library using the following command.

```
# python
pip install sklearn
```

Once we have installed the library, now we will import KFold from `sklearn.model_selection`

and use the `KFold`

method to split our data set into three folds, as shown below.

```
# python
from sklearn.model_selection import KFold
kf = KFold(n_splits=3)
dataset = [1,2,3,4,5,6,7,8,9]
for train_index, test_index in kf.split(dataset):
print(train_index, test_index)
```

Output:

In the example, the first iteration of the first 3 numbers is chosen for testing.

So in this way, we use `KFold`

to get the folds for testing and training indexes for our models. When we get the results from these iterations, we can sum them up to get the average which gives us a pretty good idea about the accuracy of our model in machine learning.

**Rana Hasnain Khan**

Rana is a computer science graduate passionate about helping people to build and diagnose scalable web application problems and problems developers face across the full-stack.

LinkedIn