This article will introduce what
KFold does in Python.
KFold in Python
When working on machine learning, sometimes we get into the dilemma of which machine learning model we should use to solve our problems. For instance, suppose we want to classify the iris flowers, we can use machine learning models such as
logistic regression, and
We use models for cross-validation, training, and testing our AI’s. Cross-validation is a technique that allows us to evaluate the model’s performance.
We are looking at machine learning models such as classifying emails as spam or not. Our typical procedure is first to train the model using the label datasets.
After the model is built, we must test the model by using the different datasets. When the model returns the results, we can compare the results with the actual values and measure the model’s accuracy.
There are several ways to train and test the model,
KFold is one of them. In this technique, we divide our samples into folds.
If we have 100 samples, we can make them into 5 folds, each containing 20 samples. Then we run multiple iterations in such a way that in our first iteration, we assign our first fold for testing the model and the rest four for training the model.
We will use the second fold for the test and the remaining folds for the training for the next iteration. And once we have gone through all folds as a testing fold, we can now get the average results from our model, giving us the model’s accuracy.
KFold technique is handy when we provide a variety of samples to our model. We get the average results that will become the accuracy of our model.
Now, let’s go over an example in which we will create a list of items and use
KFold to make the folds for testing and training our models.
First of all, we will install the
sklearn library using the following command.
# python pip install sklearn
Once we have installed the library, now we will import KFold from
sklearn.model_selection and use the
KFold method to split our data set into three folds, as shown below.
# python from sklearn.model_selection import KFold kf = KFold(n_splits=3) dataset = [1,2,3,4,5,6,7,8,9] for train_index, test_index in kf.split(dataset): print(train_index, test_index)
In the example, the first iteration of the first 3 numbers is chosen for testing.
So in this way, we use
KFold to get the folds for testing and training indexes for our models. When we get the results from these iterations, we can sum them up to get the average which gives us a pretty good idea about the accuracy of our model in machine learning.