Plot an ROC Curve in Python
- ROC Curve Definition in Python
- Scikit-Learn Library in Python
- Python Code to Plot the ROC Curve
- Code Explanation
In this guide, you'll learn what a ROC curve is and how to plot one in Python using scikit-learn and Matplotlib.
ROC Curve Definition in Python
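The term ROC curve stands for Receiver Operating Characteristic curve. It is a graphical representation of how well a binary classification model performs across all classification thresholds. Each point on the curve shows the true positive rate and false positive rate the model achieves at one threshold.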
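The curve plots two quantities against each other:
- True Positive Rate (TPR) - Also called sensitivity or recall: the fraction of actual positives that the model correctly labels positive, TP / (TP + FN). It is plotted on the y-axis.
- False Positive Rate (FPR) - The fraction of actual negatives that the model incorrectly labels positive, FP / (FP + TN), which equals 1 - specificity. It is plotted on the x-axis.
These two rates are the operating characteristics that define the ROC curve. Lowering the threshold labels more samples as positive, so both rates rise together; the curve traces this trade-off from (0, 0) to (1, 1).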
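As a minimal sketch of how the two rates are computed, the snippet below counts true and false positives and negatives at a single threshold, then sweeps the threshold over every distinct score. The helper `tpr_fpr_at_threshold` and the four toy labels and scores are hypothetical, chosen only for illustration, and labels are assumed to be 0 (negative) and 1 (positive).

```python
import numpy as np


def tpr_fpr_at_threshold(y_true, scores, threshold):
    """Hypothetical helper: return (TPR, FPR) when scores >= threshold count as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & (y_true == 1))   # positives correctly flagged
    fn = np.sum(~y_pred & (y_true == 1))  # positives missed
    fp = np.sum(y_pred & (y_true == 0))   # negatives wrongly flagged
    tn = np.sum(~y_pred & (y_true == 0))  # negatives correctly rejected
    return tp / (tp + fn), fp / (fp + tn)


# Toy data, assumed for illustration only
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Sweep every distinct score as a threshold, highest first
for t in sorted(set(scores), reverse=True):
    tpr, fpr = tpr_fpr_at_threshold(y_true, scores, t)
    print(f"threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Each printed (FPR, TPR) pair is one point on the ROC curve; as the threshold drops, both rates can only stay the same or rise.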
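A model's overall performance is summarized by the area under the curve (AUC). A perfect classifier has an AUC of 1, while a model that guesses at random has an AUC of about 0.5, matching the diagonal line from (0, 0) to (1, 1). An AUC below 0.5 means the model ranks negatives above positives more often than not, so it performs worse than random guessing.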
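A short sketch of these reference values, using scikit-learn's `roc_auc_score` and `auc` functions. The toy labels and scores are made up for illustration: a perfect ranking scores 1.0, a partly wrong ranking scores in between, an exactly inverted ranking scores 0.0, and random scores land near 0.5.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])

perfect = np.array([0.1, 0.2, 0.8, 0.9])  # every positive outranks every negative
mixed = np.array([0.1, 0.4, 0.35, 0.8])   # one negative outranks one positive

print(roc_auc_score(y_true, perfect))      # 1.0
print(roc_auc_score(y_true, mixed))        # 0.75
print(roc_auc_score(y_true, 1 - perfect))  # 0.0, the ranking is exactly inverted

# The same 0.75 via the trapezoidal rule over the ROC points
fpr, tpr, _ = roc_curve(y_true, mixed)
print(auc(fpr, tpr))

# Random scores carry no information, so the AUC lands near 0.5
rng = np.random.default_rng(0)
y_rand = rng.integers(0, 2, size=10_000)
auc_random = roc_auc_score(y_rand, rng.random(10_000))
print(auc_random)
```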
Scikit-Learn Library in Python
Scikit-learn is one of the most widely used open-source libraries for machine learning in Python. It provides tools for tasks such as classification, clustering, and regression.
This tutorial uses the following functions and classes from scikit-learn to build the model and compute its ROC curve:
- make_classification - Generates a random n-class classification problem by creating clusters of normally distributed points; here it supplies the sample data.
- RandomForestClassifier - A random forest classifier (an ensemble of decision trees), used in this tutorial as the sample model whose ROC curve is plotted.
- train_test_split - Splits the data into two subsets, a training set and a test set, used for training and testing the model.
- roc_curve - Computes the false positive rates, true positive rates, and corresponding thresholds from the true labels and the model's scores; these values are then plotted to draw the ROC curve.
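The snippet below is a small illustration of what `roc_curve` returns, using four hypothetical labels and scores. It only computes the curve's coordinates; drawing them is left to Matplotlib.

```python
from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]            # true class labels (1 = positive), toy values
y_score = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1, toy values

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr)         # FPR values: 0, 0, 0.5, 0.5, 1
print(tpr)         # TPR values: 0, 0.5, 0.5, 1, 1
print(thresholds)  # a sentinel above every score, then 0.8, 0.4, 0.35, 0.1
```

The exact value of the first threshold depends on the scikit-learn version (recent releases use `inf`), so it is best treated as a sentinel rather than a real score.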
Python Code to Plot the ROC Curve
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve


def plot_roc_curve(fper, tper):
    """Plot a ROC curve from false and true positive rates."""
    plt.plot(fper, tper, color='red', label='ROC')
    # Diagonal reference line: the curve of a model that guesses at random
    plt.plot([0, 1], [0, 1], color='green', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic Curve')
    plt.legend()
    plt.show()


# Generate a balanced two-class dataset (the default weights=None gives equal classes;
# weights=[1, 1] would sum to more than 1 and return extra samples)
data_X, cls_lab = make_classification(n_samples=2100, n_classes=2, random_state=2)

# Hold out half of the samples for testing
train_X, test_X, train_y, test_y = train_test_split(
    data_X, cls_lab, test_size=0.5, random_state=2)

# Fit the sample model on the training features and labels
model = RandomForestClassifier()
model.fit(train_X, train_y)

# Predicted probability of the positive class (column 1) for each test sample
prob = model.predict_proba(test_X)
prob = prob[:, 1]

# False positive rates, true positive rates, and the thresholds that produce them
fper, tper, thresholds = roc_curve(test_y, prob)
plot_roc_curve(fper, tper)
Code Explanation
First, the libraries and functions required to plot a ROC curve are imported. Then a function called plot_roc_curve is defined; it uses the Matplotlib library to draw the curve, set its color, axis labels, legend, and title, and add a dashed diagonal that represents a random classifier. After that, the make_classification function generates random samples, which are divided into training and test sets with the train_test_split function. Here test_size is 0.5, so half of the data is used for training and half for testing. Then a RandomForestClassifier is fitted on the training data (train_X and train_y), and predict_proba returns the predicted probability of the positive class for each test sample. Finally, the roc_curve function computes the false and true positive rates from test_y and those probabilities, and plot_roc_curve draws the ROC curve.