# Perform Logistic Regression in Python

The following tutorial demonstrates how to perform logistic regression on Python.

Let us download a sample dataset to get started with. We will use a user dataset containing information about the user’s gender, age, and salary and predict if a user will eventually buy the product.

Take a look at our dataset.

We will now start creating our model by importing pertinent libraries such as `pandas`

, `numpy`

and `matplotlib`

.

## Perform Logistic Regression in Python

Importing pertinent libraries:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

Let us import our dataset using `pandas`

.

Reading dataset:

```
dataset = pd.read_csv('log_data.csv')
```

We will now select the `Age`

and `Estimated salary`

features from our dataset to train our model to predict if a user purchases a product or not. Here, `gender`

and `user id`

won’t play a significant role in predicting; we ignore them in the training process.

```
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
```

Let’s split the dataset into training and testing data. We divide them into 75% for training the model and the rest 25% for testing the model’s performance.

We do this using `train_test_split`

function in `sklearn`

library.

```
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(
x, y, test_size = 0.25, random_state = 0)
```

We perform the feature scaling process since the `Age`

and `Salary`

features lie in a different range. This is essential since one feature can dominate the other while the training process is avoided.

```
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(xtrain)
xtest = sc_x.transform(xtest)
```

Both the features lie in the range from -1 to 1, which will ensure both features contribute equally to decision-making (i.e., predicting process). Let us take a look at updated features.

```
print (xtrain[0:10, :])
```

```
[[ 0.58164944 -0.88670699]
[-0.60673761 1.46173768]
[-0.01254409 -0.5677824 ]
[-0.60673761 1.89663484]
[ 1.37390747 -1.40858358]
[ 1.47293972 0.99784738]
[ 0.08648817 -0.79972756]
[-0.01254409 -0.24885782]
[-0.21060859 -0.5677824 ]
[-0.21060859 -0.19087153]]
```

Let us finally train our model; in our case, we will use the logistic regression model, which we will import from the `sklearn`

library.

```
from sklearn.linear_model import LogisticRegression
classifier1 = LogisticRegression(random_state = 0)
classifier1.fit(xtrain, ytrain)
```

Since we have now trained our model, let us do the prediction on our testing data to evaluate our model.

```
y_pred = classifier1.predict(xtest)
```

Let us now create a confusion matrix based on our testing data and the predictions we obtained in the last procedure.

```
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(ytest, y_pred)
print ("Confusion Matrix : \n", cm)
```

```
Confusion Matrix :
[[65 3]
[ 8 24]]
```

Let us calculate the accuracy of our model using the `sklearn`

library.

```
from sklearn.metrics import accuracy_score
print ("Accuracy score : ", accuracy_score(ytest, y_pred))
```

```
Accuracy score : 0.89
```

We got a satisfying accuracy score of `0.89`

from our model, which signifies that our model can very well predict if a user will buy a product or not.

Thus, we can successfully perform logistic regression using Python with the above method.

**Preet Sanghavi**