Lasso Regression in Python

  1. The Lasso Regression in Python
  2. Implement Lasso Regression in Python

Regression, a statistical technique, determines the relationship between dependent and independent variables. We can use regression as an ML model for predictive analysis in Python.

Linear and logistic regression are the most common regression techniques. Regression methods have since evolved, and improved variants have been introduced.

The traditional techniques can raise concerns about accuracy.

Traditional regression techniques tend to overfit as the number of independent variables grows or as the degree of the polynomial increases.

Another major cause of overfitting is noisy and inconsistent data, which leads to inaccurate results.

Regularization was introduced to counter this problem. Lasso Regression is one such technique that uses regularization and variable selection in predictive analysis.

The Lasso Regression in Python

Lasso regression helps when the dataset contains many irrelevant features. We want to shrink the coefficients of these features as far as possible to nullify their effect on the prediction.

Lasso regression adds a term to the cost function called the L1 penalty. During optimization, the L1 penalty drives the weights of the irrelevant features toward zero, and often to exactly zero.

Features whose weights shrink to zero drop out of the model, which is how Lasso performs variable selection.
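One way to see why the L1 penalty can produce weights of exactly zero is the soft-thresholding operator, which appears in coordinate-descent solvers for Lasso. The sketch below is only illustrative: the function name `soft_threshold`, the sample weights, and the penalty of 0.5 are assumptions made for this example, not part of scikit-learn.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0.0 if it falls within [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized weights: two strong features and two weak ones.
weights = [2.5, -1.8, 0.3, -0.1]

# Apply the same penalty to every weight.
shrunk = [soft_threshold(w, 0.5) for w in weights]
print(shrunk)
```

The two large weights are reduced in magnitude by the penalty, while the two small ones land on exactly zero rather than merely close to it.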

Mathematical Equation of Lasso Regression

The Lasso cost function is the ordinary least-squares loss plus λ times the sum of the absolute values of the coefficients. Here, λ is the shrinkage parameter, which determines the amount of shrinkage.
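Because the original figure is not reproduced here, the following is one common textbook form of the Lasso objective. The symbols (n samples, p features, coefficients β_j, intercept β_0) are notation assumed for this sketch.

```latex
\min_{\beta_0,\,\beta}\;
\sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Libraries may scale the first term differently. scikit-learn, for example, divides the squared-error term by 2n and calls the penalty weight alpha, so a given numeric value of λ is not directly comparable across conventions.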

A higher value of λ means more bias and less variance. For λ = 0, no penalty is applied and Lasso reduces to ordinary least-squares regression with every feature retained; as λ approaches infinity, all coefficients are driven to zero and no feature is considered.

The main idea behind the L1 penalty is that the feature weights decrease as the value of λ increases.
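The effect of λ can be sketched with a small coordinate-descent solver written in plain Python. This is a minimal illustration, not how any particular library implements Lasso: the helper names, the toy data, the omission of an intercept, and the scaling of the loss by 1/(2n) are all assumptions made for the example.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0.0 if it falls within [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * sum((y - Xw)^2) + lam * sum(|w|) without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            # Average product of feature j with the residual that ignores
            # feature j's own contribution.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            # Mean squared value of feature j.
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Made-up toy data: y follows the first feature; the second feature is small noise.
X = [[1.0, 0.2], [2.0, -0.1], [3.0, 0.3], [4.0, -0.2], [5.0, 0.1]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit the same data with increasingly strong penalties.
results = {}
for lam in (0.0, 0.1, 1.0, 100.0):
    results[lam] = lasso_coordinate_descent(X, y, lam)
    print(lam, [round(v, 3) for v in results[lam]])
```

With λ = 0 the solver returns the unpenalized least-squares weights. As λ grows, the total absolute weight shrinks, and for a large enough λ every weight is zero.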

Implement Lasso Regression in Python

We use the sklearn.linear_model.Lasso class to implement Lasso regression in Python. We can create a model using this class and use it with the required train and test data to make the predictions.

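It takes the parameter alpha, the constant that multiplies the L1 penalty (the λ described above). Other parameters such as fit_intercept, precompute, copy_X, and max_iter are also accepted. Older releases also accepted normalize, which was deprecated in scikit-learn 1.0 and removed in 1.2.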
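As a quick sketch of how these parameters are passed, the snippet below fits a Lasso model on synthetic data and inspects the learned weights. The dataset from make_regression and the specific parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 10 features, only 3 informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# alpha scales the L1 penalty; max_iter caps the number of solver iterations.
model = Lasso(alpha=1.0, fit_intercept=True, max_iter=10000)
model.fit(X, y)

print(model.coef_)       # one weight per feature
print(model.intercept_)  # fitted bias term
print((model.coef_ == 0).sum(), "coefficients are exactly zero")
```

Counting the zero entries of coef_ is a simple way to check how many features the penalty has removed for a given alpha.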

Let us implement an example of Lasso regression in Python.

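# Note: load_boston was removed in scikit-learn 1.2; this example needs an older version.
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Load the features and the target values
data = datasets.load_boston()
x_data = data.data
y_data = data.target

# Hold out 30% of the samples for testing
train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.3, random_state=42)

# Fit Lasso with alpha=1.0 and print the test-set R^2 as a percentage
lasso = Lasso(alpha=1.0)
lasso.fit(train_x, train_y)
print(lasso.score(test_x, test_y)*100)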

Output:

65.59060829154339

In the example above, we load a sample dataset from the sklearn module and separate it into the features x_data and the target y_data. We use the train_test_split function to divide the dataset into training and test sets.

We train the Lasso regression model on the training set using the fit() function. Then, we evaluate its predictions on the test set using the score() function.

For regressors, score() returns the coefficient of determination (R²) rather than a classification accuracy. The output shows an R² of about 0.656, meaning the model explains roughly 65.59% of the variance in the test targets.
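For a regressor such as Lasso, score() reports the coefficient of determination, R² = 1 - SS_res / SS_tot. A minimal hand-rolled version makes the meaning of the number concrete; the function name `r2_score_manual` and the sample values are assumptions for this sketch.

```python
def r2_score_manual(y_true, y_pred):
    """Return 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]

# Predictions that match the targets exactly.
perfect = r2_score_manual(y_true, y_true)

# Predictions that always output the mean of the targets (6.0).
baseline = r2_score_manual(y_true, [6.0] * 4)

print(perfect, baseline)
```

A perfect model scores 1, a model that always predicts the mean scores 0, and a model worse than that baseline scores below 0.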

We can perform cross-validation to find the optimal value for the alpha parameter of the Lasso model.

The sklearn.linear_model.LassoCV class does this for us. Its cv parameter specifies the number of folds used in cross-validation.

See the code below.

# Note: load_boston was removed in scikit-learn 1.2; this example needs an older version.
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split
from sklearn import datasets

data = datasets.load_boston()
x_data = data.data
y_data = data.target

train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.3, random_state=42)

# Choose alpha by 5-fold cross-validation on the training set
lassocv = LassoCV(cv=5, random_state=0, max_iter=10000)
lassocv.fit(train_x, train_y)
al = lassocv.alpha_
print(al)

# Refit a plain Lasso model with the selected alpha and score it on the test set
lasso = Lasso(alpha=al)
lasso.fit(train_x, train_y)
print(lasso.score(test_x, test_y)*100)

Output:

0.7051444556162024
67.48582731460068

In the above example, we find the optimal value of alpha through cross-validation and then use it to train and test the previous model. With the new alpha value, the model's R² on the test set is about 0.675 (67.49%).
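The load_boston loader was removed in scikit-learn 1.2, so the examples above only run on older releases. The sketch below repeats the same workflow on load_diabetes, a dataset that still ships with the library. The dataset swap is an assumption made here, so the printed numbers will not match the outputs quoted above.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Load features and targets directly as arrays.
X, y = load_diabetes(return_X_y=True)

# Same 70/30 split settings as in the examples above.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# LassoCV picks alpha by 5-fold cross-validation, then refits on all training data.
lassocv = LassoCV(cv=5, random_state=0, max_iter=10000)
lassocv.fit(train_x, train_y)

print(lassocv.alpha_)                  # selected penalty strength
print(lassocv.score(test_x, test_y))   # R^2 on the held-out test set
```

Because LassoCV refits on the full training set with the selected alpha, it can be scored directly without creating a separate Lasso object.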

Author: Manav Narula
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.
