Regression is a statistical technique that models the relationship between a dependent variable and one or more independent variables. In Python, we can use regression as a machine learning model for predictive analysis.
Linear and logistic regression are the most common regression techniques. Regression methods have evolved, and improved variants have since been introduced.
Traditional regression techniques raise some concerns about accuracy.
They tend to overfit as the number of independent variables grows, because the model (for example, a polynomial of increasing degree) becomes flexible enough to fit noise in the training data.
Noisy and inconsistent data is another major cause of overfitting and leads to inaccurate results.
Regularization was introduced to counter this problem. Lasso Regression is one such technique that uses regularization and variable selection in predictive analysis.
The Lasso Regression in Python
Lasso regression helps when the dataset contains many irrelevant features. We want to shrink the coefficients of these features as far as possible to nullify their effect on the prediction.
Lasso regression adds a term to the cost function called the L1 penalty, which is proportional to the sum of the absolute values of the weights. During optimization, this penalty drives the weights of irrelevant features toward zero, and many of them become exactly zero.
Features whose weights are zero drop out of the model, which is why Lasso also performs variable selection.
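For reference, a common textbook form of the Lasso cost function can be written as below. The symbols used here (n samples, p features, targets y_i, inputs x_ij, intercept β_0, weights β_j, and penalty strength λ) are notation assumed for illustration and may differ from other presentations.

```latex
\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term is the usual squared error, and the second term is the L1 penalty. Note that scikit-learn's Lasso class scales the squared-error term by 1/(2n) and calls the penalty strength alpha, so numeric values of the penalty parameter are not directly comparable between the two conventions.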
The mathematical equation of Lasso regression is shown in the above figure. Here, λ is the shrinkage parameter that determines the amount of shrinkage.
A higher value means more bias and less variance. For λ = 0, no penalty is applied and the model reduces to ordinary least squares regression, while as λ approaches infinity, all coefficients shrink to zero and no feature is considered.
The main idea of adding the L1 penalty is that the feature weights decrease as the value of λ increases.
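The effect of the penalty strength on the weights can be checked with a small experiment. The sketch below is illustrative only: the synthetic dataset from make_regression, the helper name count_zero_coefs, and the two penalty values are assumptions chosen for the demonstration, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (assumed for illustration): 20 features, only 5 carry signal.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)


def count_zero_coefs(alpha):
    """Fit a Lasso model with the given alpha and count weights that are exactly zero."""
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(X, y)
    return int(np.sum(model.coef_ == 0))


zeros_weak = count_zero_coefs(0.01)   # weak penalty
zeros_strong = count_zero_coefs(5.0)  # strong penalty
print("zero weights with weak penalty:  ", zeros_weak)
print("zero weights with strong penalty:", zeros_strong)
```

With the stronger penalty, more weights should be driven to exactly zero, which is the variable-selection behavior described above.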
Implement Lasso Regression in Python
We use the sklearn.linear_model.Lasso class to implement Lasso regression in Python. We can create a model using this class and use it with the required train and test data to make the predictions.
It takes the parameter alpha, the constant value that multiplies the L1 penalty. Other parameters, such as copy_X, are also accepted by this class.
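One way to see which parameters a Lasso object accepts is to call its get_params() method, which every scikit-learn estimator provides. This is a minimal sketch; the alpha value of 0.5 is arbitrary and chosen only for illustration.

```python
from sklearn.linear_model import Lasso

# Create a model with an explicit alpha; all other parameters keep their defaults.
lasso = Lasso(alpha=0.5)

# get_params() returns a dict of every constructor parameter and its current value.
params = lasso.get_params()
for name in sorted(params):
    print(name, "=", params[name])
```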
Let us implement an example of Lasso regression in Python.
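from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version.
data = datasets.load_boston()
x_data = data.data
y_data = data.target

# Hold out 30% of the samples for testing
train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.3, random_state=42)

# Fit the model and print the R^2 score on the test set as a percentage
lasso = Lasso(alpha=1.0)
lasso.fit(train_x, train_y)
print(lasso.score(test_x, test_y) * 100)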
In the example above, we load a sample dataset from the sklearn module, and it is split into x_data and y_data. We use the train_test_split function to divide the dataset into train and test datasets.
We use the training dataset to train the Lasso regression model using the fit() function. Then, we check the score of the model on the test data using the score() function, which returns the coefficient of determination (R²).
The output shows that this model scored 65.59% on the test data, that is, an R² of about 0.6559.
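To make clear what the score measures, the sketch below compares the model's score() with r2_score from sklearn.metrics computed on the model's predictions. It uses the load_diabetes dataset and alpha=0.1 purely as assumptions for illustration, since that dataset ships with current scikit-learn releases; the numbers it prints are unrelated to the result reported above.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Illustrative dataset (an assumption, not the one used in the article).
X, y = load_diabetes(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=42
)

lasso = Lasso(alpha=0.1)
lasso.fit(train_x, train_y)

# score() on a regressor is the coefficient of determination (R^2) ...
score_from_model = lasso.score(test_x, test_y)
# ... which matches r2_score applied to the model's own predictions.
score_from_metric = r2_score(test_y, lasso.predict(test_x))

print(score_from_model, score_from_metric)
```

Because the score is R² and not a classification accuracy, it can be negative for a model that fits worse than predicting the mean.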
We can perform cross-validation to find the optimal value for the alpha parameter of the Lasso model.
We can import the sklearn.linear_model.LassoCV class for this. Its cv parameter specifies the number of folds used in cross-validation.
See the code below.
from sklearn import datasets
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version.
data = datasets.load_boston()
x_data = data.data
y_data = data.target

# Hold out 30% of the samples for testing
train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.3, random_state=42)

# Use 5-fold cross-validation on the training data to select alpha
lassocv = LassoCV(cv=5, random_state=0, max_iter=10000)
lassocv.fit(train_x, train_y)
al = lassocv.alpha_
print(al)

# Refit a plain Lasso model with the selected alpha and score it on the test set
lasso = Lasso(alpha=al)
lasso.fit(train_x, train_y)
print(lasso.score(test_x, test_y) * 100)
In the above example, we use cross-validation to find the optimal alpha value, and then use this value to train and test the previous model. The model's score (R²) with the new alpha value is 67.48%.
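After fitting, a LassoCV object is itself refit on the full training data with the selected alpha, so it can be scored directly without creating a separate Lasso model. The sketch below illustrates this under assumptions: it uses the load_diabetes dataset (not the dataset from the example above) and its printed values are unrelated to the scores reported in this article.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

# Illustrative dataset (an assumption, not the one used in the article).
X, y = load_diabetes(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Select alpha with 5-fold cross-validation on the training data.
lassocv = LassoCV(cv=5, random_state=0, max_iter=10000)
lassocv.fit(train_x, train_y)

# Option 1: score the fitted LassoCV object directly.
cv_score = lassocv.score(test_x, test_y)

# Option 2: refit a plain Lasso with the selected alpha, as in the example above.
lasso = Lasso(alpha=lassocv.alpha_, max_iter=10000)
lasso.fit(train_x, train_y)
refit_score = lasso.score(test_x, test_y)

print("selected alpha:", lassocv.alpha_)
print("scores:", cv_score, refit_score)
print("weights set to zero:", int(np.sum(lassocv.coef_ == 0)))
```

Both options should give essentially the same test score, so the explicit refit is mainly useful when a standalone Lasso object is needed.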