# Lasso Regression in R

Lasso regression is a regression technique that uses shrinkage to produce sparse, simple models. This tutorial demonstrates how to perform lasso regression in R.

## Lasso Regression in R

`LASSO` stands for Least Absolute Shrinkage and Selection Operator. Lasso regression is a good choice when we want to automate parts of model selection, and it performs well when the predictors show a high level of multicollinearity.

Lasso regression is a quadratic programming problem, and languages like R and MATLAB provide solvers for it. Let's walk through the process of solving lasso regression in R step by step.

### Understanding the Equation

The Lasso regression minimizes the following function.

```
RSS + λΣ|βj|
```

where `j` ranges from 1 to the number of predictor variables and `λ ≥ 0`. The second term, `λΣ|βj|`, is known as the shrinkage penalty.

Here, `RSS = Σ(yi – ŷi)²`, in which `Σ` denotes the sum over all observations, `yi` is the actual response value for the `i`th observation, and `ŷi` is the predicted response value.
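To make the objective concrete, here is a tiny numeric sketch in base R; the responses, predictions, coefficients, and lambda are made-up values chosen purely for illustration:

```r
# Toy illustration of the lasso objective: RSS + lambda * sum(|beta_j|)
y_actual <- c(3, 5, 7)       # actual responses (made-up values)
y_hat    <- c(2.5, 5.5, 6)   # predicted responses (made-up values)
beta     <- c(1.2, -0.4, 0)  # hypothetical coefficients
lambda   <- 0.5

rss       <- sum((y_actual - y_hat)^2)  # residual sum of squares
penalty   <- lambda * sum(abs(beta))    # shrinkage penalty
objective <- rss + penalty
objective
# [1] 2.3
```

Increasing `lambda` raises the cost of large coefficients, which pushes some of them toward exactly zero.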

In lasso regression, lambda is selected as the value that produces the lowest possible test mean squared error (MSE).

Let’s load the data in the next step.

### Load the Data

Let’s use the `mtcars` dataset for our example. The `hp` column will be used as the response variable and `mpg`, `drat`, `wt`, and `qsec` as the predictors.

We can use the `glmnet` package to perform the lasso regression. Let’s load the data.

```
# glmnet requires the response variable as a numeric vector
y <- mtcars$hp
# glmnet requires the predictor variables as a matrix
x <- data.matrix(mtcars[, c('mpg', 'wt', 'drat', 'qsec')])
```
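As a quick sanity check, `glmnet()` expects the predictors as a numeric matrix and the response as a vector; a self-contained sketch verifying the shapes (`mtcars` has 32 rows):

```r
# glmnet expects x = matrix of predictors, y = numeric response vector
x <- data.matrix(mtcars[, c('mpg', 'wt', 'drat', 'qsec')])
y <- mtcars$hp

dim(x)     # 32 cars by 4 predictors
length(y)  # 32 response values
```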

Once the data is loaded, the next step is to fit the lasso regression model.

### The Lasso Regression Model Fitting

In this step, we use the `cv.glmnet()` function to fit the lasso regression model; `alpha` is set to 1 for lasso. K-fold cross-validation is performed to determine the value of lambda, and `cv.glmnet()` performs cross-validation with `k = 10` folds by default.

We might need to install the `glmnet` package if it is not already installed. See the example:

```
# install.packages("glmnet")  # uncomment if glmnet is not installed
library(glmnet)

# k-fold cross-validation to find the optimal lambda value
cv_model <- cv.glmnet(x, y, alpha = 1)
# lambda value that minimizes the test MSE
best_lambda <- cv_model$lambda.min
best_lambda
# plot the cross-validation results
plot(cv_model)
```
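Cross-validation assigns observations to folds at random, so `lambda.min` can vary slightly between runs. A minimal sketch of making the search reproducible by fixing the seed (assuming `glmnet` is installed; the seed value is arbitrary):

```r
library(glmnet)

x <- data.matrix(mtcars[, c('mpg', 'wt', 'drat', 'qsec')])
y <- mtcars$hp

set.seed(1)  # fix the random fold assignment so lambda.min is reproducible
cv_model <- cv.glmnet(x, y, alpha = 1, nfolds = 10)
cv_model$lambda.min
```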

The code above fits the lasso regression model and prints the best lambda value along with the cross-validation plot.

The lambda value that minimizes the test MSE is shown below.

```
[1] 2.01841
```

The `plot(cv_model)` call displays the cross-validation curve: the test MSE against log(lambda).

### Analyze the Lasso Regression Model

Analyzing the model means examining its coefficients. The coefficients for our model are retrieved with `coef()`:

```
# Coefficients of the model
coef(cv_model)
```

Output for the code will be:

```
5 x 1 sparse Matrix of class "dgCMatrix"
                    s1
(Intercept) 418.277928
mpg          -4.379633
wt            .
drat          .
qsec        -10.286483

As we can see, no coefficients are shown for the predictors `wt` and `drat` because lasso regression shrunk them all the way to 0, which is why they were dropped from the model.

Now that we know the best lambda value, we can also fit the final model by passing it as the `lambda` argument to the `glmnet()` function.
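As a sketch of that final refit (assuming `glmnet` is installed), we can call `glmnet()` at the chosen lambda and compute the training R-squared with base R:

```r
library(glmnet)

x <- data.matrix(mtcars[, c('mpg', 'wt', 'drat', 'qsec')])
y <- mtcars$hp

cv_model   <- cv.glmnet(x, y, alpha = 1)
best_model <- glmnet(x, y, alpha = 1, lambda = cv_model$lambda.min)
coef(best_model)  # same sparse pattern: some coefficients shrunk to zero

# Training R-squared as a rough measure of fit
y_pred <- predict(best_model, newx = x)
rsq <- 1 - sum((y - y_pred)^2) / sum((y - mean(y))^2)
rsq
```

The exact `rsq` value depends on the lambda chosen by cross-validation, but it should fall between 0 and 1 for a reasonable fit.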