
How to Generate LASSO Regression Models in Python

Regression is a modelling activity that consists of forecasting a numeric value provided an input. 

Linear regression is the typical algorithm for regression, and it makes the assumption of a linear relationship between the inputs and the target variable. An extension to linear regression involves including penalties to the loss function in the course of training that encourage simpler models with smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression. 

Lasso regression is a widely leveraged variant of regularized linear regression that includes an L1 penalty. This has the impact of shrinking the coefficients of those input variables that do not contribute a lot to the prediction activity. This penalty enables some coefficient values to go to the value of zero, enabling input variables to be effectively removed from the model, furnishing a variant of automatic feature selection. 

In this guide, you will find out how to develop and evaluate Lasso Regression models in Python. 

After going through this guide, you will be aware of: 

  • Lasso Regression being an extension of linear regression that includes a regularization penalty to the loss function in the course of training. 
  • How to assess a Lasso Regression model and leverage a final model to make forecasts for fresh data.  
  • How to configure the Lasso Regression model for a fresh dataset through grid search and automatically. 

Tutorial Summarization 

This guide is subdivided into three portions, which are: 

  1. Lasso Regression 
  2. Instance of Lasso Regression 
  3. Tuning Lasso Hyperparameters 

Lasso Regression 

Linear regression refers to a model that operates on the assumption that there is a linear relationship between the input variables and the target variable. 

With a singular input variable, this relationship is viewed as a line, and with higher dimensions, this relationship can be perceived of as a hyperplane that links the input variables to the target variable. The coefficients of the model are identified through an optimization procedure that looks to reduce the sum of squared errors between the predictions (yhat) and the expected target values (y): 

loss = sum i=0 to n (y_i - yhat_i)^2 
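
As a small concrete illustration of this loss, the sum of squared errors could be computed directly with NumPy (a minimal sketch; the target and prediction values below are made up purely for the example): 

# minimal sketch: sum of squared errors between targets and predictions (illustrative values only)
from numpy import array

y = array([3.0, -0.5, 2.0, 7.0])     # expected target values (made up)
yhat = array([2.5, 0.0, 2.0, 8.0])   # model predictions (made up)
loss = ((y - yhat) ** 2).sum()       # sum of squared errors
print(loss)                          # prints 1.5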

An issue with linear regression is that the estimated coefficients of the model can become large, rendering the model sensitive to its inputs and potentially unstable. This is especially the case for problems with few observations (samples) or with more input predictors (p) than samples (n) (so-called p >> n problems). 

One strategy to tackle the stability of regression models is to modify the loss function to integrate extra costs for a model that has big coefficients. Linear regression models that leverage these modified loss functions in the course of training are referred to collectively as penalized linear regression. 

A widespread penalty is to penalize a model on the basis of the sum of the absolute coefficient values. This is referred to as the L1 penalty. An L1 penalty reduces the size of all coefficients and facilitates some coefficients to be reduced to the value zero, which eradicates the predictor from the model. 

  • l1_penalty = sum j=0 to p abs(beta_j) 

An L1 penalty reduces the size of all coefficients and enables any coefficient to go to the value of zero, essentially eradicating input features from the model. 

This functions as a variant of automatic feature selection. 

A consequence of penalizing the absolute values is that a few parameters are actually set at 0 for some value of lambda. Therefore, the lasso yields models that simultaneously leverage regularization to improve the model and to carry out feature selection. 

This penalty can be added to the cost function for linear regression and is referred to as Least Absolute Shrinkage and Selection Operator regularization (LASSO), or more typically, “Lasso” (with title case) for short. 

A widespread alternative to ridge regression is the least absolute shrinkage and selection operator model, frequently referred to as the lasso. 

A hyperparameter referred to as “lambda” controls the weighting of the penalty in the loss function. A default value of 1.0 provides full weighting to the penalty; a value of 0 excludes the penalty. Very small values of lambda, like 1e-3 or smaller, are typical. 

  • lasso_loss = loss + (lambda * l1_penalty) 
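
To make the arithmetic concrete, the penalty and the penalized loss could be computed as follows (a minimal NumPy sketch; the coefficient values, the base loss, and the lambda value are made up for the example): 

# minimal sketch: L1 penalty and penalized (lasso) loss, with illustrative values only
from numpy import array

beta = array([0.5, -1.2, 0.0, 3.4])       # model coefficients (made up)
loss = 10.0                               # sum of squared errors (made up)
lmbda = 0.01                              # penalty weighting hyperparameter
l1_penalty = abs(beta).sum()              # sum of absolute coefficient values = 5.1
lasso_loss = loss + (lmbda * l1_penalty)  # penalized loss = 10.051
print(lasso_loss)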

Now that we are acquainted with Lasso penalized regression, let’s look at a worked instance.  

Instance of Lasso Regression 

In this portion of the blog, we will illustrate how to leverage the Lasso Regression algorithm. 

To start with, let’s introduce a conventional regression dataset. We will leverage the housing dataset. 

The housing dataset is a conventional machine learning dataset consisting of 506 rows of data with 13 numerical input values and a numerical target variable. 

Leveraging a test harness of repeated 10-fold cross-validation with three repeats, a naive model can accomplish a mean absolute error (MAE) of approximately 6.6. A top-performing model can accomplish a MAE on this same test harness of approximately 1.9. This furnishes the bounds of expected performance on this dataset. 
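
That naive baseline can be reproduced with scikit-learn’s DummyRegressor, which simply predicts the mean of the training targets for every row (a minimal sketch, reusing the same housing.csv URL and test harness leveraged in the listings below): 

# evaluate a naive (mean-predicting) baseline on the housing dataset
from numpy import mean, std, absolute
from pandas import read_csv
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score, RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define the naive model and the test harness
model = DummyRegressor(strategy='mean')
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate and report the baseline MAE
scores = absolute(cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1))
print('Baseline MAE: %.3f (%.3f)' % (mean(scores), std(scores)))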

The dataset consists of forecasting the house price provided details of the house suburb in the North American city of Boston. 

The instance here downloads and loads the dataset as a Pandas DataFrame and summarizes the shape of the dataset and the first five rows of data. 


# load and summarize the housing dataset 
from pandas import read_csv 
from matplotlib import pyplot 
# load dataset 
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' 
dataframe = read_csv(url, header=None) 
# summarize shape 
print(dataframe.shape) 
# summarize first few lines 
print(dataframe.head()) 

 

Executing the instance confirms the 506 rows of data and 13 input variables and a single numeric target variable (14 in total). We can additionally observe that all input variables are numeric. 

 


(506, 14) 
        0     1     2   3      4      5   …  8      9     10      11    12    13 
0  0.00632  18.0  2.31   0  0.538  6.575  …   1  296.0  15.3  396.90  4.98  24.0 
1  0.02731   0.0  7.07   0  0.469  6.421  …   2  242.0  17.8  396.90  9.14  21.6 
2  0.02729   0.0  7.07   0  0.469  7.185  …   2  242.0  17.8  392.83  4.03  34.7 
3  0.03237   0.0  2.18   0  0.458  6.998  …   3  222.0  18.7  394.63  2.94  33.4 
4  0.06905   0.0  2.18   0  0.458  7.147  …   3  222.0  18.7  396.90  5.33  36.2 

[5 rows x 14 columns] 

 

The scikit-learn Python ML library furnishes an implementation of the Lasso penalized regression algorithm through the Lasso class.  

Confusingly, the lambda term can be configured through the “alpha” argument when defining the class. The default value is 1.0, or a full penalty. 

 


from sklearn.linear_model import Lasso 
# define model 
model = Lasso(alpha=1.0) 

 

We can assess the Lasso regression model on the housing dataset leveraging repeated 10-fold cross-validation and report the average mean absolute error (MAE) on the dataset. 

 


# evaluate a lasso regression model on the dataset 
from numpy import mean 
from numpy import std 
from numpy import absolute 
from pandas import read_csv 
from sklearn.model_selection import cross_val_score 
from sklearn.model_selection import RepeatedKFold 
from sklearn.linear_model import Lasso 
# load the dataset 
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' 
dataframe = read_csv(url, header=None) 
data = dataframe.values 
X, y = data[:, :-1], data[:, -1] 
# define model 
model = Lasso(alpha=1.0) 
# define model evaluation method 
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1) 
# evaluate model 
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1) 
# force scores to be positive 
scores = absolute(scores) 
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores))) 

 

Running the instance assesses the Lasso Regression Algorithm on the housing dataset and reports the average MAE across the three repeats of 10-fold cross-validation. 

Your particular outcomes might vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the instance a few times. 

In this scenario, we can observe that the model accomplished a MAE of approximately 3.711. 

Mean MAE: 3.711 (0.549) 

We might make the decision to leverage the Lasso Regression as our final model and make forecasts on fresh data. 

This can be accomplished by fitting the model on all available data and calling the predict() function, passing in a fresh row of data. 

We can illustrate this with a complete instance, detailed below: 

 


# make a prediction with a lasso regression model on the dataset 
from pandas import read_csv 
from sklearn.linear_model import Lasso 
# load the dataset 
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' 
dataframe = read_csv(url, header=None) 
data = dataframe.values 
X, y = data[:, :-1], data[:, -1] 
# define model 
model = Lasso(alpha=1.0) 
# fit model 
model.fit(X, y) 
# define new data 
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98] 
# make a prediction 
yhat = model.predict([row]) 
# summarize prediction 
print('Predicted: %.3f' % yhat) 

 

Running the instance fits the model and makes a forecast for the fresh row of data. Your particular results might vary given the stochastic nature of the learning algorithm. Consider running the instance a few times. 

Predicted: 30.998 
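
The automatic feature selection described earlier can also be inspected directly on this fitted model through its coef_ attribute; with the default penalty of alpha=1.0, some coefficients may be driven to zero (a minimal sketch that continues from the listing above): 

# inspect the learned coefficients of the fitted lasso model (continues from the listing above)
for index, coefficient in enumerate(model.coef_):
    # a coefficient of exactly 0.0 means the corresponding input feature was removed from the model
    print('feature %d: %.3f' % (index, coefficient))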

Then, we can look at configuring the model hyperparameters. 

Tuning Lasso Hyperparameters 

How do we know that the default hyperparameter of alpha=1.0 is appropriate for our dataset? 

We do not. 

Rather, it is best practice to evaluate a suite of differing configurations and find out what works ideally for our dataset. 

One strategy would be to grid search alpha values from perhaps 1e-5 to 100 on a log-10 scale and find out what works well for the dataset. Another strategy would be to evaluate values between 0.0 and 1.0 with a grid separation of 0.01. We will attempt the latter in this scenario, although a sketch of the log-scale grid is shown below for reference. 
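
For reference, the log-10 grid from the first strategy could be built with NumPy’s logspace function and substituted into the grid dictionary defined in the listing below (a minimal sketch): 

# define a log-scale grid of alpha values from 1e-5 to 100
from numpy import logspace
grid = dict()
grid['alpha'] = logspace(-5, 2, num=8)  # [1e-05, 1e-04, ..., 10, 100]
print(grid['alpha'])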

The instance below illustrates this leveraging the GridSearchCV class with a grid of values we have defined. 


# grid search hyperparameters for lasso regression 
from numpy import arange 
from pandas import read_csv 
from sklearn.model_selection import GridSearchCV 
from sklearn.model_selection import RepeatedKFold 
from sklearn.linear_model import Lasso 
# load the dataset 
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' 
dataframe = read_csv(url, header=None) 
data = dataframe.values 
X, y = data[:, :-1], data[:, -1] 
# define model 
model = Lasso() 
# define model evaluation method 
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1) 
# define grid 
grid = dict() 
grid['alpha'] = arange(0, 1, 0.01) 
# define search 
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1) 
# perform the search 
results = search.fit(X, y) 
# summarize 
print('MAE: %.3f' % results.best_score_) 
print('Config: %s' % results.best_params_) 

 

Running the instance will assess every combination of configurations leveraging repeated cross-validation. 

Your particular outcomes might vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the instance a few times. 

You might observe some warnings that are safe to ignore, such as: 

Objective did not converge. You might want to increase the number of iterations. 
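
If the warning persists, one option is to give the optimizer more iterations when defining the model; max_iter is a standard argument of scikit-learn’s Lasso class (a minimal sketch; the value 10000 is an arbitrary choice well above the default of 1000): 

# allow more coordinate descent iterations to help the objective converge
model = Lasso(max_iter=10000)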

In this scenario, we can observe that we accomplished slightly better outcomes than the default, with a MAE of approximately 3.379 versus 3.711. Ignore the sign; the library makes the MAE negative for optimization reasons. 

We can observe that the model allocated an alpha weight of 0.01 to the penalty. 

MAE: -3.379 
Config: {'alpha': 0.01} 

 

The scikit-learn library also furnishes a built-in version of the algorithm that automatically identifies good hyperparameters through the LassoCV class. 

To leverage the class, the model is fitted on the training dataset as per normal and the hyperparameters are tuned automatically in the course of the training procedure. The fit model can then be leveraged to make a forecast. 

By default, the model will evaluate 100 alpha values. We can modify this to a grid of values between 0 and 1 with a separation of 0.01, as we did in the prior example, by setting the “alphas” argument. 

The instance below illustrates this. 

 


# use the automatically configured lasso regression algorithm 
from numpy import arange 
from pandas import read_csv 
from sklearn.linear_model import LassoCV 
from sklearn.model_selection import RepeatedKFold 
# load the dataset 
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv' 
dataframe = read_csv(url, header=None) 
data = dataframe.values 
X, y = data[:, :-1], data[:, -1] 
# define model evaluation method 
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1) 
# define model 
model = LassoCV(alphas=arange(0, 1, 0.01), cv=cv, n_jobs=-1) 
# fit model 
model.fit(X, y) 
# summarize chosen configuration 
print('alpha: %f' % model.alpha_) 

 

Running the instance fits the model and finds out the hyperparameters that provide the best outcomes leveraging cross-validation. 

Your particular results might vary given the stochastic nature of the learning algorithm. Consider running the instance a few times. 

In this scenario, we can observe that the model selected a hyperparameter of alpha=0.0. This differs from what we discovered through the manual grid search, probably owing to the systematic fashion in which configurations were searched or chosen. 

alpha: 0.000000 

Further Reading 

This portion of the blog furnishes additional resources on the subject if you are seeking to delve deeper. 

Books 

The Elements of Statistical Learning, 2016 

Applied Predictive Modeling, 2013. 

APIs 

Linear Models, scikit-learn 

sklearn.linear_model.Lasso API. 

sklearn.linear_model.LassoCV API. 

Articles 

Lasso (statistics), Wikipedia 

Conclusion 

In this guide, you found out how to generate and assess Lasso Regression models in Python.  

Particularly, you learned: 

  • Lasso Regression being an extension of linear regression that includes a regularization penalty to the loss function in the course of training. 
  • How to assess a Lasso regression model and leverage a final model to make forecasts for fresh data. 
  • How to configure the Lasso Regression model for a fresh dataset through grid search and automatically. 