How to Generate LASSO Regression Models in Python
Regression is a modelling task that involves predicting a numeric value given an input.
Linear regression is the standard algorithm for regression; it assumes a linear relationship between the inputs and the target variable. An extension to linear regression adds penalties to the loss function during training that encourage simpler models with smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression.
Lasso regression is a popular type of regularized linear regression that includes an L1 penalty. This has the effect of shrinking the coefficients of those input variables that do not contribute much to the prediction task. The penalty allows some coefficient values to go all the way to zero, effectively removing input variables from the model and providing a form of automatic feature selection.
In this guide, you will find out how to develop and evaluate Lasso Regression models in Python.
After completing this guide, you will know:
- Lasso Regression is an extension of linear regression that adds a regularization penalty to the loss function during training.
- How to evaluate a Lasso Regression model and use a final model to make predictions on new data.
- How to configure the Lasso Regression model for a new dataset via grid search and automatically.
Tutorial Overview
This guide is divided into three parts; they are:
- Lasso Regression
- Example of Lasso Regression
- Tuning Lasso Hyperparameters
Lasso Regression
Linear regression refers to a model that assumes a linear relationship between the input variables and the target variable.
With a single input variable, this relationship is a line, and with higher dimensions it can be thought of as a hyperplane that connects the input variables to the target variable. The coefficients of the model are found via an optimization procedure that seeks to minimize the sum of squared errors between the predictions (yhat) and the expected target values (y):
loss = sum i=0 to n (y_i - yhat_i)^2
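To make the loss concrete, here is a minimal NumPy sketch that computes the sum of squared errors for a handful of made-up target values and predictions (the numbers are purely illustrative):

# minimal sketch: sum of squared errors between targets and predictions
from numpy import array
y = array([3.0, -0.5, 2.0, 7.0])     # made-up target values
yhat = array([2.5, 0.0, 2.0, 8.0])   # made-up predictions
loss = ((y - yhat) ** 2).sum()       # sum_i (y_i - yhat_i)^2
print(loss)                          # 1.5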
A problem with linear regression is that the estimated coefficients of the model can become large, making the model sensitive to its inputs and potentially unstable. This is particularly true for problems with few observations (samples) or with more input predictors (p) than samples (n) (so-called p >> n problems).
One approach to addressing the stability of regression models is to change the loss function to include additional costs for a model that has large coefficients. Linear regression models that use these modified loss functions during training are referred to collectively as penalized linear regression.
A widely used penalty is to penalize a model based on the sum of the absolute coefficient values. This is called the L1 penalty. An L1 penalty shrinks the size of all coefficients and allows some coefficients to be reduced all the way to zero, which removes the corresponding predictor from the model:
- l1_penalty = sum j=0 to p abs(beta_j)
An L1 penalty shrinks the size of all coefficients and allows any coefficient to go to zero, effectively removing input features from the model.
This acts as a type of automatic feature selection.
A consequence of penalizing the absolute values is that some parameters are actually set to zero for some value of lambda. Thus, the lasso yields models that simultaneously use regularization to improve the model and to perform feature selection.
This penalty can be added to the cost function for linear regression and is referred to as Least Absolute Shrinkage and Selection Operator (LASSO) regularization, or more commonly, "Lasso" (with a title-case "L") for short.
A popular alternative to ridge regression is the least absolute shrinkage and selection operator model, frequently called the lasso.
A hyperparameter called "lambda" controls the weighting of the penalty in the loss function. A default value of 1.0 gives the penalty full weight, while a value of 0 excludes the penalty. Very small values of lambda, such as 1e-3 or smaller, are common:
- lasso_loss = loss + (lambda * l1_penalty)
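As a rough sketch of how the pieces fit together, the penalized loss can be computed by hand for a given coefficient vector; the targets, predictions, coefficients, and lambda below are made-up values for illustration only:

# minimal sketch: lasso loss = sum of squared errors + lambda * L1 penalty
from numpy import array, absolute
y = array([3.0, -0.5, 2.0, 7.0])      # made-up targets
yhat = array([2.5, 0.0, 2.0, 8.0])    # made-up predictions
beta = array([0.7, 0.0, -1.2])        # made-up coefficients
lam = 0.01                            # the lambda (alpha) hyperparameter
sse = ((y - yhat) ** 2).sum()         # squared-error loss
l1_penalty = absolute(beta).sum()     # sum of absolute coefficient values
print(sse + lam * l1_penalty)         # 1.5 + 0.01 * 1.9 = 1.519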
Now that we are familiar with Lasso penalized regression, let's look at a worked example.
Example of Lasso Regression
In this section, we will demonstrate how to use the Lasso Regression algorithm.
First, let's introduce a standard regression dataset. We will use the housing dataset.
The housing dataset is a standard machine learning dataset comprising 506 rows of data with 13 numerical input variables and a numerical target variable.
Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6. A top-performing model can achieve a MAE on this same test harness of about 1.9. This provides the bounds of expected performance on this dataset.
The dataset involves predicting the house price given details of the house's suburb in the American city of Boston.
The example below downloads and loads the dataset as a Pandas DataFrame, then summarizes the shape of the dataset and the first five rows of data.
# load and summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())
Running the example confirms the 506 rows of data, with 13 input variables plus a single numeric target variable (14 columns in total). We can also see that all input variables are numeric.
(506, 14)
        0     1     2   3      4      5  ...  8      9     10      11    12    13
0  0.00632  18.0  2.31   0  0.538  6.575  ...  1  296.0  15.3  396.90  4.98  24.0
1  0.02731   0.0  7.07   0  0.469  6.421  ...  2  242.0  17.8  396.90  9.14  21.6
2  0.02729   0.0  7.07   0  0.469  7.185  ...  2  242.0  17.8  392.83  4.03  34.7
3  0.03237   0.0  2.18   0  0.458  6.998  ...  3  222.0  18.7  394.63  2.94  33.4
4  0.06905   0.0  2.18   0  0.458  7.147  ...  3  222.0  18.7  396.90  5.33  36.2

[5 rows x 14 columns]
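As a point of reference, the naive baseline MAE of about 6.6 mentioned earlier can be estimated with scikit-learn's DummyRegressor on the same test harness. This is a minimal sketch, assuming a median-predicting dummy model; your exact figure may differ slightly:

# sketch: estimate a naive baseline MAE on the housing dataset
from numpy import mean, std, absolute
from pandas import read_csv
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score, RepeatedKFold
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
data = read_csv(url, header=None).values
X, y = data[:, :-1], data[:, -1]
model = DummyRegressor(strategy='median')  # always predicts the median house price
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = absolute(cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1))
print('Baseline MAE: %.3f (%.3f)' % (mean(scores), std(scores)))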
The scikit-learn Python machine learning library provides an implementation of the Lasso penalized regression algorithm via the Lasso class.
Confusingly, the lambda term is configured via the "alpha" argument when defining the class. The default value is 1.0, i.e., a full penalty.
...
# define model
model = Lasso(alpha=1.0)
We can evaluate the Lasso regression model on the housing dataset using repeated 10-fold cross-validation and report the average mean absolute error (MAE) on the dataset.
# evaluate a lasso regression model on the dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Lasso
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Lasso(alpha=1.0)
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force scores to be positive
scores = absolute(scores)
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))
Running the example evaluates the Lasso Regression algorithm on the housing dataset and reports the average MAE across the three repeats of 10-fold cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the example a few times.
In this case, we can see that the model achieved a MAE of about 3.711.
Mean MAE: 3.711 (0.549)
We may decide to use the Lasso Regression as our final model and make predictions on new data.
This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data.
We can demonstrate this with a complete example, listed below.
# make a prediction with a lasso regression model on the dataset
from pandas import read_csv
from sklearn.linear_model import Lasso
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Lasso(alpha=1.0)
# fit model
model.fit(X, y)
# define new data
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
# make a prediction
yhat = model.predict([row])
# summarize prediction
print('Predicted: %.3f' % yhat)
Running the example fits the model and makes a prediction for the new row of data. Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the example a few times.
Predicted: 30.998
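Because the L1 penalty can drive coefficients to exactly zero, it can also be informative to inspect the fitted model's coefficients and see which input variables have effectively been removed. A minimal sketch, reusing the same fitting steps as the example above:

# sketch: inspect which coefficients the lasso has shrunk to zero
from pandas import read_csv
from sklearn.linear_model import Lasso
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
data = read_csv(url, header=None).values
X, y = data[:, :-1], data[:, -1]
model = Lasso(alpha=1.0)
model.fit(X, y)
# coefficients equal to zero correspond to input variables dropped from the model
for i, coef in enumerate(model.coef_):
    print('variable %d: coefficient %.4f' % (i, coef))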
Next, we can look at configuring the model hyperparameters.
Tuning Lasso Hyperparameters
How do we know that the default hyperparameter of alpha=1.0 is appropriate for our dataset?
We do not.
Instead, it is good practice to test a suite of different configurations and discover what works best for our dataset.
One approach would be to grid search alpha values from perhaps 1e-5 to 100 on a log-10 scale and discover what works best for the dataset. Another approach would be to test values between 0.0 and 1.0 with a grid separation of 0.01. We will try the latter in this case.
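For reference, if you wanted to try the log-scale approach instead, such a grid could be defined with NumPy's logspace; a small sketch (the bounds and number of points are just one possible choice):

# sketch: a log-10 scale grid of alpha values from 1e-5 to 100
from numpy import logspace
grid = dict()
grid['alpha'] = logspace(-5, 2, num=8)  # 1e-05, 1e-04, ..., 10, 100
print(grid['alpha'])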
The example below demonstrates the second approach, grid searching values between 0.0 and 1.0 using the GridSearchCV class with a grid of values we have defined.
# grid search hyperparameters for lasso regression
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Lasso
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Lasso()
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
# define search
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
Running the example will evaluate each combination of configurations using repeated cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the example a few times.
You might see some warnings that can be safely ignored, such as:
Objective did not converge. You might want to increase the number of iterations.
In this case, we can see that we achieved slightly better results than the default: 3.379 vs. 3.711. Ignore the sign; the library makes the MAE negative for optimization purposes.
We can see that the model assigned an alpha weight of 0.01 to the penalty.
MAE: -3.379
Config: {'alpha': 0.01}
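Once a good alpha has been identified, you would typically refit a final model on all of the data using that value. A minimal sketch, assuming the alpha of 0.01 reported above (note that GridSearchCV also refits the best configuration on all of the data by default, so results.best_estimator_ from the previous example could be used directly instead):

# sketch: refit a final model using the best alpha found by the grid search
from pandas import read_csv
from sklearn.linear_model import Lasso
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
data = read_csv(url, header=None).values
X, y = data[:, :-1], data[:, -1]
model = Lasso(alpha=0.01)  # value reported by the search above
model.fit(X, y)
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
print('Predicted: %.3f' % model.predict([row]))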
The scikit-learn library also provides a built-in version of the algorithm that automatically finds good hyperparameters via the LassoCV class.
To use the class, the model is fit on the training dataset as usual, and the hyperparameters are tuned automatically during training. The fit model can then be used to make a prediction.
By default, the model will test 100 alpha values. We can change this to a grid of values between 0 and 1 with a separation of 0.01, as we did in the prior example, by setting the "alphas" argument.
The example below demonstrates this.
# use the automatically configured lasso regression algorithm
from numpy import arange
from pandas import read_csv
from sklearn.linear_model import LassoCV
from sklearn.model_selection import RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define model
model = LassoCV(alphas=arange(0, 1, 0.01), cv=cv, n_jobs=-1)
# fit model
model.fit(X, y)
# summarize chosen configuration
print('alpha: %f' % model.alpha_)
Running the example fits the model and discovers the hyperparameters that give the best results using cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the example a few times.
In this case, we can see that the model chose the hyperparameter alpha=0.0. This differs from what we found via our manual grid search, perhaps because of the systematic way in which configurations were searched or chosen.
alpha: 0.000000
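The fitted LassoCV object can also be used directly to make predictions, since it refits itself on the full dataset using the chosen alpha. A minimal sketch, using the same row of data as the earlier prediction example:

# sketch: make a prediction with the automatically configured model
from numpy import arange
from pandas import read_csv
from sklearn.linear_model import LassoCV
from sklearn.model_selection import RepeatedKFold
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
data = read_csv(url, header=None).values
X, y = data[:, :-1], data[:, -1]
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
model = LassoCV(alphas=arange(0, 1, 0.01), cv=cv, n_jobs=-1)
model.fit(X, y)  # tunes alpha and refits on all of the data
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
print('Predicted: %.3f' % model.predict([row]))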
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
The Elements of Statistical Learning, 2016
Applied Predictive Modelling, 2013.
APIs
Linear Models, scikit-learn
sklearn.linear_model.Lasso API.
sklearn.linear_model.LassoCV API.
Articles
Lasso (statistics), Wikipedia
Conclusion
In this guide, you discovered how to develop and evaluate Lasso Regression models in Python.
Specifically, you learned:
- Lasso Regression is an extension of linear regression that adds a regularization penalty to the loss function during training.
- How to evaluate a Lasso Regression model and use a final model to make predictions on new data.
- How to configure the Lasso Regression model for a new dataset via grid search and automatically.