How to Develop Elastic Net Regression Models in Python
Regression is a modeling task that involves predicting a numeric value given an input.
Linear regression is the standard algorithm for regression, which assumes a linear relationship between the inputs and the target variable. An extension to linear regression involves adding penalties to the loss function during training that encourage simpler models with smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression.
Elastic Net is a popular type of regularized linear regression that combines two popular penalties, specifically the L1 and L2 penalty functions.
In this guide, you will discover how to develop Elastic Net regularized regression in Python.
After completing this guide, you will know:
- Elastic Net is an extension of linear regression that adds regularization penalties to the loss function during training.
- How to evaluate an Elastic Net model and use a final model to make predictions on new data.
- How to configure the Elastic Net model for a new dataset via grid search and automatically.
Tutorial Overview
This tutorial is divided into three parts, which are:
1. Elastic Net Regression
2. Example of Elastic Net Regression
3. Tuning Elastic Net Hyperparameters
Elastic Net Regression
Linear regression refers to a model that assumes a linear relationship between the input variables and the target variable.
With a single input variable, this relationship is a line, and with higher dimensions, the relationship can be thought of as a hyperplane that connects the input variables to the target variable. The coefficients of the model are found via an optimization process that seeks to minimize the sum of squared errors between the predictions (yhat) and the expected target values (y).
- loss = sum i=0 to n (y_i - yhat_i)^2
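As a quick illustration, here is a minimal sketch that computes this loss for a handful of made-up predictions and targets:

# compute the sum of squared errors for made-up values
from numpy import array
y = array([3.0, 2.5, 4.0]) # expected target values (made up)
yhat = array([2.8, 2.7, 4.1]) # model predictions (made up)
loss = ((y - yhat) ** 2).sum() # sum of squared errors
print(loss) # prints approximately 0.09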
A problem with linear regression is that the estimated coefficients of the model can become large, making the model sensitive to inputs and possibly unstable. This is particularly true for problems with few observations (samples) or with fewer samples (n) than input predictors (p) or variables (so-called p >> n problems).
One approach to addressing the stability of regression models is to change the loss function to include additional costs for a model that has large coefficients. Linear regression models that use these modified loss functions during training are referred to collectively as penalized linear regression.
One popular penalty is to penalize a model based on the sum of the squared coefficient values. This is called an L2 penalty. An L2 penalty shrinks the size of all coefficients, although it prevents any coefficient from being removed from the model.
- l2_penalty = sum j=0 to p beta_j^2
Another popular penalty is to penalize a model based on the sum of the absolute coefficient values. This is called the L1 penalty. An L1 penalty shrinks the size of all coefficients and allows some coefficients to shrink all the way to zero, which removes the corresponding predictor from the model.
- l1_penalty = sum j=0 to p abs(beta_j)
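To make the two penalties concrete, here is a small sketch that computes each penalty for a made-up vector of coefficients:

# compute the L1 and L2 penalties for made-up coefficients
from numpy import array
beta = array([0.5, -1.2, 0.0, 3.0]) # made-up model coefficients
l2_penalty = (beta ** 2).sum() # sum of squared coefficients: 10.69
l1_penalty = abs(beta).sum() # sum of absolute coefficients: 4.7
print(l2_penalty, l1_penalty)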
Elastic Net is a penalized linear regression model that includes both the L1 and L2 penalties during training.
Using the terminology from “The Elements of Statistical Learning,” a hyperparameter “alpha” is provided to assign how much weight is given to each of the L1 and L2 penalties. Alpha is a value between 0 and 1 and is used to weight the contribution of the L1 penalty, while one minus the alpha value is used to weight the L2 penalty.
- elastic_net_penalty = (alpha * l1_penalty) + ((1 – alpha) * l2_penalty)
For example, an alpha of 0.5 would give a 50% contribution of each penalty to the loss function. An alpha value of 0 gives all weight to the L2 penalty, and a value of 1 gives all weight to the L1 penalty.
The parameter alpha determines the mix of the penalties, and is often pre-chosen on qualitative grounds.
The benefit is that elastic net allows a balance of both penalties, which can result in better performance than a model with either one penalty alone on some problems.
Another hyperparameter, referred to as “lambda,” controls the weighting of the sum of both penalties in the loss function. A default value of 1.0 is used for a fully weighted penalty, and a value of 0 excludes the penalty. Very small values of lambda, such as 1e-3 or smaller, are common.
- elastic_net_loss = loss + (lambda * elastic_net_penalty)
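Putting the pieces together, a short sketch shows how the combined penalty and the full elastic net loss are computed, reusing the made-up values from the sketches above:

# combine the penalties into the elastic net loss, with made-up values
loss = 0.09 # sum of squared errors from the earlier sketch
l1_penalty = 4.7 # L1 penalty from the earlier sketch
l2_penalty = 10.69 # L2 penalty from the earlier sketch
alpha = 0.5 # equal weighting of the L1 and L2 penalties
lam = 1.0 # fully weighted penalty ('lambda' is a reserved word in Python)
elastic_net_penalty = (alpha * l1_penalty) + ((1 - alpha) * l2_penalty)
elastic_net_loss = loss + (lam * elastic_net_penalty)
print(elastic_net_loss) # prints 7.785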
Now that we are familiar with elastic net penalized regression, let's look at a worked example.
Example of Elastic Net Regression
In this section, we will demonstrate how to use the Elastic Net regression algorithm.
First, let's introduce a standard regression dataset. We will use the housing dataset.
The housing dataset is a classic machine learning dataset comprising 506 rows of data with 13 numerical input variables and a numerical target variable. The dataset involves predicting the house price given details of the house's suburb in the American city of Boston.
Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6. A top-performing model can achieve an MAE on this same test harness of about 1.9. This provides the bounds of expected performance on this dataset.
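The naive baseline can be checked with a short sketch; assuming scikit-learn's DummyRegressor predicting the median target value as the naive model, a figure close to the quoted 6.6 should be reproduced:

# estimate a naive baseline MAE on the housing dataset
from numpy import mean
from numpy import absolute
from pandas import read_csv
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define a naive model that always predicts the median target value
model = DummyRegressor(strategy='median')
# evaluate with repeated 10-fold cross-validation
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = absolute(cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1))
print('Baseline MAE: %.3f' % mean(scores))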
The next example downloads and loads the dataset as a Pandas DataFrame and summarizes the shape of the dataset and the first five rows of data.
# load and summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())
Running the example confirms the 506 rows of data with 13 input variables and a single numeric target variable (14 in total).
We can also see that all of the input variables are numeric.
(506, 14)
        0     1     2  3      4      5  ...  8      9     10      11    12    13
0  0.00632  18.0  2.31  0  0.538  6.575  ...  1  296.0  15.3  396.90  4.98  24.0
1  0.02731   0.0  7.07  0  0.469  6.421  ...  2  242.0  17.8  396.90  9.14  21.6
2  0.02729   0.0  7.07  0  0.469  7.185  ...  2  242.0  17.8  392.83  4.03  34.7
3  0.03237   0.0  2.18  0  0.458  6.998  ...  3  222.0  18.7  394.63  2.94  33.4
4  0.06905   0.0  2.18  0  0.458  7.147  ...  3  222.0  18.7  396.90  5.33  36.2

[5 rows x 14 columns]
The scikit-learn Python machine learning library provides an implementation of the Elastic Net penalized regression algorithm via the ElasticNet class.
Confusingly, the alpha hyperparameter can be set via the “l1_ratio” argument that controls the contribution of the L1 and L2 penalties, and the lambda hyperparameter can be set via the “alpha” argument that controls the contribution of the sum of both penalties to the loss function.
By default, an equal balance of 0.5 is used for “l1_ratio” and a full weighting of 1.0 is used for alpha.
...
# define model
model = ElasticNet(alpha=1.0, l1_ratio=0.5)
We can evaluate the Elastic Net model on the housing dataset using repeated 10-fold cross-validation and report the average mean absolute error (MAE) on the dataset.
# evaluate an elastic net model on the dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import ElasticNet
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = ElasticNet(alpha=1.0, l1_ratio=0.5)
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force scores to be positive
scores = absolute(scores)
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))
Running the example evaluates the Elastic Net algorithm on the housing dataset and reports the average MAE across the three repeats of 10-fold cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.
In this case, we can see that the model achieved an MAE of about 3.682.
Mean MAE: 3.682 (0.530)
We may decide to use the Elastic Net as our final model and make predictions on new data.
This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data.
We can demonstrate this with a complete example, listed below.
# make a prediction with an elastic net model on the dataset
from pandas import read_csv
from sklearn.linear_model import ElasticNet
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = ElasticNet(alpha=1.0, l1_ratio=0.5)
# fit model
model.fit(X, y)
# define new data
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
# make a prediction
yhat = model.predict([row])
# summarize prediction
print('Predicted: %.3f' % yhat)
Running the example fits the model and makes a prediction for the new row of data.
Predicted: 31.047
Next, we can look at configuring the model hyperparameters.
Tuning Elastic Net Hyperparameters
How do we know that the default hyperparameters of alpha=1.0 and l1_ratio=0.5 are any good for our dataset?
We don’t.
Instead, it is good practice to test a suite of different configurations and discover what works best.
One approach would be to grid search l1_ratio values between 0 and 1 with a separation of 0.1 or 0.01, and alpha values from perhaps 1e-5 to 100 on a log-10 scale, and discover what works best for the dataset.
The example below demonstrates this using the GridSearchCV class with a grid of values we have defined.
# grid search hyperparameters for the elastic net
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import ElasticNet
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = ElasticNet()
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['alpha'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.0, 1.0, 10.0, 100.0]
grid['l1_ratio'] = arange(0, 1, 0.01)
# define search
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
Running the example will evaluate each combination of configurations using repeated cross-validation.
You may see some warnings that can be safely ignored, such as:
Objective did not converge. You might want to increase the number of iterations.
Your specific results may vary given the stochastic nature of the algorithm. Try running the example a few times.
In this case, we can see that we achieved slightly better results than the default: 3.378 vs. 3.682. Ignore the sign; the library makes the MAE negative for optimization purposes.
We can see that the model assigned an alpha weight of 0.01 to the penalty and, with an l1_ratio of 0.97, focuses almost exclusively on the L1 penalty.
MAE: -3.378
Config: {‘alpha’: 0.01, ‘l1_ratio’: 0.97}
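As a usage note, GridSearchCV refits the best configuration on the whole dataset by default (refit=True), so the tuned model is directly available for predictions. A small, hypothetical continuation of the example above:

# continuing the grid search example: use the refit best model
best_model = results.best_estimator_
# predict for the first row of the dataset as a demonstration
yhat = best_model.predict(X[:1])
print('Predicted: %.3f' % yhat[0])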
The scikit-learn library also provides a built-in version of the algorithm that automatically finds good hyperparameters via the ElasticNetCV class.
To use this class, it is first fit on the dataset, then used to make a prediction. It will automatically find appropriate hyperparameters.
By default, the model will test 100 alpha values and use a default ratio. We can specify our own lists of values to test via the “l1_ratio” and “alphas” arguments, as we did with the manual grid search.
The example below demonstrates this.
# use automatically configured elastic net algorithm
from numpy import arange
from pandas import read_csv
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define model
ratios = arange(0, 1, 0.01)
alphas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.0, 1.0, 10.0, 100.0]
model = ElasticNetCV(l1_ratio=ratios, alphas=alphas, cv=cv, n_jobs=-1)
# fit model
model.fit(X, y)
# summarize chosen configuration
print('alpha: %f' % model.alpha_)
print('l1_ratio_: %f' % model.l1_ratio_)
Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.
Again, you may see some warnings that can be safely ignored, such as:
Objective did not converge. You might want to increase the number of iterations.
In this case, we can see that an alpha of 0.0 was chosen, removing the penalty entirely from the loss function.
This differs from what we found via the manual grid search, perhaps due to the systematic way in which configurations were searched or chosen.
alpha: 0.000000
l1_ratio_: 0.470000
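Once fit, the ElasticNetCV object behaves like any fitted scikit-learn model, so a continuation of the example above can make a prediction, here using the same demonstration row as earlier:

# continuing the example: make a prediction with the fitted model
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat[0])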
Further Reading
This section provides additional resources on the topic if you are looking to go deeper.
Books
- The Elements of Statistical Learning, 2016
- Applied Predictive Modeling, 2013
APIs
- sklearn.linear_model.ElasticNet API
- sklearn.linear_model.ElasticNetCV API
Articles
- Elastic net regularization, Wikipedia
Conclusion
In this guide, you discovered how to develop Elastic Net regularized regression in Python.
Specifically, you learned:
- Elastic Net is an extension of linear regression that adds regularization penalties to the loss function during training.
- How to evaluate an Elastic Net model and use a final model to make predictions on new data.
- How to configure the Elastic Net model for a new dataset via grid search and automatically.