How to Develop Ridge Regression Models in Python
Regression is a modelling task that involves predicting a numeric value given an input.
Linear regression is the standard algorithm for regression, and it assumes a linear relationship between the inputs and the target variable. An extension to linear regression involves adding penalties to the loss function during training that encourage simpler models with smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression.
Ridge Regression is a popular, widely used variant of regularized linear regression that includes an L2 penalty. This has the effect of shrinking the coefficients of those input variables that do not contribute much to the prediction task.
In this tutorial, you will discover how to develop and evaluate Ridge Regression models in Python.
After completing this tutorial, you will know:
- Ridge Regression is an extension of linear regression that adds a regularization penalty to the loss function during training.
- How to evaluate a Ridge Regression model and use a final model to make predictions on new data.
- How to configure the Ridge Regression model for a new dataset via grid search and automatically.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Ridge Regression
- Example of Ridge Regression
- Tuning Ridge Hyperparameters
Ridge Regression
Linear regression refers to a model that assumes a linear relationship between input variables and the target variable.
With a single input variable, this relationship is a line, and with higher dimensions, the relationship can be thought of as a hyperplane that connects the input variables to the target variable. The coefficients of the model are found via an optimization process that seeks to minimize the sum of squared errors between the predictions (yhat) and the expected target values (y).
- loss = sum i=0 to n (y_i – yhat_i)^2
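As a quick illustration, this loss can be computed directly with NumPy; the target and prediction values below are made-up numbers used only for the sketch.

# compute the sum of squared errors for a set of predictions (made-up values)
from numpy import array
y = array([3.0, -0.5, 2.0, 7.0])
yhat = array([2.5, 0.0, 2.0, 8.0])
loss = ((y - yhat) ** 2).sum()
print(loss)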
A problem with linear regression is that the estimated coefficients of the model can become large, making the model sensitive to inputs and potentially unstable. This is particularly true for problems with few observations (samples) or with fewer samples (n) than input predictors (p) or variables (so-called p >> n problems).
One approach to addressing the stability of regression models is to change the loss function to include additional costs for a model that has large coefficients. Linear regression models that use these modified loss functions during training are referred to collectively as penalized linear regression.
One popular penalty is to penalize a model based on the sum of the squared coefficient values (beta). This is called an L2 penalty.
- l2_penalty = sum j=0 to p beta_j^2
An L2 penalty shrinks the size of all coefficients, although it does not remove any variables from the model, because it does not force coefficient values all the way to zero.
The effect of this penalty is that the parameter estimates are only allowed to become large if there is a proportional reduction in SSE. In effect, this method shrinks the estimates towards 0 as the lambda penalty becomes large (these techniques are sometimes called “shrinkage methods”).
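One way to see this shrinkage effect is to fit the model with increasing penalty weights and compare the size of the learned coefficients. The sketch below does this on a small synthetic dataset built with scikit-learn's make_regression() and the Ridge class introduced later in this tutorial; the alpha values shown are arbitrary choices for illustration.

# demonstrate how larger penalty weights shrink the coefficients (illustrative sketch)
from numpy import sum as array_sum
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
# create a small synthetic regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.5, random_state=1)
# fit the model with increasing penalty weights and report the size of the coefficients
for alpha in [0.1, 1.0, 10.0, 100.0]:
	model = Ridge(alpha=alpha)
	model.fit(X, y)
	print('alpha=%.1f, sum of squared coefficients=%.1f' % (alpha, array_sum(model.coef_ ** 2)))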
This penalty can be added to the cost function for linear regression and is referred to as Tikhonov regularization (after the author of the method), or Ridge Regression more generally.
A hyperparameter is used, called “lambda”, that controls the weighting of the penalty in the loss function. A default value of 1.0 will fully weight the penalty; a value of zero excludes the penalty. Very small values of lambda, such as 1e-3 or smaller, are common.
- Ridge_loss = loss + (lambda * l2_penalty)
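Extending the earlier sketch, the penalized loss can be computed by adding the weighted L2 penalty to the squared error; the coefficient values and lambda below are again made-up numbers for illustration only.

# compute the ridge loss as the squared error plus the weighted L2 penalty (made-up values)
from numpy import array
y = array([3.0, -0.5, 2.0, 7.0])
yhat = array([2.5, 0.0, 2.0, 8.0])
beta = array([0.5, -1.2, 0.3])
lam = 1.0
sse = ((y - yhat) ** 2).sum()
l2_penalty = (beta ** 2).sum()
ridge_loss = sse + lam * l2_penalty
print(ridge_loss)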
Now that we are familiar with Ridge penalized regression, let’s look at a worked example.
Example of Ridge Regression
In this section, we will demonstrate how to use the Ridge Regression algorithm.
First, let’s introduce a standard regression dataset. We will use the housing dataset.
The housing dataset is a standard machine learning dataset comprising 506 rows of data with 13 numerical input variables and a numerical target variable.
Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6. A top-performing model can achieve an MAE on this same test harness of about 1.9. This provides the bounds of expected performance on this dataset.
The dataset involves predicting the house price given details of the house’s suburb in the American city of Boston.
The example below downloads and loads the dataset as a Pandas DataFrame and summarizes the shape of the dataset and the first five rows of data.
# load and summarize the housing dataset
from pandas import read_csv
from matplotlib import pyplot
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())
Running the example confirms the 506 rows of data with 13 input variables and a single numeric target variable (14 columns in total). We can also see that all input variables are numeric.
(506, 14)
        0     1     2  3      4      5  ...  8      9     10      11    12    13
0  0.00632  18.0  2.31  0  0.538  6.575  ...  1  296.0  15.3  396.90  4.98  24.0
1  0.02731   0.0  7.07  0  0.469  6.421  ...  2  242.0  17.8  396.90  9.14  21.6
2  0.02729   0.0  7.07  0  0.469  7.185  ...  2  242.0  17.8  392.83  4.03  34.7
3  0.03237   0.0  2.18  0  0.458  6.998  ...  3  222.0  18.7  394.63  2.94  33.4
4  0.06905   0.0  2.18  0  0.458  7.147  ...  3  222.0  18.7  396.90  5.33  36.2

[5 rows x 14 columns]
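As noted above, a naive model achieves an MAE of about 6.6 on this dataset. The sketch below shows one way to approximate that baseline with scikit-learn's DummyRegressor and the same test harness used throughout the tutorial; the exact figure will depend on the prediction strategy chosen (here, the median).

# approximate the naive baseline MAE on the housing dataset (illustrative sketch)
from numpy import absolute
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# naive model that always predicts the median of the training target
model = DummyRegressor(strategy='median')
# evaluate using repeated 10-fold cross-validation
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = absolute(cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1))
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))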
The scikit-learn Python machine learning library provides an implementation of the Ridge Regression algorithm via the Ridge class.
Confusingly, the lambda term can be configured via the “alpha” argument when defining the class. The default value is 1.0, or a full penalty.
...
# define model
model = Ridge(alpha=1.0)
We can evaluate the Ridge Regression model on the housing dataset using repeated 10-fold cross-validation and report the average mean absolute error (MAE) on the dataset.
# evaluate a ridge regression model on the dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge(alpha=1.0)
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force scores to be positive
scores = absolute(scores)
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))
Running the example evaluates the Ridge Regression algorithm on the housing dataset and reports the average MAE across the three repeats of 10-fold cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the example a few times.
In this case, we can see that the model achieved an MAE of about 3.382.
Mean MAE: 3.382 (0.519)
We may decide to use the Ridge Regression model as our final model and make predictions on new data.
This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data.
We can demonstrate this with a complete example, listed below.
# make a prediction with a ridge regression model on the dataset
from pandas import read_csv
from sklearn.linear_model import Ridge
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge(alpha=1.0)
# fit model
model.fit(X, y)
# define new data
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
# make a prediction
yhat = model.predict([row])
# summarize prediction
print('Predicted: %.3f' % yhat)
Running the example fits the model and makes a prediction for the new row of data.
Predicted: 30.253
Next, we can look at configuring the model hyperparameters.
Tuning Ridge Hyperparameters
How do we know that the default hyperparameter of alpha=1.0 is appropriate for our dataset?
We don’t.
Instead, it is good practice to test a suite of different configurations and discover what works best for our dataset.
One approach would be to grid search alpha values from perhaps 1e-5 to 100 on a log scale and discover what works best for a dataset. Another approach would be to test values between 0.0 and 1.0 with a grid separation of 0.01. We will try the latter approach in this case.
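For reference, the first approach (a log-scale grid) could be defined with NumPy's logspace() function, as in the brief sketch below; this alternative grid is not used in the example that follows, which sticks to the linear grid between 0.0 and 1.0.

# alternative: define a log-scale grid of alpha values from 1e-5 to 100 (not used below)
from numpy import logspace
grid = dict()
grid['alpha'] = logspace(-5, 2, num=50)
print(grid['alpha'][:5])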
The example below demonstrates this using the GridSearchCV class and the linear grid of values between 0.0 and 1.0.
# grid search hyperparameters for ridge regression
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge()
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
# define search
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
Running the example will evaluate each combination of configurations using repeated cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Consider running the example a few times.
In this case, we can see that we achieved slightly better results than the default, 3.379 vs. 3.382. Ignore the sign; the library makes the MAE negative for optimization purposes.
We can see that the model assigned an alpha weight of 0.51 to the penalty.
MAE: -3.379
Config: {‘alpha’: 0.51}
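If you want to see how the score varies across the candidate alpha values, the fitted search object exposes a cv_results_ dictionary. The brief sketch below assumes the results and grid variables from the grid search example above.

# report the mean MAE for each candidate alpha (continues the grid search example)
for alpha, score in zip(results.cv_results_['param_alpha'], results.cv_results_['mean_test_score']):
	print('alpha=%.2f, MAE=%.3f' % (alpha, -score))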
The scikit-learn library also provides a built-in version of the algorithm that automatically finds good hyperparameters via the RidgeCV class.
To use this class, it is fit on the training dataset and used to make a prediction. During the training process, it automatically tunes the hyperparameter values.
By default, the model will only test the alpha values (0.1, 1.0, 10.0). We can change this to a grid of values between 0 and 1 with a separation of 0.01, as we did in the previous example, by setting the “alphas” argument.
The example below demonstrates this.
# use automatically configured ridge regression algorithm
from numpy import arange
from pandas import read_csv
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define model
model = RidgeCV(alphas=arange(0, 1, 0.01), cv=cv, scoring='neg_mean_absolute_error')
# fit model
model.fit(X, y)
# summarize chosen configuration
print('alpha: %f' % model.alpha_)
Running the example fits the model and discovers the hyperparameter that gives the best results using cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm and evaluation procedure. Try running the example a few times.
In this case, we can see that the model chose the identical hyperparameter of alpha=0.51 that we found via our manual grid search.
alpha: 0.510000
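Because RidgeCV refits a final model on the full dataset with the chosen alpha, the fitted object can be used directly to make predictions. The short sketch below continues the example above, reusing the row of data from the earlier prediction example.

# make a prediction with the fitted RidgeCV model (continues the example above)
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat)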
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
The Elements of Statistical Learning, 2016
Applied Predictive Modeling, 2013.
APIs
sklearn.linear_model.Ridge API
sklearn.linear_model.RidgeCV API
Linear Models, scikit-learn
Articles
Tikhonov regularization, Wikipedia
Conclusion
In this tutorial, you discovered how to develop and evaluate Ridge Regression models in Python.
Specifically, you learned:
- Ridge Regression is an extension of linear regression that adds a regularization penalty to the loss function during training.
- How to evaluate a Ridge Regression model and use a final model to make predictions on new data.
- How to configure the Ridge Regression model for a new dataset via grid search and automatically.