
How to Develop Ridge Regression Models in Python

Regression is a modeling task that involves predicting a numeric value given an input. 

Linear regression is the standard algorithm for regression, and it assumes a linear relationship between the inputs and the target variable. An extension to linear regression involves adding penalties to the loss function during training that encourage simpler models with smaller coefficient values. These extensions are referred to as regularized linear regression or penalized linear regression. 

Ridge Regression is a popular, widely used type of regularized linear regression that includes an L2 penalty. This has the effect of shrinking the coefficients for those input variables that do not contribute much to the prediction task. 

In this guide, you will discover how to develop and evaluate Ridge Regression models in Python. 

After going through this guide, you will know: 

  • Ridge Regression is an extension of linear regression that adds a regularization penalty to the loss function during training. 
  • How to evaluate a Ridge Regression model and use a final model to make predictions for new data. 
  • How to configure the Ridge Regression model for a new dataset via grid search and automatically. 

Tutorial Overview 

This tutorial is divided into three parts; they are: 

  1. Ridge Regression 
  2. Example of Ridge Regression 
  3. Tuning Ridge Hyperparameters 

Ridge Regression 

Linear regression refers to a model that assumes a linear relationship between the input variables and the target variable. 

With a single input variable, this relationship is a line, and with higher dimensions, the relationship can be thought of as a hyperplane that connects the input variables to the target variable. The coefficients of the model are found via an optimization procedure that seeks to minimize the sum of squared errors between the predictions (yhat) and the expected target values (y). 

loss = sum i=0 to n (y_i - yhat_i)^2 

A problem with linear regression is that the estimated coefficients of the model can become large, making the model sensitive to inputs and possibly unstable. This is particularly true for problems with few observations (samples), or with fewer samples (n) than input predictors (p) or variables (so-called p >> n problems). 

One approach to addressing the stability of regression models is to change the loss function to include additional costs for a model that has large coefficients. Linear regression models that use these modified loss functions during training are referred to collectively as penalized linear regression. 

One popular penalty is to penalize a model based on the sum of the squared coefficient values (beta). This is referred to as an L2 penalty. 

  • l2_penalty = sum j=0 to p beta_j^2 

An L2 penalty minimizes the size of all coefficients, although it prevents any coefficients from being removed from the model, since it does not force their values to exactly zero. 

The effect of this penalty is that the parameter estimates are only permitted to become large if there is a proportional reduction in SSE. In effect, this method shrinks the estimates towards 0 as the lambda penalty becomes large (these techniques are sometimes referred to as “shrinkage methods”). 
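To make the shrinkage effect concrete, the brief sketch below (not part of the original example code; it uses a small synthetic dataset purely for illustration) computes the closed-form ridge estimate (X'X + lambda*I)^-1 X'y for increasing values of lambda and prints the total coefficient magnitude, which decreases as the penalty grows.

# sketch: coefficient shrinkage with the closed-form ridge solution (synthetic data for illustration)
from numpy import eye
from numpy.linalg import solve
from numpy.random import RandomState
rng = RandomState(1)
X = rng.randn(20, 5)                           # 20 samples, 5 input variables
y = X.dot(rng.randn(5)) + rng.randn(20) * 0.5  # synthetic target
for lam in [0.0, 1.0, 10.0, 100.0]:
    # ridge estimate: (X'X + lambda * I)^-1 X'y
    beta = solve(X.T.dot(X) + lam * eye(5), X.T.dot(y))
    print(lam, abs(beta).sum())                # total coefficient magnitude shrinks as lambda grows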

This penalty can be added to the cost function for linear regression and is referred to as Tikhonov regularization (after the author), or Ridge Regression more generally. 

A hyperparameter referred to as “lambda” is used to control the weighting of the penalty in the loss function. A default value of 1.0 will fully weight the penalty; a value of zero excludes the penalty. Very small values of lambda, such as 1e-3 or smaller, are common. 

  • Ridge_loss = loss + (lambda * l2_penalty) 
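As a minimal sketch of how these two formulas combine (the array values below are hypothetical, not taken from the housing dataset used later):

# sketch: the ridge loss as the sum of squared errors plus a weighted L2 penalty
from numpy import array

def ridge_loss(y, yhat, beta, lam=1.0):
    loss = ((y - yhat) ** 2).sum()     # sum of squared errors
    l2_penalty = (beta ** 2).sum()     # sum of squared coefficient values
    return loss + lam * l2_penalty

# hypothetical targets, predictions and coefficients for illustration
print(ridge_loss(array([3.0, 2.5]), array([2.8, 2.9]), array([0.4, -1.2])))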

Now that we are familiar with Ridge penalized regression, let’s look at a worked example. 

Example of Ridge Regression 

In this section, we will demonstrate how to use the Ridge Regression algorithm. 

First, let’s introduce a standard regression dataset. We will use the housing dataset. 

The housing dataset is a standard machine learning dataset consisting of 506 rows of data with 13 numerical input variables and a numerical target variable. 

Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6. A top-performing model can achieve an MAE on the same test harness of about 1.9. This provides the bounds of expected performance on this dataset. 

The dataset involves predicting the house price given details of the house’s suburb in the American city of Boston. 
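As an aside, the naive baseline of roughly 6.6 mentioned above can be reproduced with a sketch like the one below, which assumes scikit-learn’s DummyRegressor predicting the mean target value; this block is an illustration, not part of the original tutorial code.

# sketch: estimate the naive baseline MAE with a mean-predicting dummy model
from numpy import mean, absolute
from pandas import read_csv
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score, RepeatedKFold
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
data = read_csv(url, header=None).values
X, y = data[:, :-1], data[:, -1]
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(DummyRegressor(strategy='mean'), X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
print('Baseline MAE: %.3f' % mean(absolute(scores)))  # expected to be close to 6.6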

The example below downloads and loads the dataset as a Pandas DataFrame and summarizes the shape of the dataset and the first five rows of data. 

 

# load and summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())

 

Running the example confirms the 506 rows of data with 13 input variables and a single numeric target variable (14 columns in total). We can also see that all input variables are numeric. 

(506, 14)
        0     1     2   3      4      5   ...  8      9     10      11    12    13
0  0.00632  18.0  2.31   0  0.538  6.575  ...   1  296.0  15.3  396.90  4.98  24.0
1  0.02731   0.0  7.07   0  0.469  6.421  ...   2  242.0  17.8  396.90  9.14  21.6
2  0.02729   0.0  7.07   0  0.469  7.185  ...   2  242.0  17.8  392.83  4.03  34.7
3  0.03237   0.0  2.18   0  0.458  6.998  ...   3  222.0  18.7  394.63  2.94  33.4
4  0.06905   0.0  2.18   0  0.458  7.147  ...   3  222.0  18.7  396.90  5.33  36.2

[5 rows x 14 columns]

 

The scikit-learn Python machine learning library provides an implementation of the Ridge Regression algorithm via the Ridge class. 

Confusingly, the lambda term can be configured via the “alpha” argument when defining the class. The default value is 1.0, or a full penalty. 

from sklearn.linear_model import Ridge
# define model
model = Ridge(alpha=1.0)

 

We can evaluate the Ridge Regression model on the housing dataset using repeated 10-fold cross-validation and report the average mean absolute error (MAE) on the dataset. 

# evaluate a ridge regression model on the dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge(alpha=1.0)
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force scores to be positive
scores = absolute(scores)
print('Mean MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

 

Running the example evaluates the Ridge Regression algorithm on the housing dataset and reports the average MAE across the three repeats of 10-fold cross-validation. 

Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times. 

In this case, we can see that the model achieved an MAE of about 3.382. 

Mean MAE: 3.382 (0.519) 

We may decide to use Ridge Regression as our final model and make predictions on new data. 

This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data. 

We can demonstrate this with a complete example, listed below. 

# make a prediction with a ridge regression model on the dataset
from pandas import read_csv
from sklearn.linear_model import Ridge
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge(alpha=1.0)
# fit model
model.fit(X, y)
# define new data
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
# make a prediction
yhat = model.predict([row])
# summarize prediction
print('Predicted: %.3f' % yhat[0])

 

Running the example fits the model and makes a prediction for the new row of data. 

Predicted: 30.253 

Next, we can look at configuring the model hyperparameters. 

Tuning Ridge Hyperparameters 

How do we know that the default hyperparameter of alpha=1.0 is appropriate for our dataset? 

We don’t. 

Instead, it is good practice to test a suite of different configurations and discover what works best for our dataset. 

One approach would be to grid search alpha values from perhaps 1e-5 to 100 on a log scale and discover what works best for a dataset. Another approach would be to test values between 0.0 and 1.0 with a grid separation of 0.01. We will try the latter in this case. 
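For reference, the two grids mentioned above could be constructed as in the short sketch below; the number of points on the log scale is an arbitrary, illustrative choice, and the complete grid search example that follows uses the linear grid.

# sketch: the two candidate alpha grids described above
from numpy import arange, logspace
log_grid = logspace(-5, 2, num=8)   # log-scale grid from 1e-5 to 100
linear_grid = arange(0, 1, 0.01)    # values between 0.0 and 1.0 with a separation of 0.01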

The example below demonstrates this using the GridSearchCV class with the grid of values we have defined. 

 

# grid search hyperparameters for ridge regression
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = Ridge()
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
# define search
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)

 

Running the example will evaluate each combination of configurations using repeated cross-validation. 

Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times. 

In this case, we can see that we achieved slightly better results than the default, 3.379 vs. 3.382. Ignore the sign; the library makes the MAE negative for optimization purposes. 

We can see that the model assigned an alpha weight of 0.51 to the penalty. 

MAE: -3.379 

Config: {'alpha': 0.51} 

 

The scikit-learn library also provides a built-in version of the algorithm that automatically finds good hyperparameters via the RidgeCV class. 

To use this class, it is fit on the training dataset and then used to make a prediction. During the training process, it automatically tunes the hyperparameter values. 

By default, the model will only test the alpha values (0.1, 1.0, 10.0). We can change this to a grid of values between 0 and 1 with a separation of 0.01, as we did in the previous example, by setting the “alphas” argument. 

The example below demonstrates this. 

# use the automatically configured ridge regression algorithm
from numpy import arange
from pandas import read_csv
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model evaluation method
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define model
model = RidgeCV(alphas=arange(0, 1, 0.01), cv=cv, scoring='neg_mean_absolute_error')
# fit model
model.fit(X, y)
# summarize chosen configuration
print('alpha: %f' % model.alpha_)

 

Running the example fits the model and discovers the hyperparameters that give the best results using cross-validation. 

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times. 

In this case, we can see that the model chose the identical hyperparameter of alpha=0.51 found via our manual grid search. 

alpha: 0.510000 
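As a final note, the fitted RidgeCV object can itself be used to make predictions. The short sketch below assumes it is appended to the previous example after model.fit(X, y), and reuses the example row of data from earlier in the tutorial.

# sketch: predict with the fitted RidgeCV model (assumes the previous example has been run)
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat[0])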

Further Reading 

This section provides more resources on the topic if you are looking to go deeper. 

Books 

The Elements of Statistical Learning, 2016 

Applied Predictive Modeling, 2013. 

APIs 

sklearn.linear_model.Ridge API 

sklearn.linear_model.RidgeCV API 

Linear Models, scikit-learn 

Articles 

Tikhonov regularization, Wikipedia 

Conclusion 

In this guide, you discovered how to develop and evaluate Ridge Regression models in Python. 

Specifically, you learned: 

  • Ridge Regression is an extension of linear regression that adds a regularization penalty to the loss function during training. 
  • How to evaluate a Ridge Regression model and use a final model to make predictions for new data. 
  • How to configure the Ridge Regression model for a new dataset via grid search and automatically.