Autoencoder feature extraction for regression
An autoencoder is a type of neural network that can be used to learn a compressed representation of raw data.
An autoencoder is composed of encoder and decoder sub-models. The encoder compresses the input, and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded.
The encoder can then be used as a data preparation technique to perform feature extraction on raw data, and the extracted features can be used to train a different machine learning model.
In this guide, you will find out how to develop and evaluate an autoencoder for regression predictive modelling.
After going through this guide, you will know:
- An autoencoder is a neural network model that can be used to learn a compressed representation of raw data.
- How to train an autoencoder model on a training dataset and save only the encoder portion of the model.
- How to use the encoder as a data preparation step when training a machine learning model.
Tutorial Overview
This guide is divided into three parts:
1. Autoencoders for Feature Extraction
2. Autoencoder for Regression
3. Autoencoder as Data Prep
Autoencoders for Feature Extraction
An autoencoder is a neural network model that seeks to learn a compressed representation of an input.
Put another way, an autoencoder is a neural network that is trained to attempt to copy its input to its output.
Autoencoders are an unsupervised learning method, although technically they are trained using supervised learning methods, an approach referred to as self-supervised learning. They are typically trained as part of a broader model that attempts to recreate the input.
For example:
X = model.predict(X)
The design of the autoencoder model purposefully makes this challenging by restricting the architecture to a bottleneck at the midpoint of the model, from which the reconstruction of the input data is performed.
There are many types of autoencoders, and their use varies, but perhaps the most common use is as a learned or automatic feature extraction model.
In this case, once the model is fit, the reconstruction part of the model can be discarded and the model up to the point of the bottleneck can be used. The output of the model at the bottleneck is a fixed-length vector that provides a compressed representation of the input data.
Typically, autoencoders are restricted in ways that allow them to copy only approximately, and to copy only input that resembles the training data. Because the model is forced to prioritize which aspects of the input should be copied, it often learns useful properties of the data.
Input data from the domain can then be provided to the model, and the output of the model at the bottleneck can be used as a feature vector in a supervised learning model, for visualization, or more generally for dimensionality reduction.
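To make the bottleneck idea concrete, here is a minimal NumPy sketch of a linear autoencoder trained by gradient descent. It is not the Keras model developed later in this guide. The toy data, the weight names W_enc and W_dec, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
# Toy linear autoencoder in NumPy (illustrative assumption, not the Keras model used later)
import numpy as np

rng = np.random.default_rng(1)

# 8 observed columns generated from only 3 hidden factors, so a
# 3-unit bottleneck can in principle reconstruct the input
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing

n_inputs, n_bottleneck = X.shape[1], 3
W_enc = rng.normal(scale=0.1, size=(n_inputs, n_bottleneck))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_bottleneck, n_inputs))  # decoder weights


def reconstruction_mse(X, W_enc, W_dec):
    # the target is the input itself: this is the self-supervised part
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))


initial_loss = reconstruction_mse(X, W_enc, W_dec)

learning_rate = 0.005
for _ in range(3000):
    code = X @ W_enc              # encoder output at the bottleneck
    error = code @ W_dec - X      # reconstruction minus input
    # gradients of the squared error, up to a constant factor
    grad_dec = code.T @ error / len(X)
    grad_enc = X.T @ (error @ W_dec.T) / len(X)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_loss = reconstruction_mse(X, W_enc, W_dec)

# discard the decoder and keep the encoder as a feature extractor
features = X @ W_enc
print(features.shape)
print(initial_loss, final_loss)
```

The last step mirrors the workflow described above: the decoder weights are only needed during training, and the encoder alone maps each row to a fixed-length feature vector.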
Now, let’s look into how we could develop an autoencoder for feature extraction on a regression predictive modelling problem.
Autoencoder for Regression
In this section, we will develop an autoencoder to learn a compressed representation of the input features for a regression predictive modelling problem.
First, let’s define a regression predictive modelling problem.
We will use the make_regression() scikit-learn function to define a synthetic regression task with 100 input features (columns) and 1,000 examples (rows). Importantly, we will define the problem in such a way that most of the input variables are redundant (90 of the 100, or 90 percent), allowing the autoencoder later to learn a useful compressed representation.
The example below defines the dataset and summarizes its shape.
# synthetic regression dataset
from sklearn.datasets import make_regression
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# summarize the dataset
print(X.shape, y.shape)
Running the example defines the dataset and prints the shape of the arrays, confirming the number of rows and columns.
(1000, 100) (1000,)
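As a quick check on the claim that only 10 of the 100 columns drive the target, the sketch below asks make_regression() to also return the coefficients of the underlying linear model through its coef=True argument. Counting the non-zero coefficients is an illustrative addition, not a step the rest of this guide depends on.

```python
import numpy as np
from sklearn.datasets import make_regression

# same settings as the dataset above, plus coef=True to return the true coefficients
X, y, coef = make_regression(n_samples=1000, n_features=100, n_informative=10,
                             noise=0.1, random_state=1, coef=True)

# columns with a zero coefficient do not contribute to the target
n_used = int(np.count_nonzero(coef))
print(X.shape, y.shape, coef.shape)
print(n_used, 'columns have a non-zero coefficient')
```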
Next, we will develop a Multilayer Perceptron (MLP) autoencoder model.
The model will take all of the input columns, then output the same values. It will learn to recreate the input pattern exactly.
The autoencoder consists of two parts: the encoder and the decoder. The encoder learns how to interpret the input and compress it to an internal representation defined by the bottleneck layer. The decoder takes the output of the encoder (the bottleneck layer) and attempts to recreate the input.
Once the autoencoder is trained, the decoder is discarded and we keep only the encoder, using it to compress examples of input to the vectors output by the bottleneck layer.
In this first autoencoder, we will not compress the input at all and will use a bottleneck layer the same size as the input. This should be an easy problem that the model will learn nearly perfectly, and it is intended to confirm that our model is implemented correctly.
We will define the model using the Keras functional API.
Before defining and fitting the model, we will split the data into train and test sets and scale the input data by normalizing the values to the range 0-1, a good practice with MLPs.
...
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# scale data
t = MinMaxScaler()
t.fit(X_train)
X_train = t.transform(X_train)
X_test = t.transform(X_test)
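To show what the scaling step computes, the sketch below fits a MinMaxScaler on toy training data and reproduces the result by hand as (x - min) / (max - min), using the per-column minimum and maximum of the training set only. The random arrays are stand-ins chosen for illustration, not the tutorial's dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 3))   # toy stand-in for the training inputs
X_test = rng.normal(size=(20, 3))    # toy stand-in for the test inputs

scaler = MinMaxScaler()
scaler.fit(X_train)                  # learns per-column min and max from train only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# the same transform written out by hand
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
manual = (X_train - col_min) / (col_max - col_min)

print(np.allclose(manual, X_train_scaled))
print(X_train_scaled.min(), X_train_scaled.max())
print(X_test_scaled.min(), X_test_scaled.max())  # may fall outside [0, 1]
```

Because the scaler is fit on the training set alone, test values can land slightly below 0 or above 1. That is expected and avoids leaking information from the test set.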
We will define the encoder to have one hidden layer with twice as many nodes as there are columns in the input data (n_inputs*2 in the code), with batch normalization and ReLU activation.
This is followed by a bottleneck layer with the same number of nodes as there are columns in the input data, i.e. no compression.
...
# define encoder
visible = Input(shape=(n_inputs,))
e = Dense(n_inputs*2)(visible)
e = BatchNormalization()(e)
e = ReLU()(e)
# define bottleneck
n_bottleneck = n_inputs
bottleneck = Dense(n_bottleneck)(e)
The decoder will be defined with the same structure.
It will have one hidden layer with batch normalization and ReLU activation. The output layer will have the same number of nodes as there are columns in the input data and will use a linear activation function to output numeric values.
...
# define decoder
d = Dense(n_inputs*2)(bottleneck)
d = BatchNormalization()(d)
d = ReLU()(d)
# output layer
output = Dense(n_inputs, activation='linear')(d)
# define autoencoder model
model = Model(inputs=visible, outputs=output)
# compile autoencoder model
model.compile(optimizer='adam', loss='mse')
The model will be fit using the efficient Adam version of stochastic gradient descent and will minimize the mean squared error, given that reconstruction is a type of multi-output regression problem.
...
# compile autoencoder model
model.compile(optimizer='adam', loss='mse')
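As a small worked example of the loss being minimized, the sketch below computes the mean squared error between a toy input and a hypothetical reconstruction by averaging the squared difference over every row and column. The two arrays are made up for illustration.

```python
import numpy as np

X_true = np.array([[0.0, 1.0],
                   [1.0, 0.0]])    # two rows, two columns
X_recon = np.array([[0.1, 0.9],
                    [0.8, 0.2]])   # hypothetical reconstruction

squared_errors = (X_true - X_recon) ** 2
mse = squared_errors.mean()        # average over every row and column
print(round(float(mse), 4))        # 0.025
```

The four squared errors are 0.01, 0.01, 0.04 and 0.04, so their mean is 0.025. A perfect reconstruction would give a loss of exactly zero.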
We can plot the layers in the autoencoder model to get a feeling for how the data flows through the model.
...
# plot the autoencoder
plot_model(model, 'autoencoder.png', show_shapes=True)
The image here displays a plot of the autoencoder.
Next, we can train the model to reproduce the input and keep track of its performance on the holdout test set. The model is trained for 400 epochs with a batch size of 16 examples.
...
# fit the autoencoder model to reconstruct input
history = model.fit(X_train, X_train, epochs=400, batch_size=16, verbose=2, validation_data=(X_test,X_test))
After training, we can plot the learning curves for the train and test sets to confirm the model learned the reconstruction problem well.
...
# plot loss
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
Finally, we can save the encoder model for later use, if desired.
...
# define an encoder model (without the decoder)
encoder = Model(inputs=visible, outputs=bottleneck)
plot_model(encoder, 'encoder.png', show_shapes=True)
# save the encoder to file
encoder.save('encoder.h5')
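The encoder defined this way is not a copy: it reuses the layers of the autoencoder. The sketch below illustrates this on a deliberately tiny, untrained model. The sizes (4 inputs, 2 bottleneck nodes) and the variable names autoencoder, shared and codes are assumptions for illustration only.

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

n_inputs, n_bottleneck = 4, 2   # toy sizes, chosen for illustration

visible = Input(shape=(n_inputs,))
bottleneck = Dense(n_bottleneck)(visible)
output = Dense(n_inputs, activation='linear')(bottleneck)

autoencoder = Model(inputs=visible, outputs=output)
encoder = Model(inputs=visible, outputs=bottleneck)

# both models reference the same Dense layer object, so training the
# autoencoder also updates the weights the encoder uses
shared = encoder.layers[-1] is autoencoder.layers[1]

X = np.random.default_rng(1).random((5, n_inputs)).astype('float32')
codes = encoder.predict(X, verbose=0)
print(shared, codes.shape)
```

This is why the encoder can be defined after training the full model: it picks up the trained weights automatically, and its output has one value per bottleneck node.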
As part of saving the encoder, we will also plot the model to get a feeling for the shape of the output of the bottleneck layer, e.g. a 100-element vector.
An example of this plot is provided below.
Tying this all together, the complete example of an autoencoder for reconstructing the input data for a regression dataset, without any compression in the bottleneck layer, is listed below.
# train autoencoder for regression with no compression in the bottleneck layer
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import ReLU
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.utils import plot_model
from matplotlib import pyplot
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# number of input columns
n_inputs = X.shape[1]
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# scale data
t = MinMaxScaler()
t.fit(X_train)
X_train = t.transform(X_train)
X_test = t.transform(X_test)
# define encoder
visible = Input(shape=(n_inputs,))
e = Dense(n_inputs*2)(visible)
e = BatchNormalization()(e)
e = ReLU()(e)
# define bottleneck
n_bottleneck = n_inputs
bottleneck = Dense(n_bottleneck)(e)
# define decoder
d = Dense(n_inputs*2)(bottleneck)
d = BatchNormalization()(d)
d = ReLU()(d)
# output layer
output = Dense(n_inputs, activation='linear')(d)
# define autoencoder model
model = Model(inputs=visible, outputs=output)
# compile autoencoder model
model.compile(optimizer='adam', loss='mse')
# plot the autoencoder
plot_model(model, 'autoencoder.png', show_shapes=True)
# fit the autoencoder model to reconstruct input
history = model.fit(X_train, X_train, epochs=400, batch_size=16, verbose=2, validation_data=(X_test,X_test))
# plot loss
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
# define an encoder model (without the decoder)
encoder = Model(inputs=visible, outputs=bottleneck)
plot_model(encoder, 'encoder.png', show_shapes=True)
# save the encoder to file
encoder.save('encoder.h5')
Running the example fits the model and reports loss on the train and test sets along the way.
If you have problems creating the plots of the model, you can comment out the import of, and the calls to, the plot_model() function.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that the loss gets low but does not reach zero (as we might have expected) with no compression in the bottleneck layer. Perhaps further tuning of the model architecture or learning hyperparameters is required.
...
Epoch 393/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0024
Epoch 394/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0021
Epoch 395/400
42/42 - 0s - loss: 0.0023 - val_loss: 0.0021
Epoch 396/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0023
Epoch 397/400
42/42 - 0s - loss: 0.0024 - val_loss: 0.0022
Epoch 398/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0021
Epoch 399/400
42/42 - 0s - loss: 0.0026 - val_loss: 0.0022
Epoch 400/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0024
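The 42/42 prefix in the log is the number of batches processed per epoch. The short calculation below shows where it comes from, assuming the settings used in this guide: 1,000 rows, a 33 percent test split (which train_test_split rounds up), and a batch size of 16.

```python
import math

n_rows = 1000
test_size = 0.33
batch_size = 16

n_test = math.ceil(n_rows * test_size)             # test share is rounded up
n_train = n_rows - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be partial
print(n_train, n_test, steps_per_epoch)            # 670 330 42
```

With 670 training rows, 41 full batches cover 656 rows and a final partial batch holds the remaining 14, giving 42 weight updates per epoch.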
A plot of the learning curves is created, showing that the model achieves a good fit in reconstructing the input, which holds steady throughout training without overfitting.
So far, so good. We know how to develop an autoencoder without compression.
The trained encoder is saved to the file “encoder.h5”, which we can load and use later.
Next, let’s look at how we might use the trained encoder model.
Autoencoder as Data Prep
In this section, we will use the trained encoder from the autoencoder model to compress input data and train a different predictive model.
First, let’s establish a baseline in performance on this problem. This is important because if the compressed encoding does not improve the performance of a model, then it does not add value to the project and should not be used.
We can train a support vector regression (SVR) model on the training dataset directly and evaluate the performance of the model on the holdout test set.
As is good practice, we will scale both the input variables and the target variable before fitting and evaluating the model.
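The target needs slightly different handling from the inputs, because scikit-learn transformers expect a 2D array. The sketch below shows the reshape, the scaling, and the inverse transform that brings values back to their original units. The four toy target values are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y = np.array([-120.0, 35.5, 0.0, 210.0])   # toy target values

# scikit-learn transformers expect a 2D array: one column, one row per example
y_2d = y.reshape((len(y), 1))

trans_out = MinMaxScaler()
y_scaled = trans_out.fit_transform(y_2d)

# inverting the transform returns the values to their original units
y_restored = trans_out.inverse_transform(y_scaled)
print(y_scaled.ravel())
print(np.allclose(y_restored, y_2d))
```

Inverting the transform before scoring matters because an error computed on the 0-1 scale would not be comparable with errors reported in the original units of the target.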
The complete example is listed below.
# baseline in performance with support vector regression model
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# reshape target variables so that we can transform them
y_train = y_train.reshape((len(y_train), 1))
y_test = y_test.reshape((len(y_test), 1))
# scale input data
trans_in = MinMaxScaler()
trans_in.fit(X_train)
X_train = trans_in.transform(X_train)
X_test = trans_in.transform(X_test)
# scale output data
trans_out = MinMaxScaler()
trans_out.fit(y_train)
y_train = trans_out.transform(y_train)
y_test = trans_out.transform(y_test)
# define model
model = SVR()
# fit model on the training dataset
model.fit(X_train, y_train)
# make prediction on test set
yhat = model.predict(X_test)
# invert transforms so we can calculate errors
yhat = yhat.reshape((len(yhat), 1))
yhat = trans_out.inverse_transform(yhat)
y_test = trans_out.inverse_transform(y_test)
# calculate error
score = mean_absolute_error(y_test, yhat)
print(score)
Running the example fits an SVR model on the training dataset and evaluates it on the test set.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
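One way to act on that advice is to repeat the evaluation and report the mean and standard deviation of the error. The sketch below does this by varying the seed of the train/test split. It uses a smaller synthetic dataset than the one in this guide and skips target scaling to keep the loop short; both are simplifications assumed for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# smaller dataset than the one used in this guide so the loop runs quickly (an assumption)
X, y = make_regression(n_samples=300, n_features=20, n_informative=5, noise=0.1, random_state=1)

scores = []
for seed in range(5):
    # a different split each time is the source of variation here
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=seed)
    scaler = MinMaxScaler().fit(X_train)
    model = SVR()
    model.fit(scaler.transform(X_train), y_train)
    yhat = model.predict(scaler.transform(X_test))
    scores.append(mean_absolute_error(y_test, yhat))

print('MAE: %.3f (+/- %.3f)' % (np.mean(scores), np.std(scores)))
```

The same loop could wrap the encoded-input evaluation, so that the two models are compared on average scores rather than on a single split.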
In this case, we can see that the baseline SVR model achieves a mean absolute error (MAE) of about 89.
We would hope and expect an SVR model fit on an encoded version of the input to achieve a lower error for the encoding to be considered useful.
89.51082036130629
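For reference, MAE is the average of the absolute differences between the true and predicted values, in the units of the target. The sketch below computes it by hand and with scikit-learn's mean_absolute_error() on four made-up values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # toy targets in original units
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # hypothetical predictions

mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)             # 0.5 0.5
```

The absolute errors are 0.5, 0.5, 0 and 1, which average to 0.5. An MAE of about 89 on the regression dataset therefore means predictions are off by roughly 89 target units on average.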
We can update the example to first encode the data using the encoder model trained in the previous section.
First, we can load the trained encoder model from file.
...
# load the model from file
encoder = load_model('encoder.h5')
We can then use the encoder to transform the raw input data (e.g. 100 columns) into bottleneck vectors (e.g. 100-element vectors).
This process can be applied to both the train and test datasets.
...
# encode the train data
X_train_encode = encoder.predict(X_train)
# encode the test data
X_test_encode = encoder.predict(X_test)
We can then use this encoded data to train and evaluate the SVR model, as before.
...
# define model
model = SVR()
# fit model on the training dataset
model.fit(X_train_encode, y_train)
# make prediction on test set
yhat = model.predict(X_test_encode)
Tying this together, the complete example is listed below.
# support vector regression performance with encoded input
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from tensorflow.keras.models import load_model
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# reshape target variables so that we can transform them
y_train = y_train.reshape((len(y_train), 1))
y_test = y_test.reshape((len(y_test), 1))
# scale input data
trans_in = MinMaxScaler()
trans_in.fit(X_train)
X_train = trans_in.transform(X_train)
X_test = trans_in.transform(X_test)
# scale output data
trans_out = MinMaxScaler()
trans_out.fit(y_train)
y_train = trans_out.transform(y_train)
y_test = trans_out.transform(y_test)
# load the model from file
encoder = load_model('encoder.h5')
# encode the train data
X_train_encode = encoder.predict(X_train)
# encode the test data
X_test_encode = encoder.predict(X_test)
# define model
model = SVR()
# fit model on the training dataset
model.fit(X_train_encode, y_train)
# make prediction on test set
yhat = model.predict(X_test_encode)
# invert transforms so we can calculate errors
yhat = yhat.reshape((len(yhat), 1))
yhat = trans_out.inverse_transform(yhat)
y_test = trans_out.inverse_transform(y_test)
# calculate error
score = mean_absolute_error(y_test, yhat)
print(score)
Running the example first encodes the dataset using the encoder, then fits an SVR model on the training dataset and evaluates it on the test set.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that the model achieves an MAE of about 69.
This is a better MAE than the same model evaluated on the raw dataset, suggesting that the encoding is helpful for our chosen model and test harness.
69.45890939600503
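To put the two scores side by side, the short calculation below computes the absolute and relative reduction in MAE from the two values reported in this guide. The numbers come from single runs, so the size of the gain should be read as indicative only.

```python
baseline_mae = 89.51082036130629   # SVR on the scaled raw inputs (reported above)
encoded_mae = 69.45890939600503    # SVR on the encoder output (reported above)

absolute_gain = baseline_mae - encoded_mae
relative_gain = absolute_gain / baseline_mae
print('MAE reduced by %.2f (%.1f%%)' % (absolute_gain, 100 * relative_gain))
```

This works out to a reduction of about 20 units, or roughly 22 percent of the baseline error.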
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
Deep Learning, 2016
APIs
sklearn.datasets.make_regression API
sklearn.model_selection.train_test_split API
Articles
Autoencoder, Wikipedia
Conclusion
In this guide, you discovered how to develop and evaluate an autoencoder for regression predictive modelling.
Specifically, you learned:
- An autoencoder is a neural network model that can be used to learn a compressed representation of raw data.
- How to train an autoencoder model on a training dataset and save just the encoder portion of the model.
- How to use the encoder as a data preparation step when training a machine learning model.