
Autoencoder feature extraction for regression

An autoencoder is a type of neural network that can be used to learn a compressed representation of raw data.

An autoencoder is composed of an encoder and a decoder sub-model. The encoder compresses the input, and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded.

The encoder can then be used as a data preparation technique to perform feature extraction on raw data, and the extracted features can be used to train a different machine learning model.

In this guide, you will discover how to develop and evaluate an autoencoder for regression predictive modelling.

After going through this guide, you will know:

  • An autoencoder is a neural network model that can be used to learn a compressed representation of raw data.
  • How to train an autoencoder model on a training dataset and save only the encoder portion of the model. 
  • How to use the encoder as a data preparation step when training a machine learning model.

Tutorial Overview

This guide is divided into three parts; they are:

1] Autoencoders for Feature Extraction 

2] Autoencoder for Regression 

3] Autoencoder as Data Prep

Autoencoders for Feature Extraction 

An autoencoder is a neural network model that seeks to learn a compressed representation of an input.

An autoencoder is a neural network that is trained to attempt to copy its input to its output.

They are an unsupervised learning method, although technically they are trained using supervised learning methods, referred to as self-supervised. They are typically trained as part of a broader model that attempts to recreate the input.

For example:

X = model.predict(X)
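
That is, during training the input also serves as the target. A minimal sketch of this idea, assuming a compiled Keras model named model and an input array X (these names are placeholders, not from the examples below):

# the input is used as both the input and the target (self-supervised learning)
model.fit(X, X, epochs=10, batch_size=32, verbose=0)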

The design of the autoencoder model purposefully makes this challenging by restricting the architecture to a bottleneck at the midpoint of the model, from which the reconstruction of the input data is performed.

There are many types of autoencoders, and their use varies, but perhaps the most common use is as a learned or automatic feature extraction model.

In this case, once the model is fit, the reconstruction part of the model can be discarded and the model up to the point of the bottleneck can be used. The output of the model at the bottleneck is a fixed-length vector that provides a compressed representation of the input data.

Usually they are restricted in ways that allow them to copy only approximately, and to copy only input that resembles the training data. Because the model is forced to prioritize which aspects of the input should be copied, it often learns useful properties of the data.

Input data from the domain can then be provided to the model, and the output of the model at the bottleneck can be used as a feature vector in a supervised learning model, for visualization, or more generally for dimensionality reduction.
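
As a quick hedged sketch of this pattern, assuming a trained Keras encoder model named encoder and new raw inputs X_new with targets y_new (placeholder names, not from the example below):

# compress raw inputs from the domain into bottleneck feature vectors
features = encoder.predict(X_new)
# the fixed-length feature vectors can then feed any supervised model,
# e.g. a scikit-learn regressor: SVR().fit(features, y_new)

The rest of this tutorial walks through this workflow in full.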

Now, let’s look into how we could develop an autoencoder for feature extraction on a regression predictive modelling problem. 

Autoencoder for Regression 

In this part, we will develop an autoencoder to learn a compressed representation of the input features for a regression predictive modelling problem.

To start with, let’s define a regression predictive modelling problem.

We will use the make_regression() scikit-learn function to define a synthetic regression task with 100 input features (columns) and 1,000 examples (rows). Importantly, we will define the problem in such a way that most of the input variables are redundant (90 of the 100, or 90%), allowing the autoencoder to later learn a useful compressed representation.

The example below defines the dataset and summarizes its shape.

 


# synthetic regression dataset
from sklearn.datasets import make_regression
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# summarize the dataset
print(X.shape, y.shape)

 

Running the example defines the dataset and prints the shape of the arrays, confirming the number of rows and columns.

(1000, 100) (1000,) 

Next, we will develop a Multilayer Perceptron (MLP) autoencoder model.

The model will take all of the input columns and output the same values. It will learn to recreate the input pattern exactly.

The autoencoder is composed of two parts: the encoder and the decoder. The encoder learns how to interpret the input and compress it to an internal representation defined by the bottleneck layer. The decoder takes the output of the encoder (the bottleneck layer) and attempts to recreate the input.

After the autoencoder is trained, the decoder is discarded and we keep only the encoder, using it to compress examples of input to the vectors output by the bottleneck layer.

In this first autoencoder, we will not compress the input at all and will use a bottleneck layer the same size as the input. This should be an easy problem that the model will learn nearly perfectly, and it is intended to confirm that our model is implemented correctly.

We will define the model using the Keras functional API.

Before defining and fitting the model, we will split the data into train and test sets and scale the input data by normalizing the values to the range 0-1, a good practice with MLPs.

 


 

# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# scale data
t = MinMaxScaler()
t.fit(X_train)
X_train = t.transform(X_train)
X_test = t.transform(X_test)

 

We will define the encoder to have a single hidden layer with twice the number of nodes as the input data (e.g. 200), with batch normalization and ReLU activation.

This is followed by a bottleneck layer with the same number of nodes as there are columns in the input data, i.e. no compression.

 


 

# define encoder
visible = Input(shape=(n_inputs,))
e = Dense(n_inputs*2)(visible)
e = BatchNormalization()(e)
e = ReLU()(e)
# define bottleneck
n_bottleneck = n_inputs
bottleneck = Dense(n_bottleneck)(e)

 

The decoder will be defined with the same structure. 

It will have a single hidden layer with batch normalization and ReLU activation. The output layer will have the same number of nodes as there are columns in the input data and will use a linear activation function to output numeric values.

 


 

# define decoder
d = Dense(n_inputs*2)(bottleneck)
d = BatchNormalization()(d)
d = ReLU()(d)
# output layer
output = Dense(n_inputs, activation='linear')(d)
# define autoencoder model
model = Model(inputs=visible, outputs=output)


 

The model will be fit using the efficient Adam version of stochastic gradient descent, minimizing the mean squared error, given that reconstruction is a type of multi-output regression problem.

 


 

# compile autoencoder model
model.compile(optimizer='adam', loss='mse')

 

We can plot the layers in the autoencoder model to get a feel for how the data flows through the model.

 

# plot the autoencoder
plot_model(model, 'autoencoder.png', show_shapes=True)

 


Next, we can train the model to recreate the input and track the performance of the model on the holdout test set. The model is trained for 400 epochs with a batch size of 16 examples.

 


# fit the autoencoder model to reconstruct input
history = model.fit(X_train, X_train, epochs=400, batch_size=16, verbose=2, validation_data=(X_test,X_test))

 

After training, we can plot the learning curves for the train and test sets to confirm that the model learned the reconstruction problem well.


# plot loss
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

 

Finally, we can save the encoder model for later use, if desired.

 


# define an encoder model (without the decoder)
encoder = Model(inputs=visible, outputs=bottleneck)
plot_model(encoder, 'encoder.png', show_shapes=True)
# save the encoder to file
encoder.save('encoder.h5')

 

As part of saving the encoder, we will also plot the model to get a feel for the shape of the output of the bottleneck layer, e.g. a 100-element vector.


Tying this all together, the complete example of an autoencoder for reconstructing the input data for a regression dataset with no compression in the bottleneck layer is listed below.


# train autoencoder for regression with no compression in the bottleneck layer
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import ReLU
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.utils import plot_model
from matplotlib import pyplot
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# number of input columns
n_inputs = X.shape[1]
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# scale data
t = MinMaxScaler()
t.fit(X_train)
X_train = t.transform(X_train)
X_test = t.transform(X_test)
# define encoder
visible = Input(shape=(n_inputs,))
e = Dense(n_inputs*2)(visible)
e = BatchNormalization()(e)
e = ReLU()(e)
# define bottleneck
n_bottleneck = n_inputs
bottleneck = Dense(n_bottleneck)(e)
# define decoder
d = Dense(n_inputs*2)(bottleneck)
d = BatchNormalization()(d)
d = ReLU()(d)
# output layer
output = Dense(n_inputs, activation='linear')(d)
# define autoencoder model
model = Model(inputs=visible, outputs=output)
# compile autoencoder model
model.compile(optimizer='adam', loss='mse')
# plot the autoencoder
plot_model(model, 'autoencoder.png', show_shapes=True)
# fit the autoencoder model to reconstruct input
history = model.fit(X_train, X_train, epochs=400, batch_size=16, verbose=2, validation_data=(X_test,X_test))
# plot loss
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
# define an encoder model (without the decoder)
encoder = Model(inputs=visible, outputs=bottleneck)
plot_model(encoder, 'encoder.png', show_shapes=True)
# save the encoder to file
encoder.save('encoder.h5')

 

Running the example fits the model and reports loss on the train and test sets along the way.

If you have problems creating the plots of the model, you can comment out the import and the calls to the plot_model() function.
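
For example, the lines to comment out would be the following (plot_model() relies on the optional pydot and graphviz packages being installed):

# comment out the optional plotting import
# from tensorflow.keras.utils import plot_model
# and the two calls to it
# plot_model(model, 'autoencoder.png', show_shapes=True)
# plot_model(encoder, 'encoder.png', show_shapes=True)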

Note: your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we see that the loss becomes low but does not go to zero (as we might have expected) even with no compression in the bottleneck layer. Perhaps further tuning of the model architecture or learning hyperparameters is required; one possible tweak is sketched after the learning curves below.


Epoch 393/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0024
Epoch 394/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0021
Epoch 395/400
42/42 - 0s - loss: 0.0023 - val_loss: 0.0021
Epoch 396/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0023
Epoch 397/400
42/42 - 0s - loss: 0.0024 - val_loss: 0.0022
Epoch 398/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0021
Epoch 399/400
42/42 - 0s - loss: 0.0026 - val_loss: 0.0022
Epoch 400/400
42/42 - 0s - loss: 0.0025 - val_loss: 0.0024

 

A plot of the learning curves is created, showing that the model achieves a good fit in reconstructing the input, which holds steady throughout training without overfitting.
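
If you want to push the reconstruction loss lower, one hedged example of further tuning is to re-compile with a smaller Adam learning rate and train for longer; the values below are assumptions rather than part of the example above:

# re-compile with a smaller learning rate and train for more epochs (assumed values)
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(learning_rate=1e-4), loss='mse')
history = model.fit(X_train, X_train, epochs=800, batch_size=16, verbose=2, validation_data=(X_test, X_test))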

So far, so good. We know how to generate an autoencoder without compression. 

The trained encoder is saved to the file “encoder.h5”, which we can load and use later.

Next, let’s look at how we might use the trained encoder model.

Autoencoder as Data Prep 

In this part, we will use the trained encoder from the autoencoder to compress the input data and train a different predictive model.

To start with, let’s establish a baseline in performance on this problem. This is important because if the compressed encoding does not improve the performance of a model, then the compressed encoding does not add value to the project and should not be used.

We can train a support vector regression (SVR) model on the training dataset directly and evaluate the performance of the model on the holdout test set.

As is best practice, we will scale both input and target variables before fitting and assessing the model. 

The complete example is listed below.

 


# baseline in performance with support vector regression model
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# reshape target variables so that we can transform them
y_train = y_train.reshape((len(y_train), 1))
y_test = y_test.reshape((len(y_test), 1))
# scale input data
trans_in = MinMaxScaler()
trans_in.fit(X_train)
X_train = trans_in.transform(X_train)
X_test = trans_in.transform(X_test)
# scale output data
trans_out = MinMaxScaler()
trans_out.fit(y_train)
y_train = trans_out.transform(y_train)
y_test = trans_out.transform(y_test)
# define model
model = SVR()
# fit model on the training dataset
model.fit(X_train, y_train)
# make prediction on test set
yhat = model.predict(X_test)
# invert transforms so we can calculate errors
yhat = yhat.reshape((len(yhat), 1))
yhat = trans_out.inverse_transform(yhat)
y_test = trans_out.inverse_transform(y_test)
# calculate error
score = mean_absolute_error(y_test, yhat)
print(score)

 

Running the example fits an SVR model on the training dataset and evaluates it on the test set.

Note: your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieves a mean absolute error (MAE) of about 89.

The hope and expectation is that an SVR model fit on an encoded version of the input will achieve a lower error; only then can the encoding be considered useful.

89.51082036130629 

We can update the example to first encode the data using the encoder model trained in the previous section.

First, we can load the trained encoder model from file.

 


 

# load the model from file
encoder = load_model('encoder.h5')

 

We can then use the encoder to transform the raw input data (e.g. 100 columns) into bottleneck vectors (e.g. 100-element vectors).

This procedure can be applied to the train and test datasets. 

 


 

# encode the train data
X_train_encode = encoder.predict(X_train)
# encode the test data
X_test_encode = encoder.predict(X_test)

 

We can then use this encoded data to train and evaluate the SVR model, as before.

 


 

# define model
model = SVR()
# fit model on the training dataset
model.fit(X_train_encode, y_train)
# make prediction on test set
yhat = model.predict(X_test_encode)

 

Tying this together, the complete example is listed below.

 


# support vector regression performance with encoded input
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from tensorflow.keras.models import load_model
# define dataset
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# reshape target variables so that we can transform them
y_train = y_train.reshape((len(y_train), 1))
y_test = y_test.reshape((len(y_test), 1))
# scale input data
trans_in = MinMaxScaler()
trans_in.fit(X_train)
X_train = trans_in.transform(X_train)
X_test = trans_in.transform(X_test)
# scale output data
trans_out = MinMaxScaler()
trans_out.fit(y_train)
y_train = trans_out.transform(y_train)
y_test = trans_out.transform(y_test)
# load the model from file
encoder = load_model('encoder.h5')
# encode the train data
X_train_encode = encoder.predict(X_train)
# encode the test data
X_test_encode = encoder.predict(X_test)
# define model
model = SVR()
# fit model on the training dataset
model.fit(X_train_encode, y_train)
# make prediction on test set
yhat = model.predict(X_test_encode)
# invert transforms so we can calculate errors
yhat = yhat.reshape((len(yhat), 1))
yhat = trans_out.inverse_transform(yhat)
y_test = trans_out.inverse_transform(y_test)
# calculate error
score = mean_absolute_error(y_test, yhat)
print(score)

 

Running the example first encodes the dataset using the encoder, then fits an SVR model on the training dataset and evaluates it on the test set.

Note: your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieves an MAE of about 69.

This is a better MAE than the same model evaluated on the raw dataset, suggesting that the encoding is helpful for our chosen model and test harness.

69.45890939600503 
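
Note that the bottleneck above is the same size as the input, so no actual compression takes place. A natural follow-up, sketched here with an assumed size rather than one taken from the example, is to retrain the autoencoder with a smaller bottleneck (by changing the two bottleneck lines in the earlier listing) and repeat the comparison:

# assumed variation: use a bottleneck half the size of the input when defining the encoder
n_bottleneck = round(float(n_inputs) / 2.0)
bottleneck = Dense(n_bottleneck)(e)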

Further Reading 

This section provides additional resources on the topic if you are looking to go deeper.

Books 

Deep Learning, 2016 

APIs 

sklearn.datasets.make_regression API 

sklearn.model_selection.train_test_split API 

Articles 

Autoencoder, Wikipedia 

Conclusion 

In this guide, you discovered how to develop and evaluate an autoencoder for regression predictive modelling.

Specifically, you learned:

  • An autoencoder is a neural network model that can be used to learn a compressed representation of raw data. 
  • How to train an autoencoder model on a training dataset and save just the encoder portion of the model. 
  • How to use the encoder as a data preparation step when training a machine learning model. 