Produce a neural network for banknote authentication

Developing a neural network predictive model for a new dataset can be challenging.

One approach is to first inspect the dataset and form ideas about which models might work, then explore the learning dynamics of simple models on the dataset, and finally develop and tune a model with a robust test harness.

This process can be used to develop effective neural network models for classification and regression predictive modelling problems.

In this guide, you will discover how to develop a multilayer perceptron (MLP) neural network model for the banknote binary classification dataset.

After completing this guide, you will know:

  • How to load and summarize the banknote dataset and use the results to suggest data preparations and model configurations to try.
  • How to explore the learning dynamics of simple MLP models on the dataset.
  • How to develop robust estimates of model performance, tune model performance and make predictions on new data.

Tutorial Overview

This tutorial is divided into four parts:

1] Banknote classification dataset

2] Neural network learning dynamics

3] Robust model evaluation

4] Final model and making predictions

Banknote Classification Dataset

The first step is to define and explore the dataset.

We will be working with the “Banknote” standard binary classification dataset.

The banknote dataset involves predicting whether a given banknote is authentic, given a number of measures taken from a photograph.

The dataset contains 1,372 rows with five numeric variables. It is a classification problem with two classes (binary classification).

The five variables in the dataset are listed below.

  • Variance of wavelet-transformed image (continuous)
  • Skewness of wavelet-transformed image (continuous)
  • Kurtosis of wavelet-transformed image (continuous)
  • Entropy of image (continuous)
  • Class (integer)

Below is a sample of the first six rows of the dataset.

3.6216,8.6661,-2.8073,-0.44699,0
4.5459,8.1674,-2.4586,-1.4621,0
3.866,-2.6383,1.9242,0.10645,0
3.4566,9.5228,-4.0112,-3.5944,0
0.32924,-4.4552,4.5718,-0.9888,0
4.3684,9.6718,-3.9606,-3.1625,0

We can load the dataset directly from the URL and report its shape.

In this case, we can confirm that the dataset has five variables (four input and one output) and 1,372 rows of data.

This is not many rows of data for a neural network and suggests that a small network, perhaps with regularization, would be appropriate.

It also suggests that k-fold cross-validation would be a good idea, because it gives a more reliable estimate of model performance than a single train/test split and because a single model will fit in seconds rather than the hours or days needed for the largest datasets.

(1372, 5)
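
The loading step described above can be sketched as follows. This is a minimal sketch that assumes pandas is installed; the helper name load_banknote is hypothetical, and the demonstration reads the six sample rows shown earlier from memory so that it runs offline. Passing the dataset URL used in the rest of this tutorial to the same helper should report the full shape of (1372, 5).

```python
from io import StringIO
from pandas import read_csv

# the six sample rows shown earlier, held in memory so the sketch runs offline
SAMPLE = """3.6216,8.6661,-2.8073,-0.44699,0
4.5459,8.1674,-2.4586,-1.4621,0
3.866,-2.6383,1.9242,0.10645,0
3.4566,9.5228,-4.0112,-3.5944,0
0.32924,-4.4552,4.5718,-0.9888,0
4.3684,9.6718,-3.9606,-3.1625,0
"""

def load_banknote(source):
    """Load the banknote CSV (no header row) from a URL, path, or file-like object."""
    return read_csv(source, header=None)

df = load_banknote(StringIO(SAMPLE))
# report (rows, columns) of whatever was loaded
print(df.shape)
```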

Next, we can learn more about the dataset by looking at summary statistics and a plot of the data.

# show summary statistics and plots of the banknote dataset
from pandas import read_csv
from matplotlib import pyplot
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/banknote_authentication.csv'
# load the dataset
df = read_csv(url, header=None)
# show summary statistics
print(df.describe())
# plot histograms
df.hist()
pyplot.show()

 

Running the example first loads the data and then prints summary statistics for each variable.

We can see that the variables have differing means and standard deviations, so some normalization or standardization may be needed before modelling.

                 0            1            2            3            4
count  1372.000000  1372.000000  1372.000000  1372.000000  1372.000000
mean      0.433735     1.922353     1.397627    -1.191657     0.444606
std       2.842763     5.869047     4.310030     2.101013     0.497103
min      -7.042100   -13.773100    -5.286100    -8.548200     0.000000
25%      -1.773000    -1.708200    -1.574975    -2.413450     0.000000
50%       0.496180     2.319650     0.616630    -0.586650     0.000000
75%       2.821475     6.814625     3.179250     0.394810     1.000000
max       6.824800    12.951600    17.927400     2.449500     1.000000
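
Standardization, mentioned above as a likely preparation step, rescales each column to zero mean and unit standard deviation. The following is a minimal NumPy sketch on the input columns of a few sample rows; the function name standardize is illustrative, and in practice the mean and standard deviation would normally be estimated on the training rows only.

```python
import numpy as np

# input columns of four sample rows from the dataset
X = np.array([
    [3.6216, 8.6661, -2.8073, -0.44699],
    [4.5459, 8.1674, -2.4586, -1.4621],
    [3.866, -2.6383, 1.9242, 0.10645],
    [3.4566, 9.5228, -4.0112, -3.5944],
])

def standardize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

X_std = standardize(X)
# after the transform, each column has mean close to 0 and standard deviation close to 1
print(X_std.mean(axis=0).round(6))
print(X_std.std(axis=0).round(6))
```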

 

A histogram plot is then created for each variable.

We can see that perhaps the first two variables have a Gaussian-like distribution and the next two input variables may have a skewed Gaussian distribution or an exponential distribution.

We may see some benefit from applying a power transform to each variable to make the probability distribution less skewed, which will likely improve model performance.
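
One power transform that accepts the negative values present in this dataset is the Yeo-Johnson transform. The sketch below implements its formula directly in NumPy for a fixed, hand-picked lambda. It is an illustration only: the tutorial does not say which transform it has in mind, and a library implementation such as scikit-learn's PowerTransformer would normally estimate lambda from the data.

```python
import numpy as np

def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson power transform with a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # branch for non-negative values
    if lmbda != 0:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    # branch for negative values
    if lmbda != 2:
        out[~pos] = -(((-x[~pos] + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

# third-column values taken from the six sample rows
values = np.array([-2.8073, -2.4586, 1.9242, -4.0112, 4.5718, -3.9606])
print(yeo_johnson(values, 1.0))  # lambda = 1 leaves the values unchanged
print(yeo_johnson(values, 0.5))  # lambda < 1 compresses large positive values
```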

Now that we are familiar with the dataset, let’s explore how we might develop a neural network model.

Neural Network Learning Dynamics

We will develop a Multilayer Perceptron (MLP) model for the dataset using TensorFlow.

We cannot know in advance which model architecture or learning hyperparameters will be good or best for this dataset, so we must experiment and discover what works well.

Given that the dataset is small, a small batch size is probably a good idea, for example, 16 or 32 rows. Using the Adam version of stochastic gradient descent is a good idea when getting started, as it automatically adapts the learning rate and works well on most datasets.

Before evaluating models in earnest, it is a good idea to review the learning dynamics and tune the model architecture and learning configuration until the learning dynamics are stable, and only then look at getting the most out of the model.

We can do this by using a simple train/test split of the data and reviewing plots of the learning curves. This will help us see whether the model is overfitting or underfitting, so that we can adapt the configuration accordingly.

First, we must ensure that all input variables (X) are floating-point values and encode the target label (y) as integer values 0 and 1.

# ensure all data are floating point values
X = X.astype('float32')
# encode class labels as integers 0 and 1
y = LabelEncoder().fit_transform(y)

 

We also need to split the dataset into input and output columns, then into 67/33 train and test sets. In the complete example, the column split comes first, before the type conversion shown above.

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

 

We can define a minimal MLP model. In this case, we will use one hidden layer with 10 nodes and one output layer (chosen arbitrarily). We will use the ReLU activation function in the hidden layer and the “he_normal” weight initialization, as together they are a good practice.

The output of the model is a sigmoid activation for binary classification, and we will minimize binary cross-entropy loss.

# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
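
To make the model definition concrete, the sketch below reproduces in plain NumPy what a network of this shape computes: a ReLU hidden layer with ten nodes, a sigmoid output, and the binary cross-entropy loss. The weights are random stand-ins (He-style scaling for the hidden layer is assumed for illustration), not the values Keras would learn, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_hidden = 4, 10

# random stand-in weights; He-style scaling sqrt(2 / fan_in) for the hidden layer
W1 = rng.normal(0.0, np.sqrt(2.0 / n_features), size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Return the predicted probability of class 1 for each row of X."""
    hidden = np.maximum(0.0, X @ W1 + b1)            # ReLU activation
    logits = hidden @ W2 + b2
    return (1.0 / (1.0 + np.exp(-logits))).ravel()   # sigmoid activation

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# two sample rows from the dataset, both labelled class 0
X = np.array([[3.6216, 8.6661, -2.8073, -0.44699],
              [0.32924, -4.4552, 4.5718, -0.9888]])
y = np.array([0, 0])
probs = forward(X)
print(probs, binary_cross_entropy(y, probs))

# number of trainable parameters: 4*10 + 10 + 10*1 + 1
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)
```

With untrained weights the probabilities are arbitrary; training adjusts the weights to push the loss toward zero.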

 

We will fit the model for 50 training epochs (chosen arbitrarily) with a batch size of 32, because it is a small dataset.

We are fitting the model on raw data, which we think might not be the best idea given the differing scales seen earlier, but it is an important starting point.

# fit the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0, validation_data=(X_test, y_test))
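
As a rough check on how much training this configuration implies, the arithmetic below counts the weight updates. It assumes that the 33% test share is rounded up to a whole number of rows and that the final, smaller batch of each epoch still counts as one update; both are assumptions about library behaviour rather than statements from the tutorial.

```python
import math

n_rows = 1372
test_size = 0.33
batch_size = 32
epochs = 50

# assumed rounding: the test share is rounded up to whole rows
n_test = math.ceil(n_rows * test_size)
n_train = n_rows - n_test
# the last partial batch of an epoch is counted as one update
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(n_train, n_test, updates_per_epoch, total_updates)
```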

 

At the end of training, we will evaluate the model’s performance on the test dataset and report performance as the classification accuracy.

# predict test set; predict_classes() is not available in recent TensorFlow
# releases, so threshold the sigmoid output at 0.5 instead
yhat = (model.predict(X_test) > 0.5).astype('int32').ravel()
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % score)
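
Classification accuracy here is the fraction of test rows whose predicted label matches the true label, where the label comes from thresholding the sigmoid output at 0.5. A minimal NumPy sketch, using made-up probabilities rather than real model output:

```python
import numpy as np

# made-up sigmoid outputs and true labels, for illustration only
probs = np.array([[0.02], [0.91], [0.48], [0.77], [0.10]])
y_true = np.array([0, 1, 1, 1, 0])

# threshold at 0.5 and flatten the (n, 1) column to a 1D vector of labels
yhat = (probs > 0.5).astype("int32").ravel()

# accuracy is the share of predictions that equal the true label
accuracy = float(np.mean(yhat == y_true))
print(yhat, "Accuracy: %.3f" % accuracy)
```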

 

Finally, we will plot learning curves of the cross-entropy loss on the train and test sets during training.

# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Cross Entropy')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

 

Tying all of this together, the complete example of evaluating our first MLP on the banknote dataset is listed below.

# fit a simple mlp model on the banknote dataset and review learning curves
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/banknote_authentication.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode class labels as integers 0 and 1
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
# fit the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0, validation_data=(X_test, y_test))
# predict test set; predict_classes() is not available in recent TensorFlow
# releases, so threshold the sigmoid output at 0.5 instead
yhat = (model.predict(X_test) > 0.5).astype('int32').ravel()
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % score)
# plot learning curves
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Cross Entropy')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

 

Running the example first fits the model on the training dataset, then reports the classification accuracy on the test dataset.

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieved perfect accuracy of 100%. This may suggest that the prediction problem is easy and/or that neural networks are a good fit for the problem.

Accuracy: 1.000

Line plots of the loss on the train and test sets are then created.

We can see that the model appears to converge well and does not show any signs of overfitting or underfitting.

We did very well on our first try.

Now that we have some idea of the learning dynamics for a simple MLP model on the dataset, we can look at developing a more robust evaluation of model performance on the dataset.

Robust Model Evaluation

The k-fold cross-validation procedure can provide a more reliable estimate of MLP performance, although it can be very slow.

This is because k models must be fit and evaluated. This is not a problem when the dataset is small, as the banknote dataset is.

We can use the StratifiedKFold class and enumerate each fold manually, fit the model, evaluate it, and then report the mean of the evaluation scores at the end of the procedure.

# prepare cross validation
kfold = StratifiedKFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # fit and evaluate the model, then append its score to scores
    ...
# summarize all scores
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
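
Stratification means that each fold keeps roughly the same class balance as the full dataset. The hand-rolled NumPy sketch below illustrates the idea by dealing the shuffled row indices of each class across the folds in turn. It is not scikit-learn's implementation, and the helper name stratified_folds is hypothetical. The class counts are derived from the class-1 proportion of about 44.5% reported in the summary statistics (610 of 1,372 rows).

```python
import numpy as np

def stratified_folds(y, k, seed=1):
    """Yield (train_ix, test_ix) pairs whose test folds preserve the class balance."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        # shuffle the row indices of this class
        ix = rng.permutation(np.flatnonzero(y == label))
        # deal this class's rows across the k folds in turn
        fold_of[ix] = np.arange(len(ix)) % k
    for fold in range(k):
        test_ix = np.flatnonzero(fold_of == fold)
        train_ix = np.flatnonzero(fold_of != fold)
        yield train_ix, test_ix

# labels with the same class counts as the banknote dataset (762 zeros, 610 ones)
y = np.array([0] * 762 + [1] * 610)
for train_ix, test_ix in stratified_folds(y, k=10):
    # fold size and share of class 1 in the held-out fold
    print(len(test_ix), round(float(y[test_ix].mean()), 3))
```

Every held-out fold ends up with the same number of class-1 rows, so the score from each fold is measured on a comparable class mix.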

 

We can use this framework to develop a reliable estimate of MLP model performance with our base configuration, and even with a range of different data preparations, model architectures, and learning configurations.

It is important that we first developed an understanding of the learning dynamics of the model on the dataset in the previous section before using k-fold cross-validation to estimate performance. Had we started tuning the model directly, we might have obtained good results, but if not, we would have had no idea why, for example, whether the model was overfitting or underfitting.

If we make large changes to the model again, it is a good idea to go back and confirm that the model is still converging appropriately.

The complete example of this framework, used to evaluate the base MLP model from the previous section, is listed below.

# k-fold cross-validation of base model for the banknote dataset
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/banknote_authentication.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode class labels as integers 0 and 1
y = LabelEncoder().fit_transform(y)
# prepare cross validation
kfold = StratifiedKFold(10)
# enumerate splits
scores = list()
for train_ix, test_ix in kfold.split(X, y):
    # split data
    X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
    # determine the number of input features
    n_features = X.shape[1]
    # define model
    model = Sequential()
    model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # fit the model
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
    # predict test set; predict_classes() is not available in recent TensorFlow
    # releases, so threshold the sigmoid output at 0.5 instead
    yhat = (model.predict(X_test) > 0.5).astype('int32').ravel()
    # evaluate predictions
    score = accuracy_score(y_test, yhat)
    print('>%.3f' % score)
    scores.append(score)
# summarize all scores
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

 

Running the example reports the model performance on each iteration of the evaluation procedure and reports the mean and standard deviation of classification accuracy at the end of the run.

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the MLP model achieved a mean accuracy of about 99.9%.

This confirms our expectation that the base model configuration works very well for this dataset, and indeed that the model is a good fit for the problem and that the problem may be quite trivial to solve.

This is surprising, as we would have expected some data scaling and perhaps a power transform to be required.

>1.000
>1.000
>1.000
>1.000
>0.993
>1.000
>1.000
>1.000
>1.000
>1.000
Mean Accuracy: 0.999 (0.002)
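
The summary line can be reproduced by hand from the ten fold scores listed above. The sketch below uses only the Python standard library and assumes the population standard deviation (dividing by n), which is the default behaviour of numpy's std function.

```python
from statistics import fmean, pstdev

# the ten fold accuracies printed above
scores = [1.000, 1.000, 1.000, 1.000, 0.993, 1.000, 1.000, 1.000, 1.000, 1.000]

mean_acc = fmean(scores)
# population standard deviation (divide by n), matching numpy.std's default
std_acc = pstdev(scores)

summary = "Mean Accuracy: %.3f (%.3f)" % (mean_acc, std_acc)
print(summary)  # Mean Accuracy: 0.999 (0.002)
```

A single fold at 0.993 is enough to account for the whole of the reported spread.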

 

Next, let’s look at how we might fit a final model and use it to make predictions.

Final Model and Make Predictions

Once we have chosen a model configuration, we can train a final model on all available data and use it to make predictions on new data.

In this case, we will use the base model configuration evaluated in the previous section, with a batch size of 32, as our final model.

We can prepare the data and fit the model as before, although on the entire dataset instead of a training subset of the dataset.

# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode class labels as integers 0 and 1
le = LabelEncoder()
y = le.fit_transform(y)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
# fit the model on the entire dataset
model.fit(X, y, epochs=50, batch_size=32, verbose=0)

 

We can then use this model to make predictions on new data.

First, we can define a row of new data.

# define a row of new data
row = [3.6216, 8.6661, -2.8073, -0.44699]

 

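We took this row from the first row of the dataset, and the expected label is ‘0’.

We can then make a prediction.

# make prediction; predict_classes() is not available in recent TensorFlow
# releases, so threshold the sigmoid output at 0.5 instead
from numpy import asarray
yhat = (model.predict(asarray([row], dtype='float32')) > 0.5).astype('int32').ravel()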

 

We then invert the transform on the prediction so that we can use or interpret the result as the original label (which is just a number for this dataset).

# invert transform to get label for class
yhat = le.inverse_transform(yhat)
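
The encode and invert steps can be pictured with plain NumPy. The sketch assumes that the labels arrive as the floats 0.0 and 1.0, because df.values returns a single floating-point array when the columns mix integers and floats. The sorted unique labels play the role of the encoder's classes, which is also why the tutorial's final output prints 0.0 rather than 0.

```python
import numpy as np

# class column as it comes out of df.values: floats, not integers
y_raw = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

# encode: sorted unique labels become the classes, each label maps to its index
classes, y_encoded = np.unique(y_raw, return_inverse=True)

# invert: index back into the classes to recover the original label values
yhat = np.array([0])  # an encoded prediction
label = classes[yhat]

print(classes, y_encoded, label[0])
```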

 

In this case, we will simply report the prediction.

# report prediction
print('Predicted: %s' % (yhat[0]))

 

Tying all of this together, the complete example of fitting a final model for the banknote dataset and using it to make a prediction on new data is listed below.

# fit a final model and make predictions on new data for the banknote dataset
from numpy import asarray
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/banknote_authentication.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode class labels as integers 0 and 1
le = LabelEncoder()
y = le.fit_transform(y)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
# fit the model
model.fit(X, y, epochs=50, batch_size=32, verbose=0)
# define a row of new data
row = [3.6216, 8.6661, -2.8073, -0.44699]
# make prediction; predict_classes() is not available in recent TensorFlow
# releases, so threshold the sigmoid output at 0.5 instead
yhat = (model.predict(asarray([row], dtype='float32')) > 0.5).astype('int32').ravel()
# invert transform to get label for class
yhat = le.inverse_transform(yhat)
# report prediction
print('Predicted: %s' % (yhat[0]))

 

Running the example fits the model on the entire dataset and makes a prediction for a single row of new data.

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model predicted a ‘0’ label for the input row.

Predicted: 0.0

Conclusion

In this guide, you discovered how to develop a multilayer perceptron neural network model for the banknote binary classification dataset.

Specifically, you learned:

  • How to load and summarize the banknote dataset and use the results to suggest data preparations and model configurations to try.
  • How to explore the learning dynamics of simple MLP models on the dataset.
  • How to develop robust estimates of model performance, tune model performance and make predictions on new data.