### How to Code the GAN Training Algorithm and Loss Functions

The Generative Adversarial Network, or GAN for short, is an architecture for training a generative model.

The architecture consists of two models: the generator that we are interested in, and a discriminator model that is used to assist in training the generator. Initially, both the generator and discriminator models were implemented as Multilayer Perceptrons (MLPs), although more recently they are implemented as deep convolutional neural networks.

It can be challenging to understand how a GAN is trained, and exactly how to interpret and implement the loss functions for the generator and discriminator models.

In this guide, you will discover how to implement the generative adversarial network training algorithm and loss functions.

After completing this guide, you will know:

- How to implement the training algorithm for a generative adversarial network.
- How the loss functions for the discriminator and generator work.
- How to implement weight updates for the discriminator and generator models in practice.

**Tutorial Overview**

This guide is divided into three parts:

- How to Implement the GAN Training Algorithm
- Understanding the GAN Loss Function
- How to Train GAN Models in Practice

Note: The code examples in this guide are primarily snippets, not standalone runnable programs. They are designed to help you develop an intuition for the algorithm, and they can be used as a starting point for implementing the GAN training algorithm on your own project.

**How to Implement the GAN Training Algorithm**

The GAN training algorithm involves training both the discriminator and the generator model in parallel.

The algorithm is summarized in the figure below, taken from the original 2014 paper by Goodfellow, et al. titled “Generative Adversarial Networks.”

Let’s take some time to unpack and get comfortable with this algorithm.

The outer loop of the algorithm iterates over steps to train the models in the architecture. One pass through this loop is not an epoch: it is a single update made up of specific batch updates to the discriminator and generator models.

An epoch is defined as one pass through a training dataset, where the samples in the dataset are used to update the model weights in mini-batches. For instance, a training dataset of 100 samples used to train a model with a mini-batch size of 10 samples would involve 10 mini-batch updates per epoch. The model would be fit for a given number of epochs, such as 500.

This is usually hidden from you by the automated training of a model through a call to the fit() function, specifying the number of epochs and the size of each mini-batch.

In the case of the GAN, the number of training iterations must be defined based on the size of your training dataset and the batch size. For a dataset with 100 samples, a batch size of 10, and 500 training epochs, we would first calculate the number of batches per epoch and then use this to calculate the total number of training iterations from the number of epochs.

For instance:

```python
...
batches_per_epoch = floor(dataset_size / batch_size)
total_iterations = batches_per_epoch * total_epochs
```

For a dataset of 100 samples, a batch size of 10, and 500 epochs, the GAN would be trained for floor(100 / 10) * 500, or 5,000 total iterations.
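As a quick check of this arithmetic, the sketch below wraps the calculation in a small helper. The function name `count_training_iterations` is an illustrative assumption, not part of any library. The second call shows that samples beyond the last full batch are dropped by the floor, so they do not add iterations.

```python
from math import floor

def count_training_iterations(dataset_size, batch_size, n_epochs):
    # hypothetical helper: full mini-batches per epoch, times the number of epochs
    batches_per_epoch = floor(dataset_size / batch_size)
    return batches_per_epoch * n_epochs

# the example from the text: 100 samples, batch size 10, 500 epochs
total_iterations = count_training_iterations(100, 10, 500)
# 105 samples: the 5 left-over samples do not form a full batch
total_with_remainder = count_training_iterations(105, 10, 500)
print(total_iterations, total_with_remainder)
```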

Next, we can see that one iteration of training results in possibly multiple updates to the discriminator and one update to the generator, where the number of updates to the discriminator is a hyperparameter that is set to 1.

The training process consists of simultaneous SGD. On each step, two minibatches are sampled: a minibatch of x values from the dataset and a minibatch of z values drawn from the model’s prior over latent variables. Then two gradient steps are made simultaneously.

We can thus summarize the training algorithm with Python pseudocode as follows:

```python
# gan training algorithm
def train_gan(dataset, n_epochs, n_batch):
    # calculate the number of batches per epoch
    batches_per_epoch = int(len(dataset) / n_batch)
    # calculate the number of training iterations
    n_steps = batches_per_epoch * n_epochs
    # gan training algorithm
    for i in range(n_steps):
        # update the discriminator model
        # ...
        # update the generator model
        # ...
        pass  # placeholder so the skeleton is valid Python
```

An alternative approach is to enumerate the training epochs explicitly and split the training dataset into batches for each epoch.
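A minimal sketch of that alternative is shown below. It assumes the dataset is a NumPy array and that each epoch reshuffles the samples before slicing them into full batches. The generator function `iterate_minibatches` and its `seed` argument are hypothetical names introduced for illustration only.

```python
import numpy as np

def iterate_minibatches(dataset, n_epochs, n_batch, seed=0):
    # hypothetical helper: yields (epoch, batch index, batch of real samples)
    rng = np.random.default_rng(seed)
    n_samples = len(dataset)
    batches_per_epoch = n_samples // n_batch
    for epoch in range(n_epochs):
        # visit every sample at most once per epoch, in a new random order
        order = rng.permutation(n_samples)
        for b in range(batches_per_epoch):
            ix = order[b * n_batch:(b + 1) * n_batch]
            yield epoch, b, dataset[ix]

# toy dataset of 100 one-dimensional samples
dataset = np.arange(100, dtype=float).reshape(100, 1)

# each yielded batch would drive one discriminator and one generator update
n_updates = sum(1 for _ in iterate_minibatches(dataset, n_epochs=3, n_batch=10))

# within one epoch, the batches together cover the whole dataset
first_epoch = [batch for _, _, batch in iterate_minibatches(dataset, 1, 10)]
seen = np.sort(np.concatenate(first_epoch).ravel())
print(n_updates, len(seen))
```

Unlike sampling random indices on every step, this variant guarantees that each real sample is used once per epoch when the dataset size is a multiple of the batch size.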

Updating the discriminator model involves a few steps.

First, a batch of random points from the latent space must be selected for use as input to the generator model, providing the basis for the generated or ‘fake’ samples. Then a batch of samples from the training dataset must be selected for input to the discriminator as the ‘real’ samples.

Next, the discriminator model must make predictions for the real and fake samples, and the weights of the discriminator must be updated in proportion to how correct or incorrect those predictions were. The predictions are probabilities; we will get into the nature of the predictions and the loss function that is minimized in the next section. For now, we can outline what these steps look like in practice.

We need a generator and a discriminator model, for example, Keras models. These can be provided as arguments to the training function.

Next, we must generate points from the latent space and then use the generator model in its current form to generate some fake images. For instance:

```python
from numpy.random import randn
...
# generate points in the latent space
z = randn(latent_dim * n_batch)
# reshape into a batch of inputs for the network
z = z.reshape(n_batch, latent_dim)
# generate fake images
fake = generator.predict(z)
```

Note that the size of the latent dimension is also provided as a hyperparameter to the training algorithm.

We then must select a batch of real samples; this, too, could be wrapped into a function.

```python
from numpy.random import randint
...
# select a batch of random real images
ix = randint(0, len(dataset), n_batch)
# retrieve real images
real = dataset[ix]
```
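One way the two sampling steps might be wrapped into functions is sketched below. The function names `generate_latent_points` and `select_real_samples` are assumptions made for this example, and the all-zeros array merely stands in for a real image dataset.

```python
import numpy as np
from numpy.random import randn, randint

def generate_latent_points(latent_dim, n_batch):
    # hypothetical helper: draw Gaussian noise and shape it as a batch
    z = randn(latent_dim * n_batch)
    return z.reshape(n_batch, latent_dim)

def select_real_samples(dataset, n_batch):
    # hypothetical helper: pick random rows (with replacement) from the dataset
    ix = randint(0, len(dataset), n_batch)
    return dataset[ix]

# placeholder dataset: 100 grayscale images of 28x28 pixels
dataset = np.zeros((100, 28, 28, 1))
z = generate_latent_points(latent_dim=50, n_batch=16)
real = select_real_samples(dataset, n_batch=16)
print(z.shape, real.shape)
```

Note that `randint` samples indices with replacement, so the same real image can appear more than once in a batch.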

The discriminator model must then make a prediction for each of the generated and real images, and its weights must be updated.

```python
# gan training algorithm
def train_gan(generator, discriminator, dataset, latent_dim, n_epochs, n_batch):
    # calculate the number of batches per epoch
    batches_per_epoch = int(len(dataset) / n_batch)
    # calculate the number of training iterations
    n_steps = batches_per_epoch * n_epochs
    # gan training algorithm
    for i in range(n_steps):
        # generate points in the latent space
        z = randn(latent_dim * n_batch)
        # reshape into a batch of inputs for the network
        z = z.reshape(n_batch, latent_dim)
        # generate fake images
        fake = generator.predict(z)
        # select a batch of random real images
        ix = randint(0, len(dataset), n_batch)
        # retrieve real images
        real = dataset[ix]
        # update weights of the discriminator model
        # ...
        # update the generator model
        # ...
```

Next, the generator model must be updated.

Again, a batch of random points from the latent space must be selected and passed to the generator to generate fake images, which are then passed to the discriminator to classify.

```python
# gan training algorithm
def train_gan(generator, discriminator, dataset, latent_dim, n_epochs, n_batch):
    # calculate the number of batches per epoch
    batches_per_epoch = int(len(dataset) / n_batch)
    # calculate the number of training iterations
    n_steps = batches_per_epoch * n_epochs
    # gan training algorithm
    for i in range(n_steps):
        # generate points in the latent space
        z = randn(latent_dim * n_batch)
        # reshape into a batch of inputs for the network
        z = z.reshape(n_batch, latent_dim)
        # generate fake images
        fake = generator.predict(z)
        # select a batch of random real images
        ix = randint(0, len(dataset), n_batch)
        # retrieve real images
        real = dataset[ix]
        # update weights of the discriminator model
        # ...
        # generate points in the latent space
        z = randn(latent_dim * n_batch)
        # reshape into a batch of inputs for the network
        z = z.reshape(n_batch, latent_dim)
        # generate fake images
        fake = generator.predict(z)
        # classify as real or fake
        result = discriminator.predict(fake)
        # update weights of the generator model
        # ...
```

It is interesting that the discriminator is updated with two batches of samples each training iteration, whereas the generator is updated with only a single batch of samples per training iteration.

Now that we have defined the training algorithm for the GAN, we need to understand how the model weights are updated. This requires an understanding of the loss function used to train the GAN.

**Understanding the GAN Loss Function**

The discriminator is trained to correctly classify real and fake images.

This is achieved by maximizing the log of the predicted probability for real images and the log of the inverted probability (one minus the predicted probability) for fake images, averaged over each mini-batch of examples.
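Written out, the quantity the discriminator ascends can be expressed as follows. The notation is an assumption carried over from the original paper: m is the mini-batch size, x^(i) is a real sample, z^(i) is a latent point, D is the discriminator, and G is the generator.

```latex
\max_{D} \; \frac{1}{m} \sum_{i=1}^{m}
  \left[ \log D\left(x^{(i)}\right)
       + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]
```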

Recall that we add log probabilities, which is the same as multiplying probabilities, but without vanishing into small numbers. Therefore, we can understand this objective as seeking probabilities close to 1.0 for real images and probabilities close to 0.0 for fake images, inverted to become larger numbers. Stated as a maximization, larger average values of this objective mean better discriminator performance; once the sign is flipped to give a loss, lower average values mean better performance.

Inverting this into a minimization problem should not be surprising if you are familiar with developing neural networks for binary classification, as this is exactly the approach used.

This is just the standard cross-entropy cost that is minimized when training a standard binary classifier with a sigmoid output. The only difference is that the classifier is trained on two minibatches of data: one coming from the dataset, where the label is 1 for all examples, and one coming from the generator, where the label is 0 for all examples.
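The equivalence can be checked numerically. The sketch below uses made-up discriminator outputs (the arrays `d_real` and `d_fake` are illustrative values, not results from a trained model) and a hand-written `binary_cross_entropy` helper rather than a library function.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # mean binary cross-entropy, written out by hand for illustration
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_sample))

# made-up discriminator outputs for one minibatch of each kind
d_real = np.array([0.9, 0.8, 0.95, 0.7])   # D(x) for real samples
d_fake = np.array([0.1, 0.3, 0.05, 0.2])   # D(G(z)) for generated samples

# the objective the discriminator maximizes
objective = float(np.mean(np.log(d_real) + np.log(1 - d_fake)))

# cross-entropy with label 1 for the real batch and label 0 for the fake batch
loss_real = binary_cross_entropy(np.ones(4), d_real)
loss_fake = binary_cross_entropy(np.zeros(4), d_fake)

print(-objective, loss_real + loss_fake)
```

The two printed values agree, which is why minimizing binary cross-entropy on the two labeled batches maximizes the discriminator objective.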

The generator is a little trickier.

The GAN algorithm defines the generator model’s loss as minimizing the log of the inverted probability of the discriminator’s prediction for fake images, averaged over a mini-batch.

This is straightforward, but according to the authors, it is not effective in practice when the generator is poor and the discriminator is good at rejecting fake images with high confidence. The loss function no longer gives good gradient information that the generator can use to adjust its weights and instead saturates.

In this case, log(1 - D(G(z))) saturates. Rather than training G to minimize log(1 - D(G(z))) we can train G to maximize log D(G(z)). This objective function results in the same fixed point of the dynamics of G and D but provides much stronger gradients early in learning.

Instead, the authors recommend maximizing the log of the discriminator’s predicted probability for fake images.
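Side by side, the two generator objectives can be written as follows. The minibatch notation is an assumption borrowed from the original paper, with m the mini-batch size and z^(i) a latent point.

```latex
% original minimax objective for the generator
\min_{G} \; \frac{1}{m} \sum_{i=1}^{m}
  \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

% recommended alternative
\max_{G} \; \frac{1}{m} \sum_{i=1}^{m}
  \log D\left(G\left(z^{(i)}\right)\right)
```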

The change is subtle.

In the first case, the generator is trained to minimize the probability of the discriminator being correct. With this change to the loss function, the generator is trained to maximize the probability of the discriminator being incorrect.

In the minimax game, the generator minimizes the log-probability of the discriminator being correct. In this game, the generator maximizes the log-probability of the discriminator being mistaken.
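The difference in gradient strength can be illustrated with a small calculation. The sketch assumes the discriminator ends in a sigmoid, so that D(G(z)) = sigmoid(a) for some pre-activation value a, and it compares the derivative of each generator loss with respect to a. This is a simplification for intuition only: the real gradients flow further back into the generator weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_minimax(a):
    # derivative of log(1 - sigmoid(a)) with respect to a
    return -sigmoid(a)

def grad_non_saturating(a):
    # derivative of -log(sigmoid(a)) with respect to a
    return sigmoid(a) - 1.0

# very negative a means the discriminator confidently rejects the fake
for a in [-6.0, -2.0, 0.0, 2.0]:
    print(a, float(sigmoid(a)), float(grad_minimax(a)), float(grad_non_saturating(a)))
```

When the discriminator is confident that a sample is fake (a = -6), the minimax gradient is close to zero while the alternative loss still provides a gradient with magnitude close to 1.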

The sign of this loss function can then be inverted to give a familiar minimizing loss function for training the generator. As such, this is sometimes referred to as the -log D trick for training GANs.

Our baseline comparison is DCGAN, a GAN with a convolutional architecture trained with the standard GAN procedure using the -log D trick.

Now that we understand the GAN loss function, we can look at how the discriminator and generator models can be updated in practice.

**How to Train GAN Models in Practice**

The practical implementation of the GAN loss function and model updates is straightforward.

We will look at examples using the Keras library.

We can implement the discriminator directly by configuring the discriminator model to predict a probability of 1 for real images and 0 for fake images, and minimizing the cross-entropy loss, specifically the binary cross-entropy loss.

For instance, a snippet of our Keras model definition for the discriminator might look as follows for the output layer and the compilation of the model with the appropriate loss function.

```python
...
# output layer
model.add(Dense(1, activation='sigmoid'))
# compile model (remaining arguments are elided in this snippet)
model.compile(loss='binary_crossentropy', ...)
```

The defined model can be trained on each batch of real and fake samples, providing arrays of 1s and 0s as the expected outcome.

The ones() and zeros() NumPy functions can be used to create these target labels, and the Keras function train_on_batch() can be used to update the model for each batch of samples.

```python
from numpy import ones, zeros
...
X_fake = ...
X_real = ...
# define target labels for fake images
y_fake = zeros((n_batch, 1))
# update the discriminator for fake images
discriminator.train_on_batch(X_fake, y_fake)
# define target labels for real images
y_real = ones((n_batch, 1))
# update the discriminator for real images
discriminator.train_on_batch(X_real, y_real)
```

The discriminator model will be trained to predict the probability of “realness” of a given input image, which can be interpreted as a class label of class=0 for fake and class=1 for real.

The generator is trained so that the discriminator predicts a high probability of “realness” for generated images.

This is achieved by updating the generator via the discriminator with the class label of 1 for the generated images. The discriminator is not updated in this operation but provides the gradient information required to update the weights of the generator model.

For instance, if the discriminator predicts a low average probability of being real for the batch of generated images, this will result in a large error signal propagated backward into the generator, given that the “expected probability” for the samples was 1.0 for real. This large error signal, in turn, results in relatively large changes to the generator to hopefully improve its ability to generate fake samples on the next batch.
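To make the size of that error signal concrete, the sketch below evaluates the binary cross-entropy for a single generated sample whose target label is 1. The helper name `generator_loss` and the two probabilities are illustrative assumptions, not values from a trained model.

```python
from math import log

def generator_loss(p_real):
    # binary cross-entropy for one prediction when the target label is 1
    return -log(p_real)

# discriminator is confident the sample is fake: large loss for the generator
loss_when_rejected = generator_loss(0.05)
# discriminator is fooled: small loss for the generator
loss_when_fooled = generator_loss(0.95)
print(round(loss_when_rejected, 3), round(loss_when_fooled, 3))
```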

This can be implemented in Keras by creating a composite model that combines the generator and discriminator models, allowing the output images from the generator to flow directly into the discriminator and, in turn, allowing the error signals from the predicted probabilities of the discriminator to flow back through the weights of the generator model.

For instance:

```python
from keras.models import Sequential

# define a composite gan model for the generator and discriminator
def define_gan(generator, discriminator):
    # make weights in the discriminator not trainable
    discriminator.trainable = False
    # connect them
    model = Sequential()
    # add generator
    model.add(generator)
    # add the discriminator
    model.add(discriminator)
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
```

The composite model can then be updated using fake images and real class labels.

```python
...
# generate points in the latent space
z = randn(latent_dim * n_batch)
# reshape into a batch of inputs for the network
z = z.reshape(n_batch, latent_dim)
# define target labels for real images
y_real = ones((n_batch, 1))
# update generator model
gan_model.train_on_batch(z, y_real)
```
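Putting the pieces together, one possible shape for the complete training loop is sketched below. It only assumes that the three model arguments expose `predict()` and `train_on_batch()` in the way Keras models do. The `RecordingModel` class is a hypothetical stand-in used here so the loop can run without Keras; it counts updates and learns nothing.

```python
import numpy as np
from numpy.random import randn, randint

def train_gan(generator, discriminator, gan_model, dataset, latent_dim, n_epochs, n_batch):
    # one iteration = two discriminator updates and one generator update
    n_steps = (len(dataset) // n_batch) * n_epochs
    for i in range(n_steps):
        # discriminator update on a fake batch (label 0) and a real batch (label 1)
        z = randn(latent_dim * n_batch).reshape(n_batch, latent_dim)
        X_fake = generator.predict(z)
        X_real = dataset[randint(0, len(dataset), n_batch)]
        discriminator.train_on_batch(X_fake, np.zeros((n_batch, 1)))
        discriminator.train_on_batch(X_real, np.ones((n_batch, 1)))
        # generator update through the composite model with "real" labels
        z = randn(latent_dim * n_batch).reshape(n_batch, latent_dim)
        gan_model.train_on_batch(z, np.ones((n_batch, 1)))
    return n_steps

class RecordingModel:
    # hypothetical stand-in for a Keras model; it only counts updates
    def __init__(self, output_dim=1):
        self.output_dim = output_dim
        self.updates = 0

    def predict(self, X):
        return np.zeros((len(X), self.output_dim))

    def train_on_batch(self, X, y):
        self.updates += 1
        return 0.0

generator = RecordingModel(output_dim=2)
discriminator = RecordingModel()
gan_model = RecordingModel()
dataset = np.zeros((100, 2))
n_steps = train_gan(generator, discriminator, gan_model, dataset,
                    latent_dim=5, n_epochs=2, n_batch=10)
print(n_steps, discriminator.updates, gan_model.updates)
```

With real Keras models, `gan_model` would be the composite returned by a function like define_gan(), so that the third update changes only the generator weights.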

That completes our tour of the GAN training algorithm, loss function, and weight update details for the discriminator and generator models.

**Further Reading**

This section provides more resources on the topic if you are looking to go deeper.

**Papers**

- Generative Adversarial Networks, 2014.
- NIPS 2016 Tutorial: Generative Adversarial Networks, 2016.
- Wasserstein GAN, 2017.

**Conclusion**

In this guide, you discovered how to implement the generative adversarial network training algorithm and loss functions.

Specifically, you learned:

- How to implement the training algorithm for a generative adversarial network.
- How the loss functions for the discriminator and generator work.
- How to implement weight updates for the discriminator and generator models in practice.