When it comes to sequential or time series data, conventional feedforward networks are poorly suited to learning and prediction, because they treat each input independently of the others. A mechanism is needed that can retain past information in order to predict future values. Recurrent neural networks, or RNNs for short, are a variant of traditional feedforward artificial neural networks that can handle sequential data and can be trained to retain knowledge about past inputs.

After completing this tutorial, you will know:

- Recurrent neural networks
- What is meant by unfolding an RNN
- How weights are updated in an RNN
- Several RNN architectures

**Tutorial Overview**

This tutorial is divided into two parts; they are:

- How an RNN works
  - Unfolding in time
  - Backpropagation through time algorithm
- Different RNN architectures and variants

**Prerequisites**

This tutorial assumes that you are already familiar with artificial neural networks and the backpropagation algorithm, including how gradient-based backpropagation is used to train a neural network.

**What is a Recurrent Neural Network**

A recurrent neural network (RNN) is a special type of artificial neural network adapted to work with time series data or data that consists of sequences. Traditional feedforward neural networks are designed only for data points that are independent of each other. However, if we have sequential data in which each data point depends on the previous one, we need to modify the neural network to incorporate the dependencies between these data points. RNNs have a notion of ‘memory’ that helps them store the states or information of previous inputs in order to generate the next output of the sequence.

**Unfolding A Recurrent Neural Network**

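A simple RNN has a feedback loop, as shown in the first diagram of the figure above. The feedback loop, shown in the gray rectangle, can be unrolled over 3 time steps to produce the second network in the figure. You can vary the architecture so that the network unrolls k time steps instead. The figure uses the following notation:

- $x_t$ is the input at time step t.
- $y_t$ is the output of the network at time step t.
- $h_t$ holds the values of the hidden units (the hidden state) at time t; $h_0$ is initialized to zero.
- $w_x$ are the weights associated with the inputs in the recurrent layer.
- $w_h$ are the weights associated with the hidden units in the recurrent layer.
- $w_y$ are the weights connecting the hidden units to the output.
- $b_h$ is the bias of the recurrent layer.
- $b_y$ is the bias of the feedforward (output) layer.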

At each time step, we can unfold the network for k time steps to obtain the output at time step k + 1. The unfolded network closely resembles a feedforward neural network. Each rectangle in the unfolded network represents an operation taking place. For instance, with an activation function f:

$$h_{t+1} = f(x_t, h_t, w_x, w_h, b_h) = f(w_x x_t + w_h h_t + b_h)$$

The output y at time t is computed as:

$$y_t = f(h_t, w_y) = f(w_y \cdot h_t + b_y)$$

Here, · is the dot product.

Hence, in the forward pass of an RNN, the network computes the values of the hidden units and the output after k time steps. The weights associated with the network are shared across time steps. Each recurrent layer has two sets of weights: one for the input and one for the hidden unit. The final feedforward layer, which computes the output for the kth time step, is just like an ordinary layer of a traditional feedforward network.
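To make the forward pass concrete, here is a minimal pure-Python sketch of the equations above. It assumes a scalar input, a hidden state with m units, tanh as the activation f in the recurrent layer, and a linear output layer. The equations above also pass the output through f, so dropping it here is a simplification. The function name `rnn_forward` and the example weights are hypothetical. The key point is that the same `w_x`, `w_h`, and `b_h` are reused at every time step, and each output is read from the hidden state that was just computed.

```python
import math


def rnn_forward(xs, w_x, w_h, b_h, w_y, b_y, f=math.tanh):
    """Forward pass of a single-layer RNN unrolled over the sequence xs.

    Assumed shapes (hypothetical, for illustration): each x_t is a scalar,
    the hidden state has m units, w_x, b_h and w_y are lists of length m,
    w_h is an m x m list of lists, and b_y is a scalar.
    The output layer is linear here (an assumption of this sketch).
    """
    m = len(w_x)
    h = [0.0] * m  # h_0 starts at zero
    hidden_states, outputs = [], []
    for x in xs:
        # Same w_x, w_h, b_h at every step (weights are shared in time):
        # h_{t+1} = f(w_x * x_t + w_h h_t + b_h)
        h = [
            f(w_x[i] * x + sum(w_h[i][j] * h[j] for j in range(m)) + b_h[i])
            for i in range(m)
        ]
        # Output read from the updated hidden state: y = w_y . h + b_y
        y = sum(w_y[i] * h[i] for i in range(m)) + b_y
        hidden_states.append(h)
        outputs.append(y)
    return hidden_states, outputs


# Example: two hidden units, unrolled over k = 3 inputs
hs, ys = rnn_forward(
    xs=[0.5, -0.1, 0.8],
    w_x=[0.4, -0.3],
    w_h=[[0.1, 0.2], [-0.2, 0.3]],
    b_h=[0.0, 0.1],
    w_y=[1.0, -1.0],
    b_y=0.05,
)
```

Because the hidden state carries information forward, feeding the same inputs in a different order generally produces a different final output. A feedforward network applied to each input separately cannot do this.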

**The Activation Function**

We can use any activation function we like in a recurrent neural network. Common choices include the sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU) functions.
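Below is a minimal pure-Python sketch of three widely used choices: the sigmoid, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU). It also includes their derivatives, which backpropagation needs. The function names are illustrative. Note that the derivative of the sigmoid never exceeds 0.25 and that of tanh never exceeds 1, which is relevant to the vanishing gradient problem discussed later.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^-z), squashing z into (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def tanh(z):
    """Hyperbolic tangent, squashing z into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit max(0, z)."""
    return max(0.0, z)


# Derivatives, used when backpropagating through the activation
def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # at most 1, reached at z = 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # derivative at z = 0 taken as 0 (a convention)
```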

**Training a Recurrent Neural Network**

The backpropagation algorithm of an artificial neural network is modified to include unfolding in time when training the weights of the network. This algorithm is based on computing the gradient vector and is called backpropagation through time, or BPTT for short. The pseudo-code for training is given below. The value of k can be chosen by the user. In the pseudo-code below, $p_t$ is the target value at time step t.

- Repeat until the stopping criterion is met:
  - Set all h to zero.
  - Repeat for t = 0 to n-k, where n is the length of the training sequence:
    - Forward propagate the network over the unfolded network for k time steps to compute all h and y.
    - Compute the error as $e = (y_{t+k} - p_{t+k})^2$.
    - Backpropagate the error through the unfolded network and update the weights.
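The pseudo-code above can be sketched in pure Python for the smallest possible case: a scalar input, a single tanh hidden unit, and a linear output. This sketch makes several assumptions that go beyond the pseudo-code:

- Each k-step window starts from a zero hidden state.
- The error is the squared error $(y_{t+k} - p_{t+k})^2$.
- Weights are updated by plain gradient descent with a learning rate `lr`.
- The stopping criterion is a fixed number of epochs.

Because the sequence is 0-indexed, the window start t runs from 0 to n - k - 1, so the target $p_{t+k}$ always exists. The function names are hypothetical.

```python
import math


def window_loss_and_grads(params, xs, target):
    """Unfold a scalar tanh RNN over the k inputs in xs, then backpropagate.

    params = (w_x, w_h, b_h, w_y, b_y). The window starts from a zero hidden
    state and the output layer is linear (both assumptions of this sketch).
    Returns the squared error (y - target)^2 and its gradient for each parameter.
    """
    w_x, w_h, b_h, w_y, b_y = params
    hs = [0.0]  # hs[j] is the hidden state before the j-th input of the window
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1] + b_h))
    y = w_y * hs[-1] + b_y
    error = (y - target) ** 2

    # Backward pass through the unfolded network (backpropagation through time)
    d_y = 2.0 * (y - target)
    g_wy, g_by = d_y * hs[-1], d_y
    g_wx = g_wh = g_bh = 0.0
    d_h = d_y * w_y  # gradient of the error w.r.t. the last hidden state
    for j in reversed(range(len(xs))):
        d_a = d_h * (1.0 - hs[j + 1] ** 2)  # back through tanh
        # Shared weights: contributions from every time step are summed
        g_wx += d_a * xs[j]
        g_wh += d_a * hs[j]
        g_bh += d_a
        d_h = d_a * w_h  # send the gradient one step back in time
    return error, (g_wx, g_wh, g_bh, g_wy, g_by)


def train_bptt(x, p, k, params, lr=0.01, epochs=300):
    """Slide a k-step window over x, predicting p[t + k], and update after each window."""
    params = list(params)
    for _ in range(epochs):  # stopping criterion: a fixed epoch budget (assumption)
        for t in range(len(x) - k):  # 0-indexed, so p[t + k] stays in range
            _, grads = window_loss_and_grads(params, x[t:t + k], p[t + k])
            params = [w - lr * g for w, g in zip(params, grads)]
    return params


# Usage: learn to predict the next value of a sampled sine wave from the previous k = 3
series = [math.sin(0.3 * i) for i in range(40)]
trained = train_bptt(series, series, k=3, params=[0.1, 0.1, 0.0, 0.1, 0.0])
```

The inner loop over `j` is where backpropagation through time happens. Because the weights are shared across the unfolded steps, each step's contribution to `g_wx`, `g_wh`, and `g_bh` is summed before the update.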

**Variants of RNNs**

There are different types of recurrent neural networks with varying architectures. Some examples are:

**One-to-One**

Here there is a single $(x_t, y_t)$ pair. Traditional neural networks use a one-to-one architecture.

**One-to-Many**

In one-to-many networks, a single input $x_t$ can produce multiple outputs, e.g., $(y_{t0}, y_{t1}, y_{t2})$.

Music generation is an example area where one-to-many networks are used.

**Many-to-One**

In this case, many inputs from different time steps produce a single output. For instance, $(x_t, x_{t+1}, x_{t+2})$ can produce a single output $y_t$. Such networks are used in sentiment analysis or emotion detection, where the class label depends on a sequence of words.

**Many-to-Many**

There are many possibilities for many-to-many. An example is shown above, where two inputs produce three outputs. Many-to-many networks are applied in machine translation, e.g., English-to-French or French-to-English translation systems.
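The four patterns differ mainly in how inputs are fed in and which outputs are read. As a minimal sketch, assuming a scalar tanh RNN cell with hypothetical fixed weights and a linear readout, the same cell can be wired as one-to-one, one-to-many, many-to-one, and many-to-many.

- For the generative cases, the sketch feeds each output back in as the next input. This is one common choice, not the only one.
- For brevity, the encoder-decoder style many-to-many version reuses a single set of weights, whereas translation models typically use separate encoder and decoder weights.
- The other common many-to-many form simply emits one output per input.

```python
import math


def step(x, h, w_x=0.8, w_h=0.5, b_h=0.0):
    """One recurrent step h' = tanh(w_x * x + w_h * h + b_h); weights are hypothetical."""
    return math.tanh(w_x * x + w_h * h + b_h)


def readout(h, w_y=1.5, b_y=0.0):
    """Linear output layer y = w_y * h + b_y (hypothetical weights)."""
    return w_y * h + b_y


def one_to_one(x):
    """A single input mapped to a single output."""
    return readout(step(x, 0.0))


def one_to_many(x, n_out):
    """Feed one input, then feed each output back in as the next input."""
    h, outputs, inp = 0.0, [], x
    for _ in range(n_out):
        h = step(inp, h)
        y = readout(h)
        outputs.append(y)
        inp = y
    return outputs


def many_to_one(xs):
    """Consume the whole sequence and emit a single output from the final hidden state."""
    h = 0.0
    for x in xs:
        h = step(x, h)
    return readout(h)


def many_to_many(xs, n_out):
    """Encoder-decoder style: summarize xs in h, then generate n_out outputs."""
    h = 0.0
    for x in xs:
        h = step(x, h)
    outputs, inp = [], 0.0
    for _ in range(n_out):
        h = step(inp, h)
        y = readout(h)
        outputs.append(y)
        inp = y
    return outputs


# Two inputs producing three outputs, as in the many-to-many example above
translated = many_to_many([0.1, 0.2], n_out=3)
```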

**Benefits and drawbacks of RNNs**

RNNs have several benefits, such as:

- Ability to handle sequence data
- Ability to handle inputs of varying lengths
- Ability to store or ‘memorize’ historical information

The drawbacks are:

- The computation can be very slow, since the time steps must be processed one after another.
- The network does not take future inputs into account when making decisions.
- Vanishing gradient problem: the gradients used to compute the weight updates may become very close to zero, preventing the network from learning new weights. The deeper the network (including the number of time steps it is unrolled over), the more pronounced this problem is.
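The vanishing gradient can be seen directly in a scalar RNN with $h_{j+1} = \tanh(w_x x_j + w_h h_j + b_h)$. By the chain rule:

$$\frac{\partial h_k}{\partial h_0} = \prod_{j=0}^{k-1} w_h \,(1 - h_{j+1}^2)$$

Since the derivative of tanh is at most 1, every factor has magnitude below 1 whenever $|w_h| < 1$. The minimal sketch below, with hypothetical weights and a constant input of 0.5, computes this product for increasingly long sequences; its magnitude shrinks toward zero as k grows.

```python
import math


def hidden_state_gradient(xs, w_x=1.0, w_h=0.9, b_h=0.0):
    """Return d h_k / d h_0 for a scalar tanh RNN run over xs, starting from h_0 = 0.

    Each step contributes the factor d h_{j+1} / d h_j = w_h * (1 - h_{j+1}^2),
    and by the chain rule the factors multiply. Weights are hypothetical.
    """
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b_h)
        grad *= w_h * (1.0 - h * h)
    return grad


for k in (1, 5, 10, 20, 50):
    print(k, hidden_state_gradient([0.5] * k))
```

The gradient reaching the earliest time steps is therefore tiny, so those steps barely influence the weight updates.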

**Different RNN architectures**

There are different variations of RNNs that are being applied in practice to machine learning problems:

**Bidirectional recurrent neural networks (BRNN)**

In a BRNN, inputs from future time steps are used to improve the accuracy of the network. It is like knowing the first and last words of a sentence in order to predict the middle words.
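A BRNN can be sketched as two independent RNNs:

- one reads the sequence left to right;
- the other reads it right to left.

At each position, their hidden states are paired so that an output layer can see both past and future context. The minimal sketch below assumes scalar tanh RNNs with hypothetical weights for each direction and leaves out the output layer.

```python
import math


def run_rnn(xs, w_x, w_h, b_h):
    """Scalar tanh RNN; returns the hidden state after each input."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b_h)
        states.append(h)
    return states


def bidirectional(xs, fwd=(0.8, 0.5, 0.0), bwd=(0.6, 0.4, 0.0)):
    """Pair each position's forward state (past context) with its backward state (future context).

    fwd and bwd are hypothetical (w_x, w_h, b_h) weights for the two directions.
    """
    forward_states = run_rnn(xs, *fwd)
    # The backward RNN reads the sequence in reverse; flip its states back into time order
    backward_states = run_rnn(xs[::-1], *bwd)[::-1]
    return list(zip(forward_states, backward_states))


pairs = bidirectional([0.1, 0.2, 0.3])
```

Changing the last input alters the backward state at the first position, while the forward state there stays the same. This is how future context reaches earlier predictions.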

**Gated Recurrent Units (GRU)**

These networks are designed to handle the vanishing gradient problem. They have a reset gate and an update gate. These gates determine which information is retained for future predictions.
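The following is a minimal sketch of a single GRU step with scalar input and state. It follows one common formulation, built from three pieces:

- an update gate $z$;
- a reset gate $r$;
- a candidate state $\tilde{h}$.

The new state is $(1 - z)\,h + z\,\tilde{h}$. Formulations differ in whether $z$ or $1 - z$ multiplies the old state (PyTorch's `torch.nn.GRU`, for instance, puts $z$ on the old state), so treat this as one convention among several. The parameter names in `p` are hypothetical.

```python
import math


def sigmoid(z):
    return 0.5 * (1.0 + math.tanh(0.5 * z))  # equals 1 / (1 + e^-z), without overflow


def gru_step(x, h, p):
    """One GRU step for a scalar input x and scalar hidden state h.

    p is a dict of hypothetical weights: w_* multiply the input, u_* the previous state.
    """
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])  # reset gate
    # Candidate state: the reset gate controls how much of the old state is used
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h) + p["b_h"])
    # The update gate interpolates between keeping the old state and adopting the candidate
    return (1.0 - z) * h + z * h_cand


params = dict(w_z=0.5, u_z=0.5, b_z=0.0, w_r=0.5, u_r=0.5, b_r=0.0,
              w_h=1.0, u_h=1.0, b_h=0.0)
h = 0.0
for x in [0.5, -0.2, 0.3]:
    h = gru_step(x, h, params)
```

When the update gate is near 0, the old state passes through almost unchanged. This gives gradients a path across many time steps.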

**Long Short Term Memory (LSTM)**

LSTMs were also designed to address the vanishing gradient problem in RNNs. An LSTM uses three gates, called the input, output, and forget gates. As in a GRU, these gates determine which information to retain.
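Likewise, a single LSTM step can be sketched with scalar values. Besides the hidden state h, an LSTM keeps a cell state c, and its three gates act on it as follows:

- the forget gate scales the old cell state;
- the input gate scales a candidate value written into it;
- the output gate controls how much of $\tanh(c)$ is exposed as the new hidden state.

This sketch uses the standard formulation without peephole connections, and the weight names in `p` are hypothetical.

```python
import math


def sigmoid(z):
    return 0.5 * (1.0 + math.tanh(0.5 * z))  # equals 1 / (1 + e^-z), without overflow


def lstm_step(x, h, c, p):
    """One LSTM step with scalar input x, hidden state h, and cell state c.

    p holds hypothetical weights: w_* multiply the input, u_* the previous hidden state.
    """
    i = sigmoid(p["w_i"] * x + p["u_i"] * h + p["b_i"])  # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h + p["b_f"])  # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h + p["b_o"])  # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h + p["b_g"])  # candidate cell value
    c_new = f * c + i * g  # keep part of the old memory, write part of the new
    h_new = o * math.tanh(c_new)  # expose a gated view of the memory
    return h_new, c_new


params = {name: 0.5 for name in ("w_i", "u_i", "w_f", "u_f", "w_o", "u_o", "w_g", "u_g")}
params.update(b_i=0.0, b_f=0.0, b_o=0.0, b_g=0.0)
h, c = 0.0, 0.0
for x in [0.5, -0.2, 0.3]:
    h, c = lstm_step(x, h, c, params)
```

With the forget gate fully open and the input gate closed, the cell state is carried forward unchanged. This additive memory path is what helps LSTMs preserve gradients over long sequences.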


**Conclusion**

In this tutorial, you discovered recurrent neural networks and their various architectures.

Specifically, you learned:

- How a recurrent neural network handles sequential data
- Unfolding in time in a recurrent neural network
- What backpropagation through time is
- The advantages and disadvantages of RNNs
- Various architectures and variants of RNNs
