Deep Learning 101 Part 1
Deep learning. Neural networks. Weights. The human brain reconstructed. The first three terms might be alien to many. The fourth phrase is an extremely simplified way of understanding deep learning. This blog will expand on that thought, demystify the nascent technology, and get you up to speed on what the fuss is all about.
What is deep learning? What are neural networks? And how can the human brain, an unbelievably complex structure of synapses, neurons, and cells, ever be reconstructed? Let’s find out.
The basic premise behind deep learning is a simple one. Earlier machine learning models tended to plateau: once you had fed a certain amount of data into the system, performance stagnated at a certain level. Deep learning is far less prone to this limitation. The ‘deep’ in the name refers to the many layers stacked in the network, and these models benefit from larger data sets: in essence, the more data you feed in, the better the performance tends to be.
This ‘depth’ in deep learning permits the expression of complicated representations. Breakthroughs in image recognition, for example, have come from deep learning, where simple features are aggregated into higher and higher-level features until a classification is arrived at.
As the pixels are read, the system detects the intricacies of an image step by step, eventually leading to full recognition. Natural language processing is another domain where deep learning is making significant inroads. Deep learning is also highly relevant to problems in predictive analytics, and its scope is far-reaching.
The Silicon Brain
Deep learning is, essentially, a loose simulation of the human brain: the silicon brain, for lack of a better phrase.
The architecture of a neural network is the linchpin of its performance, and we’re currently spoilt for choice. However, all of the available architectures share similar underlying principles. At the most fundamental level, they have an input layer, one or more hidden layers, and an output layer, each with several nodes. The connections between nodes carry weights, which are comparable to our brain’s synapses.
Neurons in our brains use electrical signals to transmit data between themselves. The amount of data that can pass between two neurons corresponds to how strong or weak the connection between them is.
As a basic example, consider touching an open electrical outlet. You might receive a mild electric shock. The nerves in our body carry this information to the brain through our neurons. The pathway between the neurons in our hand and fingers and the neurons in our brain that perceive the shock and pain is activated immediately. And our brain learns that touching open electrical sockets is probably a bad idea. This neural pathway between the brain and the hand is strong, and will get stronger with each subsequent electrical shock, until our behaviour eventually changes to exhibit caution around open electricity. This is a simple picture of how learning happens in our brain.
As we know, ‘learning’ can be defined as any relatively permanent change in behaviour as the result of experience. At a fundamental level, our brains deal with consequences: they are concerned with increasing positive consequences and decreasing negative ones. This is a very basic explanation of how learning takes place, but it is important to understand before we proceed further.
Artificial neural networks (ANNs) are composed of multiple nodes arranged in several layers. A layer can have any number of nodes, and there is no limit to how many layers a neural network can contain.
As you can see in the image above, the layers are densely interconnected. A connection exists between each node in the first layer and every node in the second layer. These connections are referred to as the ‘weights’. As we saw earlier, weights are comparable to our brain’s synapses. Synapses essentially facilitate communication between our brain’s neurons by conducting electrical impulses from one neuron to the next.
Each weight is a fundamental reflection of how critical its input is.
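To make the dense wiring concrete, here is a minimal Python sketch that counts the weights in a small fully connected network. The layer sizes (3, 4, 2) and the helper name count_weights are made up for illustration; they are not taken from the image.

```python
def count_weights(layer_sizes):
    """Count the connections (weights) in a fully connected network.

    Every node in one layer connects to every node in the next layer,
    so each pair of adjacent layers contributes nodes_in * nodes_out weights.
    """
    total = 0
    for nodes_in, nodes_out in zip(layer_sizes, layer_sizes[1:]):
        total += nodes_in * nodes_out
    return total


# Hypothetical network: 3 input nodes, 4 hidden nodes, 2 output nodes.
layer_sizes = [3, 4, 2]
total_weights = count_weights(layer_sizes)
print(total_weights)  # 3*4 + 4*2 = 20
```

Adding nodes or layers makes this number grow quickly, which is one reason larger networks need more data to train well.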
Let’s dissect this at an even simpler level. Assume you’re in the market for a motorcycle. The more recent the model, the pricier it is bound to be. Similarly, the more it has been used, the cheaper it’s going to be. There is therefore a positive relationship between price and year of manufacture (the later the year, the higher the price) and a negative relationship between price and usage (the more it has been used, the lower the price).
Expressed as a simple formula:
Price of the motorbike = w₁ × year + w₂ × miles
where w₁ is positive and w₂ is negative.
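As a rough illustration of the formula, the sketch below plugs in some numbers. The weight values (2.0 and -0.05) and the function name motorbike_price are invented for this example; only their signs come from the text.

```python
def motorbike_price(year, miles, w_year, w_miles):
    """Price = w_year * year + w_miles * miles (the formula from the text)."""
    return w_year * year + w_miles * miles


# Hypothetical weights: positive for year, negative for miles.
W_YEAR = 2.0
W_MILES = -0.05

newer_bike = motorbike_price(2020, 10_000, W_YEAR, W_MILES)
older_bike = motorbike_price(2015, 10_000, W_YEAR, W_MILES)
well_used_bike = motorbike_price(2020, 30_000, W_YEAR, W_MILES)

# Same mileage, later year: the positive weight pushes the price up.
print(newer_bike > older_bike)
# Same year, more miles: the negative weight pulls the price down.
print(well_used_bike < newer_bike)
```

In a real network these weights would not be picked by hand. They would be learned from examples of bikes and their prices.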
Let’s dig into the slightly more technical stuff.
In this scenario, we take a look at a single node in the network. All the values from the previous layer feed into this node through weighted connections.
Y is the output value.
W indicates the weights between the nodes in the prior layer and the output node.
X indicates the node values of the prior layer.
B indicates the bias, an extra value included for every neuron. Bias is basically a weight with no input term. Its utility is in providing some adjustability without depending on a prior layer.
H is the intermediate node value: the weighted sum of the inputs plus the bias, H = W₁X₁ + W₂X₂ + … + B. Do note that this is not the final value of the node.
f() is referred to as an activation function, and which one to use is a design choice. The output is Y = f(H).
In the end, with H = 0.57 in this example, the output value of this node will be f(0.57).
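The computation for a single node can be sketched in a few lines of Python. The input values, weights, and bias below are hypothetical, chosen only so that H comes out to the 0.57 mentioned above; they are not the values from the original figure. The sigmoid is likewise just one assumed choice for f().

```python
import math


def sigmoid(h):
    """One common activation function; assumed here purely for illustration."""
    return 1.0 / (1.0 + math.exp(-h))


def node_output(x, w, b, f):
    """Compute H = sum(W_i * X_i) + B, then Y = f(H)."""
    h = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return h, f(h)


# Hypothetical values for a node with two inputs.
x = [0.5, 0.2]   # X: node values of the prior layer
w = [0.8, 0.1]   # W: weights on the incoming connections
b = 0.15         # B: bias

h, y = node_output(x, w, b, sigmoid)
print(round(h, 2))  # 0.57 (0.8*0.5 + 0.1*0.2 + 0.15)
print(y)            # Y = f(0.57) under the assumed sigmoid
```

Swapping sigmoid for a different function changes Y but leaves H untouched, which is why H is described as an intermediate value.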
Bias? What’s a bias?
Bias, in layman’s terms, is a constant value (a constant vector, when taken across a whole layer) that is added to the products of inputs and weights. Bias is used to offset the result.
Imagine you want a neural network to return 5 when the input is 0. Because the product of weight and input is then 0 no matter what the weight is, adding a bias of 5 ensures that the network returns 5.
Bias lets a neuron shift its output independently of its inputs, which gives the neural network more flexibility to fit the data.
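Here is the same idea as a tiny Python sketch. It assumes a single input, a single weight, and no activation function, so that the arithmetic stays visible. The function name neuron and the weights tried are illustrative only.

```python
def neuron(x, w, b=0.0):
    """A one-input neuron with no activation function: output = w * x + b."""
    return w * x + b


# Without a bias, no choice of weight can move the output away from 0
# when the input is 0.
without_bias = [neuron(0, w) for w in (-10, 1, 1000)]
print(without_bias)

# With a bias of 5, the neuron returns 5 at input 0, whatever the weight is.
with_bias = neuron(0, w=3, b=5)
print(with_bias)  # 5
```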
…and activation functions?
These are essentially mathematical functions that take inputs and create an output. In the simplest case, the function activates when the computed value reaches a particular threshold.
Keeping it simple, activation functions are mathematical operations that normalize inputs and create outputs. The output is then transmitted to neurons on the next layer.
Activation function thresholds are pre-defined numeric values. They can impart non-linearity to outputs. As a result, neural networks are able to solve non-linear problems. Non-linear problems can be defined as problems where there is no straight-line relationship between the input and the output.
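A threshold-style activation can be sketched as a simple step function. The threshold of 0.5 is an arbitrary value picked for this example, and step is only one of many possible activation functions.

```python
def step(h, threshold=0.5):
    """Return 1 (the node 'fires') once h reaches the threshold, else 0."""
    return 1 if h >= threshold else 0


print(step(0.57))  # 1, because 0.57 is at or above the threshold
print(step(0.2))   # 0, because 0.2 is below the threshold

# Non-linearity: for a linear function, f(a + b) always equals f(a) + f(b).
# The step function breaks that rule.
a, b = 0.3, 0.3
print(step(a + b), step(a) + step(b))  # 1 0
```

Because the output does not scale in proportion to the input, stacking layers of such nodes can represent relationships that a straight line cannot.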
Let’s understand the various types of activation functions in the next part of this article.