### An intro to the Sigmoid Function

Whether you implement a neural network yourself or use a built-in library for neural network learning, it is important to understand the sigmoid function. The sigmoid function is key to understanding how a neural network learns complex problems. It also served as a basis for discovering other functions that lead to efficient and good solutions for supervised learning in deep learning architectures.

In this tutorial, you will learn about the sigmoid function and its role in learning from examples in neural networks.

After completing this tutorial, you will know:

- The sigmoid function
- Linear vs non-linear separability
- Why a neural network can form complex decision boundaries if a sigmoid unit is used

**Tutorial Overview**

This tutorial is divided into three parts:

1] The sigmoid function

- The sigmoid function and its properties

2] Linearly vs non-linearly separable problems

3] Using a sigmoid as an activation function in neural networks

**Sigmoid Function**

The sigmoid function is a special case of the logistic function and is usually denoted by σ(x) or sig(x). It is given by:

σ(x) = 1/(1+exp(-x))
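As a minimal sketch, the formula above can be written in Python using only the standard library. The function name `sigmoid` and the two-branch form are choices made for this illustration: for negative inputs the code uses the algebraically equal form exp(x)/(1+exp(x)) so that `math.exp` is never called with a large positive argument.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-x)), for a single real number."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) / (1 + exp(x)) is the same value,
    # but avoids overflow when x is very negative.
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x))
```
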

**Properties and Identities of the Sigmoid Function**

The graph of the sigmoid function is an S-shaped curve, shown by the green line in the graph below. The figure also shows the graph of the derivative in pink. The expression for the derivative, along with some important properties, is shown on the right.

A few other properties include:

1] Domain: (-∞, +∞)

2] Range: (0, +1)

3] σ(0) = 0.5

4] The function is monotonically increasing.

5] The function is continuous everywhere.

6] The function is differentiable everywhere in its domain.

7] Numerically, it is enough to compute this function’s value over a small range of numbers, for example, [-10, +10]. For values less than -10, the function’s value is almost zero. For values greater than 10, the function’s value is almost one.
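The properties listed above can be spot-checked numerically. The following is only an illustrative sketch on a handful of hand-picked sample points (the list `xs` is an assumption of this example), not a proof of the properties.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid for moderate inputs such as the range [-10, 10]."""
    return 1.0 / (1.0 + math.exp(-x))

xs = [-10.0, -5.0, 0.0, 5.0, 10.0]
ys = [sigmoid(x) for x in xs]

for x, y in zip(xs, ys):
    print(f"sigmoid({x:+.0f}) = {y:.6f}")

# Monotonically increasing: each sampled value exceeds the previous one.
is_increasing = all(a < b for a, b in zip(ys, ys[1:]))
# Range (0, 1): every sampled value lies strictly between 0 and 1.
in_range = all(0.0 < y < 1.0 for y in ys)
print(is_increasing, in_range)
```

At the edges of the sampled range the output is already within about 0.00005 of 0 and 1 respectively, which is why a small interval is usually enough in practice.
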

__The Sigmoid as a Squashing Function__

The sigmoid function is also called a squashing function because its domain is the set of all real numbers and its range is (0, 1). Hence, whether the input to the function is a very large negative number or a very large positive number, the output is always between 0 and 1. The same goes for any number between -∞ and +∞.

__Sigmoid As An Activation Function in Neural Networks__

The sigmoid function is used as an activation function in neural networks. To review what an activation function is, the figure below shows the role of an activation function in one layer of a neural network. A weighted sum of inputs is passed through an activation function, and this output serves as an input to the next layer.

When the activation function for a neuron is a sigmoid function, the output of this unit is guaranteed to be between 0 and 1. Also, because the sigmoid is a non-linear function, the output of this unit is a non-linear function of the weighted sum of inputs. A neuron that uses a sigmoid function as its activation function is called a sigmoid unit.
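A sigmoid unit can be sketched in a few lines of Python. The function name `sigmoid_unit`, the example weights, and the `bias` term are assumptions made for this illustration; the text above only mentions a weighted sum of inputs.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_unit(inputs, weights, bias=0.0):
    """Weighted sum of the inputs (plus an optional bias), passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical inputs and weights: 0.5 * 1.0 + (-0.25) * 2.0 = 0.0
output = sigmoid_unit(inputs=[1.0, 2.0], weights=[0.5, -0.25])
print(output)  # 0.5, because sigmoid(0) = 0.5
```

Whatever weights are chosen, the returned value stays between 0 and 1, and it could be fed as an input to a unit in the next layer.
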

__Linear vs. Non-linear Separability__

Suppose we have a typical classification problem, where we have a set of points in space and each point is assigned a class label. If a straight line (or a hyperplane in an n-dimensional space) can separate the two classes, then we have a linearly separable problem. On the other hand, if a straight line is not enough to separate the two classes, then we have a non-linearly separable problem. The figure below shows data in 2D space. Each point is assigned a red or blue class label. The left figure shows a linearly separable problem that requires a linear boundary to distinguish between the two classes. The right figure shows a non-linearly separable problem, where a non-linear decision boundary is required.

In 3D space, a linear decision boundary can be described by the equation of a plane. In an n-dimensional space, the linear decision boundary is described by the equation of a hyperplane.
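To make the idea of a linear decision boundary concrete, here is a small sketch that classifies a point by which side of a hyperplane w·x + b = 0 it falls on. The function name `linear_classify` and the specific weights and points are hypothetical values chosen for illustration.

```python
def linear_classify(point, weights, bias):
    """Return 1 if the point is on the positive side of w.x + b = 0, else 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else 0

# 2D example: the boundary is the straight line x1 + x2 - 1 = 0.
label_2d_a = linear_classify([2.0, 2.0], weights=[1.0, 1.0], bias=-1.0)
label_2d_b = linear_classify([0.0, 0.0], weights=[1.0, 1.0], bias=-1.0)

# 3D example: the boundary is the plane x1 + x2 + x3 - 1 = 0.
label_3d = linear_classify([0.0, 0.0, 0.5], weights=[1.0, 1.0, 1.0], bias=-1.0)

print(label_2d_a, label_2d_b, label_3d)
```

The same function works in any number of dimensions because only the length of the weight vector changes.
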

__Why The Sigmoid Function is Critical in Neural Networks__

If we use a linear activation function in a neural network, then the model can only learn linearly separable problems. However, with the addition of just one hidden layer and a sigmoid activation function in the hidden layer, the neural network can learn a non-linearly separable problem. Using a non-linear function produces non-linear boundaries, and hence the sigmoid function can be used in neural networks for learning complex decision functions.
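The XOR problem is a standard example of a problem that is not linearly separable. The sketch below shows a network with one hidden layer of two sigmoid units that reproduces XOR. The weights are hand-picked for this illustration rather than learned, and using a sigmoid at the output as well is an assumption of the example.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def xor_network(x1: float, x2: float) -> float:
    # Hidden layer: two sigmoid units with hand-picked (not learned) weights.
    h1 = sigmoid(20.0 * x1 + 20.0 * x2 - 10.0)   # behaves roughly like OR
    h2 = sigmoid(-20.0 * x1 - 20.0 * x2 + 30.0)  # behaves roughly like NAND
    # Output unit: behaves roughly like AND of the two hidden units.
    return sigmoid(20.0 * h1 + 20.0 * h2 - 30.0)

# Round each output to the nearest class label (0 or 1).
predictions = {(a, b): round(xor_network(a, b)) for a in (0, 1) for b in (0, 1)}
print(predictions)
```

No single straight line separates the inputs (0, 1) and (1, 0) from (0, 0) and (1, 1), yet the two hidden sigmoid units together carve out the required region.
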

Non-linear functions used as activation functions are conventionally monotonically increasing. So, for example, sin(x) or cos(x) are not typically used as activation functions. Also, the activation function should be defined everywhere and should be continuous everywhere in the space of real numbers. The function should also be differentiable over the entire space of real numbers.

Typically, the backpropagation algorithm uses gradient descent to learn the weights of a neural network. To derive this algorithm, the derivative of the activation function is required.
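For the sigmoid function defined earlier, this derivative can be worked out with the chain rule. The short derivation below uses only the definition σ(x) = 1/(1+exp(-x)) and the fact that 1 − σ(x) = exp(-x)/(1+exp(-x)):

```latex
\begin{aligned}
\sigma'(x) &= \frac{d}{dx}\left(1 + e^{-x}\right)^{-1}
            = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} \\
           &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
            = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
```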

The fact that the sigmoid function is monotonic, continuous and differentiable everywhere, together with the property that its derivative can be expressed in terms of itself, makes it easy to derive the update equations for learning the weights in a neural network when using the backpropagation algorithm.
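The sketch below illustrates this point, using the identity σ'(x) = σ(x)(1 − σ(x)). It first compares that expression against a finite-difference estimate, then performs one gradient descent step for a single sigmoid unit with one weight. The squared-error loss, the learning rate, and the function names are assumptions made for this example, not something prescribed by the tutorial.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative written in terms of the sigmoid itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the identity with a central finite-difference estimate.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
    print(x, sigmoid_derivative(x), numeric)

def gradient_step(w, x, target, learning_rate=0.5):
    """One gradient-descent update for y = sigmoid(w * x),
    assuming the loss L = 0.5 * (y - target) ** 2."""
    y = sigmoid(w * x)
    # Chain rule: dL/dw = (y - target) * sigmoid'(w * x) * x,
    # where sigmoid'(w * x) = y * (1 - y) reuses the forward-pass output.
    grad = (y - target) * y * (1.0 - y) * x
    return w - learning_rate * grad

w_new = gradient_step(w=0.0, x=1.0, target=1.0)
print(w_new)  # 0.0625
```

Because the derivative reuses the value `y` already computed in the forward pass, the update needs no extra calls to the exponential function.
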

__Extensions__

This section lists some ideas for extending the tutorial that you may wish to explore.

- Other non-linear activation functions, e.g. tanh function
- Rectified Linear Unit (ReLU)
- Deep learning

__Further Reading__

This section provides more resources on the topic if you are looking to go deeper.

*Resources*

Jason Brownlee’s excellent resource on Calculus Books for Machine Learning

*Books*

Pattern Recognition and Machine Learning by Christopher M. Bishop

Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Thomas’ Calculus, 14th edition, 2017 (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, Maurice Weir)

__Conclusion__

In this tutorial, you discovered what a sigmoid function is. Specifically, you learned:

- The sigmoid function and its properties
- Linear vs. non-linear decision boundaries
- Why including a sigmoid function at the hidden layer allows a neural network to learn complex non-linear boundaries