Hessian matrices belong to a class of mathematical structures built from second-order derivatives. They are commonly used in machine learning and data science algorithms to optimize a function of interest.

In this guide, you will learn about Hessian matrices, their discriminants, and why they matter. Each idea is illustrated with an example.

After going through this guide, you will know:

- What a Hessian matrix is
- How the discriminant is computed from the Hessian matrix
- What information the discriminant carries

**Tutorial Overview**

This tutorial is divided into three parts:

1] Definition of a function’s Hessian matrix and the associated discriminant

2] An example of computing the Hessian matrix and the discriminant

3] What the Hessian and the discriminant tell us about the function of interest

**Prerequisites**

For this guide, knowledge of the following topics is assumed:

1] Derivatives of functions

2] Functions of several variables, partial derivatives and gradient vectors

3] Higher order derivatives

**What is a Hessian Matrix?**

The Hessian matrix is a matrix of second-order partial derivatives. Suppose we have a function f of n variables, that is,

f:R^n→R

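The Hessian of f is the n-by-n matrix whose entry in row i and column j is the second-order partial derivative of f with respect to the i-th and j-th variables. For a function of two variables f(x,y), it is the 2-by-2 matrix with rows (f_xx, f_xy) and (f_yx, f_yy).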
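For reference, the definition can be written out in matrix form. The variable names x_1, ..., x_n are a notational assumption; the subscript shorthand f_xy follows the notation used later in this guide.

$$
H_f =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
$$

$$
H_f(x, y) =
\begin{bmatrix}
f_{xx} & f_{xy} \\
f_{yx} & f_{yy}
\end{bmatrix}
$$

When the second-order partial derivatives are continuous, the mixed partials agree (f_xy = f_yx), so the Hessian is symmetric.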

Recall from our guide on gradient vectors that the gradient is a vector of first-order partial derivatives. Likewise, the Hessian is a matrix of second-order partial derivatives, formed from all pairs of variables in the domain of f.
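To make the "all pairs of variables" idea concrete, here is a minimal sketch that approximates a Hessian numerically with central finite differences. The helper `numerical_hessian`, the step size `h`, and the test function `example_f` are all hypothetical names chosen for illustration; the guide itself works with exact derivatives.

```python
def numerical_hessian(f, point, h=1e-4):
    """Approximate the Hessian of f at `point` using central differences."""
    n = len(point)
    hessian = [[0.0] * n for _ in range(n)]

    def f_shifted(i, j, step_i, step_j):
        # Evaluate f with coordinate i moved by step_i and coordinate j by step_j.
        shifted = list(point)
        shifted[i] += step_i
        shifted[j] += step_j
        return f(shifted)

    # One entry per ordered pair of variables (i, j).
    for i in range(n):
        for j in range(n):
            hessian[i][j] = (
                f_shifted(i, j, h, h)
                - f_shifted(i, j, h, -h)
                - f_shifted(i, j, -h, h)
                + f_shifted(i, j, -h, -h)
            ) / (4 * h * h)
    return hessian


def example_f(p):
    # Hypothetical test function: f(x, y) = x^2 * y + y^3
    # Exact second partials: f_xx = 2y, f_xy = f_yx = 2x, f_yy = 6y
    x, y = p
    return x**2 * y + y**3


H = numerical_hessian(example_f, [1.0, 2.0])
for row in H:
    print(row)  # entries should be close to [4, 2] and [2, 12]
```

The result is only an approximation: a smaller `h` reduces truncation error but amplifies floating-point rounding, so the step size is a trade-off rather than a fixed rule.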

__What is the discriminant, then?__

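The determinant of the Hessian is also called the discriminant of f. For a function of two variables f(x,y), it is given by:

D(x,y) = f_xx(x,y) f_yy(x,y) - f_xy(x,y) f_yx(x,y)

When the mixed partial derivatives are equal, this simplifies to f_xx f_yy - (f_xy)^2.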

__Examples of Hessian Matrices and Discriminants__

Suppose we have the following function:

g(x,y) = x^3 + 2y^2 + 3xy^2

Then the Hessian H_g and the discriminant D_g are given by:

H_g(x,y) = [[6x, 6y], [6y, 6x + 4]]

D_g(x,y) = 6x(6x + 4) - (6y)^2 = 36x^2 + 24x - 36y^2
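Working through the partial derivatives of g step by step shows where each entry comes from. This derivation is reconstructed from the definition of g above and is consistent with the discriminant values evaluated below.

$$
\begin{aligned}
g_x &= 3x^2 + 3y^2, & g_y &= 4y + 6xy, \\
g_{xx} &= 6x, & g_{xy} &= g_{yx} = 6y, & g_{yy} &= 6x + 4, \\
D_g(x, y) &= g_{xx}\, g_{yy} - g_{xy}^2 = 6x(6x + 4) - 36y^2 .
\end{aligned}
$$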

Let’s evaluate the discriminant at several points:

D_g(0, 0) = 0

D_g(1,0) = 36 + 24 = 60

D_g(0,1) = -36

D_g(-1, 0) = 12
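These values can be checked with a few lines of Python. This is a minimal sketch that hard-codes the second-order partial derivatives of g; the function names `hessian_g` and `discriminant` are hypothetical helpers, not part of any library.

```python
def hessian_g(x, y):
    """Exact Hessian of g(x, y) = x^3 + 2y^2 + 3xy^2."""
    g_xx = 6 * x
    g_xy = 6 * y          # equal to g_yx for this polynomial
    g_yy = 6 * x + 4
    return [[g_xx, g_xy], [g_xy, g_yy]]


def discriminant(hessian):
    """Determinant of a 2x2 Hessian."""
    (a, b), (c, d) = hessian
    return a * d - b * c


for point in [(0, 0), (1, 0), (0, 1), (-1, 0)]:
    print(point, discriminant(hessian_g(*point)))
# (0, 0) 0
# (1, 0) 60
# (0, 1) -36
# (-1, 0) 12
```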

__What do the Hessian and Discriminant indicate?__

The Hessian and its discriminant are used to find the local extrema of a function. Evaluating them helps in understanding a function of several variables. The following rules apply at a critical point (a,b), that is, a point where both first-order partial derivatives of f are zero, with discriminant D(a,b):

1] The function f has a local minimum at (a,b) if f_xx(a,b) > 0 and the discriminant D(a,b) > 0

2] The function f has a local maximum at (a,b) if f_xx(a,b) < 0 and the discriminant D(a,b) > 0

3] The function f has a saddle point at (a,b) if D(a,b) < 0

4] If D(a,b) = 0, the test is inconclusive and additional tests are required
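The four rules translate directly into a small decision function. The sketch below assumes the caller has already confirmed that the point is a critical point (zero gradient); the function name `classify_critical_point` and the returned labels are hypothetical.

```python
def classify_critical_point(f_xx, d):
    """Apply the second-derivative test for a function of two variables.

    f_xx: value of the second partial derivative f_xx at the critical point
    d:    value of the discriminant D at the critical point
    """
    if d > 0 and f_xx > 0:
        return "local minimum"
    if d > 0 and f_xx < 0:
        return "local maximum"
    if d < 0:
        return "saddle point"
    # d == 0: the test gives no information.
    return "inconclusive"


# Critical point (0, 0) of a few simple functions:
print(classify_critical_point(2, 4))    # x^2 + y^2   -> local minimum
print(classify_critical_point(-2, 4))   # -x^2 - y^2  -> local maximum
print(classify_critical_point(2, -4))   # x^2 - y^2   -> saddle point
print(classify_critical_point(0, 0))    # x^4 + y^4   -> inconclusive
```

The last case shows why rule 4 asks for additional tests: x^4 + y^4 does have a minimum at the origin, but the discriminant alone cannot reveal it.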

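__Example: g(x,y)__

For the function g(x,y), the first-order partial derivatives are g_x = 3x^2 + 3y^2 and g_y = 4y + 6xy. Both vanish only at (0,0), so (0,0) is the only critical point of g and the only point where the rules above can classify an extremum. At the other points, the signs of g_xx and D_g describe the local curvature of the surface:

1] At (0,0), D_g(0,0) = 0, so the test is inconclusive

2] g_xx(1,0) = 6 > 0 and D_g(1,0) = 60 > 0, so g curves upward in every direction at (1,0); it is not a local minimum because the gradient there is (3,0), not zero

3] D_g(0,1) = -36 < 0, so g curves upward in some directions and downward in others at (0,1); it is not a saddle point because the gradient there is (3,4), not zero

4] g_xx(-1,0) = -6 < 0 and D_g(-1,0) = 12 > 0, so g curves downward in every direction at (-1,0); it is not a local maximum because the gradient there is (3,0), not zero.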
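Because the second-derivative test concerns points where the gradient is zero, it is worth checking the gradient of g at each of the four points as well. The sketch below hard-codes the first-order partial derivatives of g; `gradient_g` and `is_critical` are hypothetical helper names.

```python
def gradient_g(x, y):
    """Exact gradient of g(x, y) = x^3 + 2y^2 + 3xy^2."""
    g_x = 3 * x**2 + 3 * y**2
    g_y = 4 * y + 6 * x * y
    return (g_x, g_y)


def is_critical(x, y):
    # A critical point is one where both partial derivatives are zero.
    return gradient_g(x, y) == (0, 0)


for point in [(0, 0), (1, 0), (0, 1), (-1, 0)]:
    print(point, gradient_g(*point), is_critical(*point))
# (0, 0) (0, 0) True
# (1, 0) (3, 0) False
# (0, 1) (3, 4) False
# (-1, 0) (3, 0) False
```

Since g_x = 3x^2 + 3y^2 is zero only when x = y = 0, the origin is the only critical point of g. Exact equality is safe here only because the inputs are integers; with floating-point inputs a tolerance would be needed.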

The figure shows a graph of the function g(x,y) and its associated contours.

__Why is the Hessian Matrix important in machine learning?__

The Hessian matrix plays an important role in many machine learning algorithms that involve optimizing a given function. Although it can be expensive to compute, it holds key information about the function being optimized: it can help identify saddle points and local extrema. It is used in second-order optimization methods for training neural networks and deep learning architectures.
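As one illustration of how an optimizer can use the Hessian, the sketch below performs a single step of Newton's method on a two-variable function. Newton's method is a well-known second-order technique chosen here as an example; this guide does not single out a particular algorithm. The objective f(x,y) = (x - 1)^2 + 2(y + 2)^2 and the helper names are hypothetical.

```python
def gradient(x, y):
    # Gradient of the hypothetical objective f(x, y) = (x - 1)^2 + 2(y + 2)^2
    return (2 * (x - 1), 4 * (y + 2))


def hessian(x, y):
    # Hessian of the same objective; constant because f is quadratic.
    return ((2.0, 0.0), (0.0, 4.0))


def newton_step(x, y):
    """Return the point reached by one Newton step from (x, y)."""
    g_x, g_y = gradient(x, y)
    (a, b), (c, d) = hessian(x, y)
    det = a * d - b * c  # the discriminant of the Hessian
    if det == 0:
        raise ValueError("Hessian is singular; the Newton step is undefined")
    # Solve H * step = gradient using the closed-form 2x2 inverse.
    step_x = (d * g_x - b * g_y) / det
    step_y = (a * g_y - c * g_x) / det
    return (x - step_x, y - step_y)


print(newton_step(0.0, 0.0))  # (1.0, -2.0)
```

For a quadratic objective, one Newton step lands exactly on the minimizer, here (1, -2). For general functions the step is repeated, and the cost of forming and inverting the Hessian grows quickly with the number of variables, which is the expense mentioned above.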

__Extensions__

This section lists some ideas for extending the tutorial that you may wish to explore:

1] Optimization

2] Eigenvalues of the Hessian matrix

3] Inverse of the Hessian matrix and neural network training

__Further Reading__

This section provides additional resources on the topic if you want to go deeper.

*Concepts*

Derivatives

Gradient descent for machine learning

What is a gradient in machine learning

Partial derivatives and gradient vectors

Higher order derivatives

How to select an optimization algorithm

*Books*

Thomas' Calculus, 14th Edition, 2017 (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, and Maurice Weir)

Calculus, 3rd Edition, 2017 (Gilbert Strang)

Calculus, 8th Edition, 2015 (James Stewart)

__Conclusion__

In this guide, you learned what Hessian matrices are. Specifically, you learned about:

- The Hessian matrix
- The discriminant of a function
