
Hessian matrices belong to a class of mathematical structures built from second order derivatives. They are often used in machine learning and data science algorithms to optimize a function of interest. 

In this guide, you will learn about Hessian matrices, their corresponding discriminants, and why they matter. All ideas are illustrated with examples. 

After going through this guide, you will know about: 

  • Hessian matrices 
  • Discriminants computed through Hessian matrices 
  • What information the discriminant contains 

Tutorial Overview 

This tutorial is divided into three parts, which are: 

1] Definition of a function’s Hessian matrix and the associated discriminant 

2] Example of computing the Hessian matrix and the discriminant 

3] What the Hessian and the discriminant tell us about the function of interest 

Prerequisites 

For this guide, knowledge of the following topics is assumed: 

1] Derivatives of functions 

2] Functions of several variables, partial derivatives and gradient vectors 

3] Higher order derivatives 

Just what exactly is a Hessian Matrix? 

The Hessian matrix is a matrix of second order partial derivatives. Suppose we have a function f of n variables, that is, 

f:R^n→R 

The Hessian of f is the n × n matrix whose entry in row i and column j is the second order partial derivative of f with respect to the i-th and j-th variables. For a function of two variables, it is a 2 × 2 matrix. 
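
In standard notation, the general Hessian and its two-variable special case can be written as follows. The variable names x_1, ..., x_n and the subscript shorthand f_xy are assumed labels chosen for illustration.

```latex
H_f =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix},
\qquad
H_f(x, y) =
\begin{bmatrix}
f_{xx} & f_{xy} \\
f_{yx} & f_{yy}
\end{bmatrix}
```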

We already know from our guide on gradient vectors that the gradient is a vector of first order partial derivatives. Likewise, the Hessian is a matrix of second order partial derivatives formed from all pairs of variables in the domain of f.

What is the discriminant, then?

The determinant of the Hessian is also called the discriminant of f. For a function of two variables f(x,y), it is given by:
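
Written out for the two-variable case, the determinant takes the following form. The last equality assumes the mixed partial derivatives are continuous, so that f_xy = f_yx.

```latex
D(x, y) = \det H_f(x, y)
        = f_{xx}(x, y)\, f_{yy}(x, y) - f_{xy}(x, y)\, f_{yx}(x, y)
        = f_{xx}\, f_{yy} - \left(f_{xy}\right)^2
```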

Examples of Hessian Matrices and Discriminants

Suppose we have the following function:

g(x,y) = x^3 + 2y^2 + 3xy^2

Then the Hessian H_g and the discriminant D_g are given by:
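
Differentiating g term by term gives the derivation below. It is worked out here directly from the definition of g, and it is consistent with the values evaluated at the sample points.

```latex
g_x = 3x^2 + 3y^2, \qquad g_y = 4y + 6xy

g_{xx} = 6x, \qquad g_{xy} = g_{yx} = 6y, \qquad g_{yy} = 4 + 6x

H_g =
\begin{bmatrix}
6x & 6y \\
6y & 4 + 6x
\end{bmatrix},
\qquad
D_g(x, y) = 6x\,(4 + 6x) - (6y)^2 = 36x^2 + 24x - 36y^2
```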

Let’s evaluate the discriminant at different points:

D_g(0, 0) = 0

D_g(1,0) = 36 + 24 = 60

D_g(0,1) = -36

D_g(-1, 0) = 12
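
The four values above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the second partial derivatives g_xx = 6x, g_xy = 6y and g_yy = 4 + 6x obtained by differentiating g by hand. The helper names hessian_g and discriminant are illustrative and do not come from any library.

```python
def hessian_g(x, y):
    """Hessian of g(x, y) = x**3 + 2*y**2 + 3*x*y**2 (second partials derived by hand)."""
    g_xx = 6 * x
    g_xy = 6 * y          # the mixed partials are equal, so g_yx is the same value
    g_yy = 4 + 6 * x
    return [[g_xx, g_xy], [g_xy, g_yy]]


def discriminant(hessian):
    """Determinant of a 2 x 2 Hessian given as a nested list."""
    return hessian[0][0] * hessian[1][1] - hessian[0][1] * hessian[1][0]


points = [(0, 0), (1, 0), (0, 1), (-1, 0)]
values = {p: discriminant(hessian_g(*p)) for p in points}

for point, value in values.items():
    print(f"D_g{point} = {value}")
```

Keeping the Hessian and the determinant in separate functions makes it easy to swap in a different function: only hessian_g would need to change.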

What do the Hessian and Discriminant indicate?

The Hessian and its discriminant are used to identify the local extreme points of a function. Evaluating them helps us understand the behavior of a function of several variables. The following rules apply at a critical point (a,b), that is, a point where both first order partial derivatives of f are zero, with discriminant D(a,b):

1] The function f has a local minimum if f_xx(a,b) > 0 and the discriminant D(a,b) > 0

2] The function f has a local maximum if f_xx(a, b) < 0 and the discriminant D(a,b) > 0

3] The function f has a saddle point if D(a,b) < 0

4] We cannot draw any conclusion if D(a,b) = 0, and additional tests are required
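
The four rules translate directly into a small decision function. The sketch below assumes the caller has already confirmed that the gradient is zero at the point and passes in the second partial derivatives evaluated there. The function name second_derivative_test and the sample functions in the comments are chosen for illustration only.

```python
def second_derivative_test(f_xx, f_yy, f_xy):
    """Classify a critical point from the second partials evaluated at that point."""
    d = f_xx * f_yy - f_xy ** 2   # discriminant D(a, b)
    if d > 0 and f_xx > 0:
        return "local minimum"
    if d > 0 and f_xx < 0:
        return "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"          # d == 0: further tests are needed


# Illustrative functions, each with a critical point at the origin:
bowl = second_derivative_test(2, 2, 0)      # f = x^2 + y^2
dome = second_derivative_test(-2, -2, 0)    # f = -(x^2 + y^2)
saddle = second_derivative_test(2, -2, 0)   # f = x^2 - y^2
flat = second_derivative_test(0, 0, 0)      # f = x^4 + y^4

print(bowl, dome, saddle, flat, sep="\n")
```

The last case shows why rule 4 matters: x^4 + y^4 does have a minimum at the origin, but the discriminant alone cannot detect it.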

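Example: g(x,y)

For the function g(x,y), the first order partial derivatives are g_x = 3x^2 + 3y^2 and g_y = 4y + 6xy. Both are zero only at (0,0), so this is the only critical point of g and the only point where the rules above can identify an extremum:

1] At (0,0), D_g(0,0) = 0, so we cannot draw any conclusion from the test. Since g(x,0) = x^3 takes both signs near the origin, (0,0) is in fact neither a local minimum nor a local maximum

2] g_xx(1,0) = 6 > 0 and D_g(1,0) = 60 > 0, so g curves upward in every direction at (1,0). Because g_x(1,0) = 3 is not zero, (1,0) is not a local minimum

3] D_g(0,1) = -36 < 0, so g curves upward in some directions and downward in others at (0,1). The gradient is not zero there, so (0,1) is not a saddle point in the strict sense

4] g_xx(-1,0) = -6 < 0 and D_g(-1,0) = 12 > 0, so g curves downward in every direction at (-1,0). Because g_x(-1,0) = 3 is not zero, (-1,0) is not a local maximum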
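
The second derivative rules presuppose that the gradient is zero at the point being classified, so it is worth checking that precondition for each of the four points. The sketch below assumes the first order partial derivatives g_x = 3x^2 + 3y^2 and g_y = 4y + 6xy, obtained by differentiating g by hand. The name grad_g is illustrative.

```python
def grad_g(x, y):
    """Gradient of g(x, y) = x**3 + 2*y**2 + 3*x*y**2 (first partials derived by hand)."""
    g_x = 3 * x**2 + 3 * y**2
    g_y = 4 * y + 6 * x * y
    return (g_x, g_y)


points = [(0, 0), (1, 0), (0, 1), (-1, 0)]

# A point is critical only if both components of the gradient vanish there.
critical = [p for p in points if grad_g(*p) == (0, 0)]

for p in points:
    print(f"grad g{p} = {grad_g(*p)}")
print("critical points among these:", critical)
```

Exact comparison with (0, 0) is safe here because the inputs are integers. With floating-point inputs, a small tolerance would be the more robust choice.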

The figure here shows a graph of the function g(x,y) and its associated contours.

Why is the Hessian Matrix important in machine learning?

The Hessian matrix plays an important part in many machine learning algorithms that involve optimizing a given function. While it can be expensive to compute, it carries key information about the function being optimized. It can help identify the saddle points and the local extrema of a function. Second order information of this kind is also used in training neural networks and deep learning architectures.
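
To give a sense of where the cost comes from, the sketch below approximates a Hessian numerically with central finite differences, using only the Python standard library. It is one simple approach under stated assumptions, not how any particular machine learning library computes Hessians. The function name numerical_hessian and the step size h = 1e-4 are choices made for this illustration.

```python
def numerical_hessian(f, point, h=1e-4):
    """Approximate the Hessian of f at `point` with central differences."""
    n = len(point)

    def shifted(i, di, j, dj):
        # Evaluate f with coordinate i moved by di and coordinate j moved by dj.
        p = list(point)
        p[i] += di
        p[j] += dj
        return f(*p)

    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):   # the Hessian is symmetric: fill the upper triangle, then mirror
            value = (
                shifted(i, h, j, h) - shifted(i, h, j, -h)
                - shifted(i, -h, j, h) + shifted(i, -h, j, -h)
            ) / (4 * h * h)
            hessian[i][j] = hessian[j][i] = value
    return hessian


def g(x, y):
    return x**3 + 2 * y**2 + 3 * x * y**2


H = numerical_hessian(g, (1.0, 0.0))
print(H)   # close to [[6, 0], [0, 10]], the exact Hessian of g at (1, 0)
```

Each entry needs four evaluations of f, and there are n(n + 1)/2 distinct entries, so the number of function evaluations grows quadratically with the number of variables. For models with millions of parameters, this is why the full Hessian is rarely formed explicitly.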

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore:

1] Optimization

2] Eigenvalues of the Hessian matrix

3] Inverse of Hessian Matrix and neural network training

Further Reading

This section provides additional resources on the subject if you’re looking to go deeper.

Concepts

Derivatives

Gradient descent for machine learning

What is gradient within machine learning

Partial derivatives and gradient vectors

Higher order derivatives

How to select an optimization algorithm

Books

Thomas’ Calculus, 14th Edition, 2017 (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, Maurice Weir)

Calculus, 3rd Edition, 2017 (Gilbert Strang)

Calculus, 8th Edition, 2015 (James Stewart)

Conclusion

In this guide, you learned about Hessian matrices. Specifically, you learned about:

  • Hessian matrix
  • Discriminant of a function