An intro to partial derivatives and gradient vectors
Partial derivatives and gradient vectors are used very often in machine learning algorithms to find the minimum or maximum of a function. Gradient vectors are used in the training of neural networks, logistic regression, and many other classification and regression problems.
In this guide by AICoreSpot, you will learn about partial derivatives and the gradient vector.
After going through this guide, you will know:
- Functions of several variables
- Level sets, contours and graphs of a function of two variables
- Partial derivatives of a function of several variables
- The gradient vector and its meaning
This tutorial is divided into five parts:
- Functions of several variables
- Level sets
- Definition of partial derivatives
- Gradient vector
- What the gradient vector signifies
A function of several variables
This section looks at functions of several variables in more detail.
A function of several variables has the following properties:
- Its domain is a set of n-tuples of the form (x_1, x_2, x_3, …, x_n)
- Its range is a set of real numbers
For instance, the following is a function of two variables (n=2):
f_1(x,y) = x + y
In the function above, x and y are the independent variables. Their sum determines the value of the function. The domain of this function is the set of all points in the XY Cartesian plane. Plotting this function requires 3D space, with two axes for the input points (x,y) and a third axis for the values of f.
Here is another example of a function of two variables:
f_2(x,y) = x*x + y*y
To keep things simple, we will work with examples of functions of two variables. In machine learning you will meet functions of hundreds of variables or more. The concepts for functions of two variables extend to those cases.
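As a minimal sketch, the two example functions can be written in Python. The names f1 and f2 are illustrative choices that mirror f_1 and f_2 above; they do not come from any library.

```python
def f1(x, y):
    """f_1(x, y) = x + y"""
    return x + y


def f2(x, y):
    """f_2(x, y) = x*x + y*y"""
    return x * x + y * y


# Each input is a 2-tuple (x, y); each output is a single real number.
for point in [(0, 0), (1, 2), (-3, 4)]:
    print(point, f1(*point), f2(*point))
```

The same pattern carries over to n variables by passing a longer tuple or a list of inputs.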
Level sets and graph of a function of two variables
The set of points in the plane where a function f(x,y) has a constant value, that is, f(x,y) = c, is the level set or level curve of f.
As an example, for the function f_1, all points (x,y) that satisfy the equation below define a level set of f_1:
x + y = 1
We can see that this level set contains infinitely many points, for example (0,1), (0.5,0.5), (1,0), etc. This level set defines a straight line in the XY plane.
In general, all level sets of f_1 are straight lines of the form (c is any real constant):
x + y = c
Similarly, for the function f_2, an example of a level set is:
x*x + y*y = 1
Any point that lies on the circle of radius 1 centred at (0,0) satisfies this equation, so this level set consists of all points on that circle. Likewise, any level set of f_2 satisfies the following equation (c is any real constant >= 0):
x*x + y*y = c
Therefore, the level sets of f_2 are circles centred at (0,0), each with its own radius, the square root of c. For c = 0 the level set shrinks to the single point (0,0).
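The claim that f_2 is constant on a circle centred at the origin can be checked numerically. The sketch below is only an illustration: the helper name circle_points and the choice of 8 sample points are assumptions made for this example.

```python
import math


def f2(x, y):
    return x * x + y * y


def circle_points(c, n=8):
    """Return n points on the level set f_2(x, y) = c, for c >= 0."""
    r = math.sqrt(c)  # radius of the level curve
    return [
        (r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# Every sampled point should give (approximately) the same value c = 4.
points = circle_points(4.0)
values = [f2(x, y) for x, y in points]
print(all(math.isclose(v, 4.0) for v in values))
```

The comparison uses math.isclose rather than == because floating-point rounding can leave tiny differences from the exact value.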
The graph of the function f(x,y) is the set of all points (x,y,f(x,y)). It is also called the surface z=f(x,y). The graphs of f_1 and f_2 are shown below:
Contours of a function of two variables
Suppose we have a function f(x,y) of two variables. If we cut the surface z=f(x,y) with a plane z=c, we obtain the set of all points that satisfy f(x,y) = c. The contour curve is this set of points, drawn in the plane z=c. This differs slightly from the level set, where the level curve is defined directly in the XY plane. That said, many books treat contours and level curves as the same thing.
The contours of both f_1 and f_2 are shown in the figure above.
Partial derivatives and gradients
The partial derivative of a function f with respect to the variable x is denoted by ∂f/∂x. Its expression is found by differentiating f with respect to x while treating all other variables as constants. For instance, for the functions f_1 and f_2, we have:
∂f_1/∂x = 1
∂f_2/∂x = 2x
∂f_1/∂x indicates the rate of change of f_1 with respect to x. For any function f(x,y), ∂f/∂x indicates the rate of change of f with respect to x when y is held constant.
The same holds for ∂f/∂y, which indicates the rate of change of f with respect to y when x is held constant. For our examples, ∂f_1/∂y = 1 and ∂f_2/∂y = 2y.
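One way to check the partial derivatives above is to estimate them numerically. The sketch below uses a central finite difference; the helper names partial_x and partial_y and the step size h = 1e-6 are assumptions chosen for illustration, not part of any library.

```python
def f1(x, y):
    return x + y


def f2(x, y):
    return x * x + y * y


def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. x (y held fixed)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. y (x held fixed)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


x, y = 3.0, -1.0
print(partial_x(f1, x, y), 1.0)      # numerical estimate vs. analytic value 1
print(partial_x(f2, x, y), 2 * x)    # numerical estimate vs. analytic value 2x
print(partial_y(f2, x, y), 2 * y)    # numerical estimate vs. analytic value 2y
```

Only one input is nudged at a time, which is exactly what "holding the other variable constant" means.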
When we collect the partial derivatives with respect to all independent variables, we end up with a vector. This vector is called the gradient vector of f and is denoted by ∇f(x,y). The general expressions for the gradients of f_1 and f_2 are (here i and j are unit vectors along the x and y coordinate axes):
∇f_1(x,y) = (∂f_1/∂x) i + (∂f_1/∂y) j = i + j
∇f_2(x,y) = (∂f_2/∂x) i + (∂f_2/∂y) j = 2x i + 2y j
From the general expression of the gradient, we can evaluate the gradient at different points. In the case of f_1 the gradient vector is constant, i.e.
∇f_1(x,y) = i + j
Regardless of where we are in the XY plane, the direction and magnitude of the gradient vector stay the same.
For the function f_2, ∇f_2(x,y) changes with (x,y). For instance, at (1,1) and (2,1) the gradient of f_2 is given by the following vectors:
∇f_2(1,1) = 2i + 2j
∇f_2(2,1) = 4i + 2j
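The gradient expressions can be written as small Python functions that return the (i, j) components as a tuple. This is a minimal sketch; the names grad_f1 and grad_f2 are illustrative.

```python
def grad_f1(x, y):
    """Gradient of f_1(x, y) = x + y: the constant vector (1, 1)."""
    return (1.0, 1.0)


def grad_f2(x, y):
    """Gradient of f_2(x, y) = x*x + y*y: the vector (2x, 2y)."""
    return (2.0 * x, 2.0 * y)


# Evaluate both gradients at the two points used in the text.
for point in [(1, 1), (2, 1)]:
    print(point, grad_f1(*point), grad_f2(*point))
```

The output for grad_f1 is the same at every point, while grad_f2 depends on where it is evaluated.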
What does the gradient vector at a point signify?
The gradient vector of a function of several variables at any point indicates the direction of the maximum rate of increase of the function at that point.
We can relate the gradient vector to the tangent line. Suppose we are standing at a point in the plane and we follow this rule: find the tangent line to the contour at the current point and walk along it. If we keep following this rule, we end up walking along the contour of f. The function’s value never changes, because the function’s value is constant on a contour of f.
The gradient vector is normal (perpendicular) to the tangent line and points in the direction of maximum rate of increase. If we take a small step along the direction of the gradient (assuming the gradient is not zero), we reach a point where the function’s value is larger than at the previous one.
The positive direction of the gradient is the direction of maximum rate of increase, while the negative direction is the direction of maximum rate of decrease. The following figure shows the positive direction of the gradient vector at different points on the contours of the function f_2. The direction of the positive gradient is shown by the red arrow. The tangent line to a contour is shown in green.
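The relationship between the gradient and the tangent to a contour can be checked numerically for f_2. In the sketch below, the point (1, 2), the tangent direction (-y, x) for a circle centred at the origin, and the step size 1e-3 are all choices made for this illustration.

```python
def f2(x, y):
    return x * x + y * y


def grad_f2(x, y):
    return (2.0 * x, 2.0 * y)


x, y = 1.0, 2.0
gx, gy = grad_f2(x, y)
tx, ty = -y, x  # a tangent direction to the circular contour through (x, y)

# Dot product of gradient and tangent: zero means they are perpendicular.
dot = gx * tx + gy * ty
print(dot)

# Take a small step in each direction and compare the change in f_2.
step = 1e-3
along_gradient = f2(x + step * gx, y + step * gy) - f2(x, y)
along_tangent = f2(x + step * tx, y + step * ty) - f2(x, y)
print(along_gradient, along_tangent)
```

The change along the tangent is not exactly zero because a straight step leaves the curved contour slightly, but it is far smaller than the change along the gradient.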
Why is the gradient vector important in machine learning?
The gradient vector is used throughout machine learning. In classification and regression problems, we typically define an error function such as the mean squared error. Repeatedly stepping in the negative direction of the gradient of this function moves us toward a point where the function has a minimum value (in general, a local minimum).
The same idea applies to functions that we want to maximize, for example a function whose maximum corresponds to the best accuracy. In that case we follow the direction of the maximum rate of increase of the function, that is, the positive direction of the gradient vector.
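To make the idea of following the negative gradient concrete, here is a minimal sketch that minimizes f_2 instead of a real error function. The function name gradient_descent, the learning rate of 0.1, the 100 steps, and the starting point (3, -4) are all assumptions chosen for illustration.

```python
def grad_f2(x, y):
    """Gradient of f_2(x, y) = x*x + y*y."""
    return (2.0 * x, 2.0 * y)


def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient and return the final point."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= learning_rate * gx  # move in the negative gradient direction
        y -= learning_rate * gy
    return x, y


# f_2 has its minimum at (0, 0), so the result should be very close to it.
print(gradient_descent(grad_f2, start=(3.0, -4.0)))
```

Changing the two minus signs to plus signs turns this into a step along the positive gradient, which is the update used when a function is to be maximized.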
This section lists some topics for extending this guide that you may wish to explore:
- Gradient descent/gradient ascent
- Hessian matrix