An intro to multivariate calculus
It is often desirable to study functions that depend on several variables.
Multivariate calculus gives us the tools to do so by extending the concepts of single-variable calculus, such as computing the rate of change, to multiple variables. It plays an essential role in training a neural network, where the gradient is used extensively to update the model parameters.
In this tutorial, you will discover a brief introduction to multivariate calculus.
After completing this tutorial, you will know:
- A multivariate function depends on several input variables to produce an output.
- The gradient of a multivariate function is computed by finding the derivative of the function in different directions.
- Multivariate calculus is used extensively in neural networks to update the model parameters.
This tutorial is divided into three parts; they are:
- Revisiting the idea of a function
- Derivatives of multivariate functions
- Application of multivariate calculus in machine learning
Revisiting the idea of a function
A function is a rule that defines the relationship between a dependent variable and an independent variable. We have seen that a function is often written as y = f(x), where both the input (or the independent variable), x, and the output (or the dependent variable), y, are single real numbers.
Such a function, which takes a single independent variable and maps each input value to exactly one output value, is called a univariate function.
For example, let's say that we are trying to predict the weather based on the temperature alone. In this case, the weather is the dependent variable that we are trying to predict, and it is a function of the temperature as the input variable. Such a problem can, therefore, easily be framed as a univariate function.
However, let's say that we now want to base our weather predictions on the humidity level and the wind speed too, in addition to the temperature. We cannot do this with a univariate function, where the output depends on a single input only.
Hence, we turn our attention to multivariate functions, so called because they can take several variables as input.
Formally, we can express a multivariate function as a mapping from n real input variables to a single real output:
f : ℝ^n → ℝ
For example, consider the following parabolic surface:
f(x, y) = x^2 + 2y^2
This is a multivariate function that takes two variables, x and y, as input (hence n = 2) to produce a single output. We can visualize it by graphing its values for x and y between -1 and 1.
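As a rough illustration of what such a graph would show, the sketch below tabulates the surface on a coarse grid over the same range. It uses plain Python, and the five-point grid is an arbitrary choice made for this example rather than something prescribed by the tutorial.

```python
def f(x, y):
    """Parabolic surface from the text: f(x, y) = x^2 + 2y^2."""
    return x**2 + 2 * y**2


# Five evenly spaced sample points in [-1, 1] (an arbitrary grid size).
points = [-1.0, -0.5, 0.0, 0.5, 1.0]

# grid[i][j] holds f(points[j], points[i]): rows follow y, columns follow x.
grid = [[f(x, y) for x in points] for y in points]

for y, row in zip(points, grid):
    print(f"y = {y:+.1f}: " + "  ".join(f"{value:.2f}" for value in row))
```

The smallest value sits at the origin, and the values grow faster along y than along x, which reflects the larger coefficient on the y term.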
Similarly, we can have multivariate functions that take more variables as input. Visualizing them, however, can be difficult because of the number of dimensions involved.
We can even generalize the idea of a function further by considering functions that map n inputs to m outputs:
f : ℝ^n → ℝ^m
These functions are usually referred to as vector-valued functions.
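To make the idea concrete, here is a minimal sketch of a function with n = 2 inputs and m = 3 outputs. The function `h` and its three components are hypothetical and chosen only for illustration; they do not come from the tutorial.

```python
def h(x, y):
    """Hypothetical vector-valued function mapping two inputs to three outputs."""
    return (x + y, x * y, x**2 + 2 * y**2)


outputs = h(1.0, 2.0)
print(f"{len(outputs)} outputs: {outputs}")
```

Each component of the returned tuple is itself a multivariate function of x and y.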
Derivatives of multivariate functions
Recall that calculus is concerned with the study of the rate of change. For some univariate function, g(x), this can be achieved by computing its derivative.
The generalization of the derivative to functions of various variables is the gradient.
The technique for finding the gradient of a function of several variables involves varying one variable at a time while keeping the others constant. In this manner, we take the partial derivative of our multivariate function with respect to each variable in turn.
The gradient is then the collection of these partial derivatives.
To visualize this technique better, let's start by considering a simple univariate quadratic function of the form:
g(x) = x^2
Finding the derivative of this function at some point, x, requires applying the equation for g'(x) that we defined earlier. Alternatively, we can take a shortcut and use the power rule to find that:
g'(x) = 2x
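The two routes can be compared numerically. The sketch below approximates the derivative of the quadratic with a forward difference quotient and checks it against the power-rule result, 2x. The helper name `difference_quotient`, the step size `h`, and the evaluation point are assumptions made for this example.

```python
def g(x):
    """Univariate quadratic from the text: g(x) = x^2."""
    return x**2


def difference_quotient(func, x, h=1e-6):
    """Forward difference (func(x + h) - func(x)) / h, which approximates
    the derivative of func at x when h is small."""
    return (func(x + h) - func(x)) / h


x0 = 3.0  # arbitrary evaluation point
approx = difference_quotient(g, x0)
exact = 2 * x0  # power rule: the derivative of x^2 is 2x

print(f"difference quotient: {approx:.6f}, power rule: {exact:.6f}")
```

The two values agree to several decimal places, and the gap shrinks as `h` is made smaller (until floating-point rounding starts to dominate).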
Furthermore, if we imagine slicing the parabolic surface considered earlier with a plane passing through y = 0, we see that the resulting cross-section of f(x, y) is the quadratic curve g(x) = x^2. Hence, we can calculate the derivative (or the steepness, or slope) of the parabolic surface in the direction of x by taking the derivative of f(x, y) while keeping y constant. We call this the partial derivative of f(x, y) with respect to x, and denote it with the symbol ∂ to signify that there are other variables besides x that are being held fixed for the time being. Therefore, the partial derivative of f(x, y) with respect to x is:
∂f/∂x = 2x
We can similarly hold x constant (in other words, find the cross-section of the parabolic surface by slicing it with a plane passing through a constant value of x) to find the partial derivative of f(x, y) with respect to y, as follows:
∂f/∂y = 4y
What we have essentially done is find the univariate derivative of f(x, y) in each of the x and y directions. Combining the two univariate derivatives as the final step gives us the multivariate derivative (or the gradient):
∇f(x, y) = [∂f/∂x, ∂f/∂y] = [2x, 4y]
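The procedure of varying one variable while holding the other fixed can be sketched in code. The example below estimates each partial derivative of the parabolic surface with a central difference and compares the result with the gradient obtained from the power rule, [2x, 4y]. The function names, the step size `h`, and the evaluation point are assumptions made for this illustration.

```python
def f(x, y):
    """Parabolic surface from the text: f(x, y) = x^2 + 2y^2."""
    return x**2 + 2 * y**2


def numerical_gradient(func, x, y, h=1e-6):
    """Estimate the gradient by nudging one variable at a time."""
    # Vary x only, keeping y constant.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    # Vary y only, keeping x constant.
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (df_dx, df_dy)


def analytic_gradient(x, y):
    """Gradient from the power rule applied to each variable."""
    return (2 * x, 4 * y)


point = (0.5, -1.0)  # arbitrary evaluation point
numeric = numerical_gradient(f, *point)
exact = analytic_gradient(*point)

print("numerical:", numeric)
print("analytic: ", exact)
```

Collecting the two partial derivatives into one tuple is all that "the gradient" means here; the same loop-over-variables idea extends to functions with more inputs.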
The same strategy stays valid for functions of higher dimensions.
Application of multivariate calculus in machine learning
Partial derivatives are leveraged extensively within neural networks to update the model parameters (or weights).
Recall that, in minimizing some error function, an optimization algorithm seeks to follow its gradient downhill. If this error function were univariate, and hence a function of a single independent weight, then optimizing it would simply involve computing its univariate derivative.
However, a neural network comprises many weights (each associated with a different neuron), and the error is a function of all of them. Hence, updating the weight values requires that the gradient of the error function be calculated with respect to all of these weights.
This is where multivariate calculus comes into play.
The gradient of the error function is found by computing the partial derivative of the error with respect to each weight or, in other words, by finding the derivative of the error function while keeping all weights constant except the one under consideration. This allows each weight to be updated independently of the others, with the goal of finding an optimal set of weights.
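A minimal sketch of this update rule follows, using plain gradient descent on a toy problem with two weights. The error function here is hypothetical (it simply reuses the parabolic surface from earlier), and the learning rate, starting weights, and number of steps are arbitrary assumptions; the error of a real network would depend on the training data and on far more weights.

```python
def error(w1, w2):
    """Hypothetical error function of two weights (toy example)."""
    return w1**2 + 2 * w2**2


def error_gradient(w1, w2):
    """Partial derivatives of the toy error with respect to each weight."""
    return (2 * w1, 4 * w2)


learning_rate = 0.1   # assumed step size
w1, w2 = 1.0, -1.0    # assumed starting weights

for step in range(50):
    grad_w1, grad_w2 = error_gradient(w1, w2)
    # Each weight moves downhill using only its own partial derivative.
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

print(f"w1 = {w1:.6f}, w2 = {w2:.6f}, error = {error(w1, w2):.3e}")
```

Both weights shrink toward zero, the minimum of this toy error, and each update line touches a single weight using only that weight's partial derivative.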