The Chain Rule of Calculus – Additional Functions
The chain rule is an important derivative rule that allows us to work with composite functions. It is essential to understanding the backpropagation algorithm, which applies the chain rule extensively to calculate the error gradient of the loss function with respect to each weight of a neural network. Here we build on our earlier introduction to the chain rule by tackling more challenging functions.
In this tutorial, you will discover how to apply the chain rule of calculus to challenging functions. After completing it, you will know:
- The process of applying the chain rule to univariate functions can be extended to multivariate ones.
- Applying the chain rule follows the same process, no matter how complex the function is: take the derivative of the outer function first, and then move inwards. Along the way, other derivative rules might be needed.
- Applying the chain rule to multivariate functions requires the use of partial derivatives.
Tutorial Overview
This tutorial is divided into two parts:
- The chain rule on univariate functions
- The chain rule on multivariate functions
Prerequisites
This tutorial assumes that you are already familiar with:
- Multivariate functions
- The power and product rules
- Partial derivatives
- The chain rule
The Chain Rule on Univariate Functions
We have already met the chain rule for univariate and multivariate functions, but so far we have only seen a few simple examples. Let’s look at some more challenging ones here. We will start with univariate functions and then apply what we learn to multivariate functions.
Example 1: Let’s raise the bar by considering the following composite function:
$$h(x) = g(f(x)) = \sqrt{x^2 - 10}$$
We can separate the composite function into the inner function, $f(x) = x^2 - 10$, and the outer function, $g(x) = \sqrt{x} = x^{1/2}$.
The output of the inner function is denoted by the intermediate variable, $u = x^2 - 10$, and its value is fed into the outer function, so that $h = g(u) = u^{1/2}$.
The first step is to find the derivative of the outer part of the composite function, while ignoring whatever is inside. For this purpose, we can apply the power rule:
$$\frac{dh}{du} = \frac{1}{2} u^{-1/2} = \frac{1}{2} (x^2 - 10)^{-1/2}$$
The next step is to find the derivative of the inner part of the composite function, this time ignoring whatever is outside. We can apply the power rule here too:
$$\frac{du}{dx} = 2x$$
Putting the two parts together and simplifying, we have:
$$\frac{dh}{dx} = \frac{dh}{du} \cdot \frac{du}{dx} = \frac{1}{2} (x^2 - 10)^{-1/2} \cdot 2x = \frac{x}{\sqrt{x^2 - 10}}$$
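A quick numerical check can help confirm a result like this. The following is a minimal sketch, not part of the original derivation: it multiplies the two pieces, (1/2)(x² − 10)^(−1/2) and 2x, and compares the product, x / √(x² − 10), against a central finite difference. The helper name `central_difference`, the step size and the test point are illustrative assumptions.

```python
import math


def h(x):
    # Composite function h(x) = sqrt(x^2 - 10); real-valued only for |x| > sqrt(10).
    return math.sqrt(x**2 - 10)


def dh_dx(x):
    # Chain rule: derivative of the outer function times derivative of the inner one.
    u = x**2 - 10            # intermediate variable (output of the inner function)
    dh_du = 0.5 * u ** -0.5  # power rule on the outer function u^(1/2)
    du_dx = 2 * x            # power rule on the inner function x^2 - 10
    return dh_du * du_dx


def central_difference(f, x, eps=1e-6):
    # Hypothetical helper: symmetric finite-difference estimate of f'(x).
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x0 = 4.0  # arbitrary test point inside the domain
analytic = dh_dx(x0)
numeric = central_difference(h, x0)
print(f"chain rule: {analytic:.6f}, finite difference: {numeric:.6f}")
```

The two printed values should agree to several decimal places; a large gap would point to a slip in one of the two derivative steps.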
Example 2: Let’s repeat the procedure, this time with a different composite function:
$$h(x) = \cos(x^3 - 1)$$
We will again use $u$, the output of the inner function, as our intermediate variable.
The outer function in this case is $\cos u$. Finding its derivative, once again ignoring the inside, gives us:
$$\frac{dh}{du} = (\cos u)' = -\sin u = -\sin(x^3 - 1)$$
The inner function is $u = x^3 - 1$. Hence, its derivative is:
$$\frac{du}{dx} = (x^3 - 1)' = 3x^2$$
Putting the two parts together, we obtain the derivative of the composite function:
$$\frac{dh}{dx} = \frac{dh}{du} \cdot \frac{du}{dx} = -3x^2 \sin(x^3 - 1)$$
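The outer-then-inner pattern can also be written as a small reusable function. The sketch below is one possible way to express it, assuming the composite cos(x³ − 1) with outer derivative −sin u and inner derivative 3x²; the names `chain_rule`, `closed_form` and `central_difference` are hypothetical and chosen only for illustration.

```python
import math


def chain_rule(d_outer, inner, d_inner):
    # Build x -> dh/du evaluated at u = inner(x), times du/dx.
    return lambda x: d_outer(inner(x)) * d_inner(x)


def inner(x):
    return x**3 - 1          # u = x^3 - 1


def d_inner(x):
    return 3 * x**2          # du/dx


def d_outer(u):
    return -math.sin(u)      # dh/du for the outer function cos(u)


def h(x):
    return math.cos(inner(x))


def closed_form(x):
    # The combined derivative: -3 x^2 sin(x^3 - 1).
    return -3 * x**2 * math.sin(x**3 - 1)


def central_difference(f, x, eps=1e-6):
    # Symmetric finite-difference estimate of f'(x), used only as a sanity check.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


dh_dx = chain_rule(d_outer, inner, d_inner)

x0 = 1.5  # arbitrary test point
print(dh_dx(x0), closed_form(x0), central_difference(h, x0))
```

Passing the derivatives in as functions keeps the outer and inner steps separate, which mirrors how the intermediate variable u is used in the hand derivation.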
Example 3: Let’s now raise the bar a little further by considering a more challenging composite function:
$$h(x) = \cos\left(x \sqrt{x^2 - 10}\right)$$
If we look closely, we see that not only do we have nested functions, for which we will need to apply the chain rule more than once, but we also have a product, to which we will need to apply the product rule.
The outermost function is a cosine. In finding its derivative by the chain rule, we will use the intermediate variable, $u = x \sqrt{x^2 - 10}$:
$$\frac{dh}{du} = (\cos u)' = -\sin u = -\sin\left(x \sqrt{x^2 - 10}\right)$$
Inside the cosine we have the product, $x \sqrt{x^2 - 10}$, to which we apply the product rule to find its derivative (notice that we are always moving from the outside to the inside in order to discover the operation that needs to be tackled next):
$$\frac{du}{dx} = \left(x \sqrt{x^2 - 10}\right)' = \sqrt{x^2 - 10} + x \left(\sqrt{x^2 - 10}\right)'$$
One of the components of the resulting expression is $\left(\sqrt{x^2 - 10}\right)'$, to which we apply the chain rule again. In fact, we have already done so above, so we can simply reuse the result:
$$\left(\sqrt{x^2 - 10}\right)' = x (x^2 - 10)^{-1/2}$$
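Putting all the parts together, we obtain the derivative of the composite function:
$$\frac{dh}{dx} = -\sin\left(x \sqrt{x^2 - 10}\right) \left(\sqrt{x^2 - 10} + x^2 (x^2 - 10)^{-1/2}\right)$$
This can be simplified further into:
$$\frac{dh}{dx} = -\frac{(2x^2 - 10) \sin\left(x \sqrt{x^2 - 10}\right)}{\sqrt{x^2 - 10}}$$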
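When a derivation mixes the chain rule and the product rule, it is easy to drop a factor during simplification. As a hedged sketch, the code below evaluates both the step-by-step form and a simplified form, −(2x² − 10) sin(x√(x² − 10)) / √(x² − 10), for the function cos(x√(x² − 10)), and compares them with a finite-difference estimate. The function names and the test point are assumptions made for this illustration.

```python
import math


def h(x):
    # h(x) = cos(x * sqrt(x^2 - 10)); real-valued only for |x| > sqrt(10).
    return math.cos(x * math.sqrt(x**2 - 10))


def dh_dx_stepwise(x):
    root = math.sqrt(x**2 - 10)
    u = x * root                # argument of the outermost cosine
    d_root = x / root           # chain rule on sqrt(x^2 - 10)
    du_dx = root + x * d_root   # product rule on x * sqrt(x^2 - 10)
    return -math.sin(u) * du_dx # chain rule on cos(u)


def dh_dx_simplified(x):
    root = math.sqrt(x**2 - 10)
    return -(2 * x**2 - 10) * math.sin(x * root) / root


def central_difference(f, x, eps=1e-6):
    # Symmetric finite-difference estimate of f'(x), used only as a sanity check.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x0 = 4.0  # arbitrary test point inside the domain
stepwise = dh_dx_stepwise(x0)
simplified = dh_dx_simplified(x0)
numeric = central_difference(h, x0)
print(stepwise, simplified, numeric)
```

Agreement between the stepwise and simplified values checks the algebra, while agreement with the finite difference checks the derivative itself.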
The Chain Rule on Multivariate Functions
Example 4: Suppose that we now have a multivariate function of two independent variables, $s$ and $t$, each of which in turn depends on two further independent variables, $x$ and $y$:
$$h = g(s, t) = s^2 + t^3$$
where $s = xy$ and $t = 2x - y$.
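Applying the chain rule here requires partial derivatives, since we are working with more than one independent variable. In addition, $s$ and $t$ will act as our intermediate variables. The formulae that we will be working with, one for each input, are the following:
$$\frac{\partial h}{\partial x} = \frac{\partial h}{\partial s} \frac{\partial s}{\partial x} + \frac{\partial h}{\partial t} \frac{\partial t}{\partial x}$$
$$\frac{\partial h}{\partial y} = \frac{\partial h}{\partial s} \frac{\partial s}{\partial y} + \frac{\partial h}{\partial t} \frac{\partial t}{\partial y}$$
From these formulae, we can see that we need to find six different partial derivatives:
$$\frac{\partial h}{\partial s} = 2s, \quad \frac{\partial h}{\partial t} = 3t^2$$
$$\frac{\partial s}{\partial x} = y, \quad \frac{\partial s}{\partial y} = x$$
$$\frac{\partial t}{\partial x} = 2, \quad \frac{\partial t}{\partial y} = -1$$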
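We can now substitute these terms into the formulae for $\partial h / \partial x$ and $\partial h / \partial y$:
$$\frac{\partial h}{\partial x} = 2s \cdot y + 3t^2 \cdot 2 = 2sy + 6t^2$$
$$\frac{\partial h}{\partial y} = 2s \cdot x + 3t^2 \cdot (-1) = 2sx - 3t^2$$
And then substitute for $s$ and $t$ to find the derivatives:
$$\frac{\partial h}{\partial x} = 2xy^2 + 6(2x - y)^2$$
$$\frac{\partial h}{\partial y} = 2x^2 y - 3(2x - y)^2$$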
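The multivariate case can be checked in the same spirit. The sketch below assumes h = s² + t³ with s = xy and t = 2x − y, computes the six partial derivatives explicitly, combines them with the two chain-rule sums, and compares the result against finite differences taken one variable at a time. The helper `numeric_grad` and the test point are illustrative assumptions.

```python
def h(x, y):
    s = x * y
    t = 2 * x - y
    return s**2 + t**3


def grad_h(x, y):
    # Intermediate variables.
    s = x * y
    t = 2 * x - y
    # Partial derivatives of the outer function with respect to s and t.
    dh_ds, dh_dt = 2 * s, 3 * t**2
    # Partial derivatives of the intermediate variables with respect to x and y.
    ds_dx, ds_dy = y, x
    dt_dx, dt_dy = 2, -1
    # Chain rule: sum the contributions that flow through s and through t.
    dh_dx = dh_ds * ds_dx + dh_dt * dt_dx
    dh_dy = dh_ds * ds_dy + dh_dt * dt_dy
    return dh_dx, dh_dy


def numeric_grad(f, x, y, eps=1e-6):
    # Hypothetical helper: central differences, varying one input at a time.
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return df_dx, df_dy


x0, y0 = 1.5, 0.5  # arbitrary test point
dh_dx, dh_dy = grad_h(x0, y0)
num_dx, num_dy = numeric_grad(h, x0, y0)
print(dh_dx, dh_dy)    # chain-rule values
print(num_dx, num_dy)  # finite-difference estimates
```

At this test point the chain-rule values should match the closed forms 2xy² + 6(2x − y)² and 2x²y − 3(2x − y)² evaluated at the same inputs.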
Example 5: Let’s repeat this, now with a multivariate function of three independent variables, $r$, $s$ and $t$, each of which in turn depends on two further independent variables, $x$ and $y$:
$$h = g(r, s, t) = r^2 - rs + t^3$$
where $r = x \cos y$, $s = x e^y$ and $t = x + y$.
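This time, $r$, $s$ and $t$ will act as our intermediate variables. The formulae that we will be working with, one for each input, are the following:
$$\frac{\partial h}{\partial x} = \frac{\partial h}{\partial r} \frac{\partial r}{\partial x} + \frac{\partial h}{\partial s} \frac{\partial s}{\partial x} + \frac{\partial h}{\partial t} \frac{\partial t}{\partial x}$$
$$\frac{\partial h}{\partial y} = \frac{\partial h}{\partial r} \frac{\partial r}{\partial y} + \frac{\partial h}{\partial s} \frac{\partial s}{\partial y} + \frac{\partial h}{\partial t} \frac{\partial t}{\partial y}$$
From these formulae, we can see that we now need to find nine different partial derivatives:
$$\frac{\partial h}{\partial r} = 2r - s, \quad \frac{\partial h}{\partial s} = -r, \quad \frac{\partial h}{\partial t} = 3t^2$$
$$\frac{\partial r}{\partial x} = \cos y, \quad \frac{\partial s}{\partial x} = e^y, \quad \frac{\partial t}{\partial x} = 1$$
$$\frac{\partial r}{\partial y} = -x \sin y, \quad \frac{\partial s}{\partial y} = x e^y, \quad \frac{\partial t}{\partial y} = 1$$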
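Again, we proceed to substitute these terms into the formulae for $\partial h / \partial x$ and $\partial h / \partial y$:
$$\frac{\partial h}{\partial x} = (2r - s) \cos y - r e^y + 3t^2$$
$$\frac{\partial h}{\partial y} = -(2r - s) x \sin y - r x e^y + 3t^2$$
And then substitute for $r$, $s$ and $t$ to find the derivatives:
$$\frac{\partial h}{\partial x} = 2x \cos^2 y - 2x e^y \cos y + 3(x + y)^2$$
$$\frac{\partial h}{\partial y} = -2x^2 \cos y \sin y + x^2 e^y \sin y - x^2 e^y \cos y + 3(x + y)^2$$
The second of these can be simplified a little further (hint: apply the trigonometric identity $2 \sin y \cos y = \sin 2y$ to $\partial h / \partial y$):
$$\frac{\partial h}{\partial y} = -x^2 \sin 2y + x^2 e^y (\sin y - \cos y) + 3(x + y)^2$$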
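With three intermediate variables, the chain-rule sums start to look like a product between a vector of outer derivatives and a table of inner derivatives. The sketch below takes that view for h = r² − rs + t³ with r = x cos y, s = x e^y and t = x + y. It is one possible arrangement rather than a prescribed method, and the names `grad_h_chain`, `grad_h_closed` and `numeric_grad` are hypothetical.

```python
import math


def h(x, y):
    r = x * math.cos(y)
    s = x * math.exp(y)
    t = x + y
    return r**2 - r * s + t**3


def grad_h_chain(x, y):
    r = x * math.cos(y)
    s = x * math.exp(y)
    t = x + y
    # dh/dr, dh/ds, dh/dt
    outer = [2 * r - s, -r, 3 * t**2]
    # One row per intermediate variable: [d/dx, d/dy]
    inner = [
        [math.cos(y), -x * math.sin(y)],   # r = x cos y
        [math.exp(y), x * math.exp(y)],    # s = x e^y
        [1.0, 1.0],                        # t = x + y
    ]
    # Chain rule: for each input, sum over all intermediate variables.
    return tuple(
        sum(outer[k] * inner[k][j] for k in range(3)) for j in range(2)
    )


def grad_h_closed(x, y):
    # Closed forms after substituting r, s and t and simplifying with sin 2y.
    cube_term = 3 * (x + y) ** 2
    dh_dx = 2 * x * math.cos(y) ** 2 - 2 * x * math.exp(y) * math.cos(y) + cube_term
    dh_dy = (
        -(x**2) * math.sin(2 * y)
        + x**2 * math.exp(y) * (math.sin(y) - math.cos(y))
        + cube_term
    )
    return dh_dx, dh_dy


def numeric_grad(f, x, y, eps=1e-6):
    # Central differences, varying one input at a time (sanity check only).
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return df_dx, df_dy


x0, y0 = 1.2, 0.7  # arbitrary test point
chain = grad_h_chain(x0, y0)
closed = grad_h_closed(x0, y0)
numeric = numeric_grad(h, x0, y0)
print(chain)
print(closed)
print(numeric)
```

Arranging the inner derivatives as rows makes it straightforward to add a fourth intermediate variable: append one entry to `outer`, one row to `inner`, and widen the range of the sum.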
No matter how complex the expression is, the procedure to follow remains the same:
The last computation you would perform when evaluating the expression tells you the first thing to differentiate.
Hence, start by tackling the outermost function, then move inwards to the next one. You may need to apply other rules along the way, as we did with the product rule for the product nested inside the cosine. Do not forget to take partial derivatives if you are working with multivariate functions.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
Calculus for Dummies, 2016
Single and Multivariable Calculus, 2020
Mathematics for Machine Learning, 2020
Conclusion
In this tutorial, you discovered how to apply the chain rule of calculus to challenging functions.
Specifically, you learned:
- The process of applying the chain rule to univariate functions can be extended to multivariate ones.
- Applying the chain rule follows the same process, no matter how complex the function is: take the derivative of the outer function first, and then move inwards. Along the way, other derivative rules might be needed.
- Applying the chain rule to multivariate functions requires the use of partial derivatives.