Applications of derivatives
The derivative defines the rate at which one variable changes with respect to another.
It is an important concept that proves useful in many applications: in daily life, the derivative can tell you the speed at which you are driving, or help you forecast fluctuations in the stock market; in machine learning, derivatives are essential for function optimization.
This guide by AICoreSpot will look into various applications of derivatives, beginning with the more familiar ones before moving on to machine learning. We will take a closer look at what the derivatives tell us about the different functions we are studying.
In this guide, you will discover the different applications of derivatives.
After going through this guide, you will know:
- How derivatives can be applied to real-life problems that we find around us
- How derivatives are essential in machine learning for function optimization
Tutorial Overview
This tutorial is divided into two parts; they are:
Applications of derivatives in real-life
Applications of derivatives in optimization algorithms
Applications of derivatives in real life
We have observed that derivatives model rates of change.
Derivatives answer questions like “How fast?”, “How steep?” and “How sensitive?” These are all questions about rates of change in one form or another.
This rate of change is denoted by 𝛿y / 𝛿x, defining a change in the dependent variable, 𝛿y, with respect to a change in the independent variable, 𝛿x.
Let’s begin with one of the most familiar applications of derivatives that we can identify around us.
When we state that a car is travelling at 100 kilometres an hour, we have just specified its rate of change. The common terms we typically use are speed or velocity, although it is best to first distinguish between the two.
In daily life, we often use speed and velocity interchangeably when describing the rate of change of a moving object. However, this is not mathematically precise: speed is always positive, whereas velocity carries a notion of direction and can therefore take both positive and negative values. Hence, in the explanation that follows, we shall consider velocity as the more technical concept, defined as:
velocity = 𝛿y / 𝛿t
This means that velocity gives the change in the car’s position within an interval of time. To put it in different words, velocity is the first derivative of position with respect to time.
The car’s velocity can remain constant, such as when the car keeps travelling at 100 kmph throughout, or it can change as a function of time. In the latter case, the velocity function itself is changing with time, or in simpler terms, the car can be said to be accelerating. Acceleration is defined as the first derivative of velocity, v, and the second derivative of position, y, with respect to time:
acceleration = 𝛿v / 𝛿t = 𝛿²y / 𝛿t²
We can graph the position, velocity and acceleration curves to visualize them better. Assume that the car’s position, as a function of time, is given by y(t) = t³ – 8t² + 40t:
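If you would like to reproduce these graphs yourself, the following short Python sketch is one way of doing so; it is an illustrative addition rather than the article's own code, it assumes NumPy and Matplotlib are available, and it estimates the derivatives numerically so that nothing is given away before we derive them below.

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 1000)

y = t**3 - 8 * t**2 + 40 * t   # position of the car in metres
v = np.gradient(y, t)          # numerical estimate of the velocity, dy/dt
a = np.gradient(v, t)          # numerical estimate of the acceleration, dv/dt

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, curve, title in zip(axes, (y, v, a), ("position y(t)", "velocity v(t)", "acceleration a(t)")):
    ax.plot(t, curve)
    ax.set_xlabel("t (s)")
    ax.set_title(title)
plt.tight_layout()
plt.show()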
The graph indicates that the car’s position changes slowly at the start of the journey, slowing down slightly until around t = 2.7s, at which point its rate of change picks up and keeps increasing until the end of the journey. This is illustrated by the graph of the car’s velocity:
Observe that the car maintains a positive velocity throughout the journey, because it never changes direction. Hence, if we imagined ourselves sitting in this travelling car, the speedometer would be showing us the values that we have just plotted on the velocity graph (since the velocity remains positive throughout; otherwise, we would have to take the absolute value of the velocity to find the speed). If we apply the power rule to y(t) to find its derivative, we find that the velocity is defined by the following function:
v(t) = y’(t) = 3t² – 16t + 40
We can additionally plot the acceleration graph:
We find that the graph is characterized by negative acceleration in the time interval t = [0, 2.7) seconds. This is because acceleration is the derivative of velocity, and within this time interval the car’s velocity is decreasing. If we again apply the power rule, this time to v(t), to find its derivative, we find that the acceleration is defined by the following function:
a(t) = v’(t) = 6t – 16
Bringing all functions together, we have the following.
y(t) = t³ – 8t² + 40t
v(t) = y’(t) = 3t² – 16t + 40
a(t) = v’(t) = 6t – 16
If we substitute t = 10s, we can use these three functions to find that by the end of the journey the car has travelled 600 m, its velocity is 180 m/s, and it is accelerating at 44 m/s². We can verify that all of these values tally with the graphs that we have just plotted.
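As a quick check of these numbers, here is a small sketch (an illustrative addition, assuming SymPy is available) that differentiates the position function symbolically and substitutes t = 10:

from sympy import symbols, diff

t = symbols('t')
y = t**3 - 8 * t**2 + 40 * t   # position
v = diff(y, t)                 # velocity: 3*t**2 - 16*t + 40
a = diff(v, t)                 # acceleration: 6*t - 16

print(y.subs(t, 10))           # 600 (metres travelled)
print(v.subs(t, 10))           # 180 (m/s)
print(a.subs(t, 10))           # 44 (m/s²)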
We have framed this example in the context of finding a car’s velocity and acceleration. However, there is a plethora of real-life phenomena that change with time (or with variables other than time), which can be studied by applying the concept of derivatives just as we have done here. To name a few:
- The growth rate of a population (be it a group of humans, or a colony of bacteria) over time, which can be used to forecast changes in population size in the near future.
- Changes in temperature as a function of location, which can be used for weather forecasting.
- Fluctuations of the stock market over time, which can be used to forecast future stock market behaviour.
Derivatives also provide important information for solving optimization problems, as we shall see next.
Applications of derivatives in optimization algorithms
We have already seen that an optimization algorithm, such as gradient descent, seeks to reach the global minimum of an error (or cost) function by making use of derivatives.
Let’s take a closer look at what the derivatives tell us about the error function, by going through the same exercise as we did for the car example.
For this purpose, let’s consider the following one-dimensional test function for function optimization:
f(x) = –x sin(x)
We can apply the product rule to f(x) to find its first derivative, denoted by f’(x), and then apply the product rule again to f’(x) to find its second derivative, denoted by f’’(x):
f’(x) = –sin(x) – x cos(x)
f’’(x) = x sin(x) – 2 cos(x)
We can plot these three functions for different values of x to visualize them:
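One way to do this, offered as a sketch rather than the article’s own code, is the following snippet; the plotting range of x in [-10, 10] is an assumption.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 400)

f = -x * np.sin(x)                               # test function f(x)
f_prime = -np.sin(x) - x * np.cos(x)             # first derivative f'(x)
f_double_prime = x * np.sin(x) - 2 * np.cos(x)   # second derivative f''(x)

plt.plot(x, f, label="f(x)")
plt.plot(x, f_prime, label="f'(x)")
plt.plot(x, f_double_prime, label="f''(x)")
plt.xlabel("x")
plt.legend()
plt.show()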
As we observed earlier for the car example, the graph of the first derivative indicates how f(x) is changing and by how much. For instance, a positive derivative indicates that f(x) is an increasing function, whereas a negative derivative tells us that f(x) is decreasing. Hence, if in its search for a function minimum, the optimization algorithm performs small changes to the input according to its learning rate, ε:
x_new = x – ε f’(x)
then the algorithm can reduce f(x) by moving in the direction opposite to the derivative, that is, by inverting the sign of the derivative.
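To make the idea concrete, here is a toy gradient descent loop on f(x) = –x sin(x). It is only a sketch: the starting point and learning rate are arbitrary choices for illustration, not values taken from the article.

import math

def f(x):
    return -x * math.sin(x)

def f_prime(x):
    return -math.sin(x) - x * math.cos(x)

x = 1.0     # assumed starting point
eps = 0.1   # assumed learning rate, the epsilon in the update rule above

for _ in range(100):
    x = x - eps * f_prime(x)   # step against the sign of the derivative

print(x, f(x))   # x settles near a local minimum of f (around x ≈ 2.03)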
We might also be interested in finding the second derivative of a function.
We can think of the second derivative as measuring curvature.
For instance, if the algorithm arrives at a critical point at which the first derivative is zero, it cannot distinguish between this point being a local maximum, a local minimum, a saddle point or a flat region on the basis of f’(x) alone. However, when the second derivative is brought in, the algorithm can tell that the critical point in question is a local minimum if the second derivative is greater than zero. For a local maximum, the second derivative is less than zero. Hence, the second derivative can inform the optimization algorithm on which direction to move. Unfortunately, this test remains inconclusive for saddle points and flat regions, for which the second derivative is zero in both cases.
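As a small illustration of this second-derivative test on the same test function (the critical point value x ≈ 2.0288, where f’(x) = 0, is an approximate value computed separately, not quoted from the article):

import math

def f_double_prime(x):
    return x * math.sin(x) - 2 * math.cos(x)

x_critical = 2.0288   # approximate root of f'(x) = -sin(x) - x cos(x)

second = f_double_prime(x_critical)
if second > 0:
    print("local minimum")       # this branch fires here, since f''(2.0288) ≈ 2.7
elif second < 0:
    print("local maximum")
else:
    print("inconclusive: possible saddle point or flat region")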
Optimization algorithms based on gradient descent do not make use of second-order derivatives and are, therefore, known as first-order optimization algorithms. Optimization algorithms, such as Newton’s method, that exploit second derivatives are known as second-order optimization algorithms.
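For comparison, here is a sketch of a Newton-style second-order update applied to the same test function; the starting point is an assumption, and this is an illustrative example rather than a full implementation of Newton’s method.

import math

def f_prime(x):
    return -math.sin(x) - x * math.cos(x)

def f_double_prime(x):
    return x * math.sin(x) - 2 * math.cos(x)

x = 1.5   # assumed starting point, reasonably close to the minimum near x ≈ 2.03

for _ in range(10):
    x = x - f_prime(x) / f_double_prime(x)   # Newton step: scale the gradient by the curvature

print(x)   # converges to the critical point at roughly 2.0288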