### Calculus Pre-Requisites – A Primer

We have previously seen that calculus is one of the core mathematical concepts in machine learning, one that enables us to understand the inner workings of different machine learning algorithms.

Calculus, in turn, builds on several basic ideas from algebra and geometry. Having these basics at hand becomes even more important as we work our way through more advanced calculus topics, such as the evaluation of limits and the computation of derivatives, to name a few.

In this guide, you will discover the prerequisites that will help you work with calculus.

After going through this guide, you will be aware of:

- Linear and non-linear functions are central to calculus and machine learning, and many calculus problems involve their use.
- Fundamental concepts from algebra and trigonometry provide the foundation for calculus, and will become especially important as we tackle more advanced calculus topics.

__Tutorial Overview__

This tutorial is divided into three parts, which are:

- The idea of a function
- Basics of Pre-Algebra and Algebra
- Basics of Trigonometry

__The idea of a function__

A function is a rule that defines the relationship between a dependent variable and an independent variable.

Examples are all around us: the mean daily temperature where you live depends on, and is a function of, the time of year; the distance an object has fallen is a function of how much time has elapsed since you dropped it; the area of a circle is a function of its radius; and the pressure of an enclosed gas is a function of its temperature.

In machine learning, a neural network learns a function with which it can represent the relationship between the features in the input, the independent variable, and the expected output, the dependent variable. The learned function, therefore, defines a deterministic mapping between the input values and one or more output values. We can denote this mapping as follows:

Output(s) = function(Inputs)

More formally, a function is often denoted by y = f(x), which reads as y is a function of x. This notation specifies x as the independent input variable that we already know, while y is the dependent output variable that we wish to find. For instance, if we take the squaring function, f(x) = x^{2}, then an input of 3 generates an output of 9:

y = f(3) = 9

A function can also be represented pictorially by a graph on an x-y coordinate plane.

By the graph of the function f we mean the collection of all points (x,f(x))

When graphing a function, the independent input variable is placed on the x-axis, while the dependent output variable goes on the y-axis. A graph helps to illustrate the relationship between the independent and the dependent variables: is the graph (and, therefore, the relationship) rising or falling, and at what rate?

A straight line is one of the simplest functions that can be graphed on the coordinate plane. Take, for instance, the graph of the line y = 3x + 5.

This straight line can be described by a linear function, so called because the output changes proportionally to any change in the input. The linear function that describes this straight line can be written in slope-intercept form, where the slope is denoted by m, and the y-intercept by c:

f(x) = mx + c = 3x + 5

We saw how to calculate the slope when we covered the topic of Rate of Change.

If we consider the special case of setting the slope to zero, the resulting horizontal line would be described by a constant function of the form:

f(x) = c = 5
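As a quick sketch (in Python, using the values above), both the linear function and its constant special case can be evaluated directly:

```python
# Linear function in slope-intercept form: f(x) = mx + c
def f(x, m=3, c=5):
    return m * x + c

# Constant function: the special case with slope m = 0
def g(x):
    return f(x, m=0, c=5)

print(f(0))    # the y-intercept: 5
print(f(2))    # 3*2 + 5 = 11
print(g(100))  # always 5, regardless of the input
```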

In machine learning, the computation defined by a linear function of this kind is implemented by every neuron in a neural network. Specifically, each neuron receives a set of n inputs, x_{i}, either from the previous layer of neurons or from the training data, and computes a weighted sum of these inputs (where weight is the more common term for the slope, m, in machine learning) to generate an output, z:

z = w_{1}x_{1} + w_{2}x_{2} + … + w_{n}x_{n}

The process of training a neural network involves learning the weights that best represent the patterns in the input dataset, a process that is carried out by the gradient descent algorithm.
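As a minimal sketch of the weighted sum computed by a single neuron (the input, weight, and bias values here are hypothetical):

```python
def neuron_weighted_sum(inputs, weights, bias=0.0):
    # z = w1*x1 + w2*x2 + ... + wn*xn + bias
    return sum(w * x for w, x in zip(weights, inputs)) + bias

z = neuron_weighted_sum([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
print(z)  # roughly 0.5
```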

In addition to linear functions, there exists another family of functions: the non-linear functions.

The simplest of all non-linear functions is arguably the parabola, which may be described by:

y = f(x) = x^{2}

When graphed, we see that this is an even function, because it is symmetric about the y-axis, and it never falls below the x-axis.
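Both properties can be checked numerically with a small sketch:

```python
# The squaring function: a parabola
def f(x):
    return x ** 2

# f is even: f(-x) == f(x), and its output never falls below the x-axis
for x in [-3, -1.5, 0, 2, 4]:
    assert f(x) == f(-x)
    assert f(x) >= 0

print(f(-3), f(3))  # 9 9
```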

Nevertheless, non-linear functions can take many different shapes. Consider, for instance, the exponential function of the form f(x) = b^{x}, which grows or decays indefinitely, or monotonically, depending on the value of b:

Or the logarithmic function of the form f(x) = log_{2} x, which is similar to the exponential function but with the roles of the x- and y-axes swapped.

Of particular interest for deep learning are the logistic, tanh, and rectified linear unit (ReLU) non-linear functions, which serve as activation functions:

The importance of these activation functions lies in the introduction of a non-linear mapping into the processing of a neuron. If we relied only on the linear combination computed by each neuron as a weighted sum of the inputs, we would be restricted to learning only a linear mapping from the inputs to the outputs. However, many real-world relationships are more complex than this, and a linear mapping would not model them accurately. Introducing a non-linearity to the output, z, of the neuron allows the neural network to model such non-linear relationships:

Output = activation_function(z)

A neuron, the basic building block of neural networks and deep learning, is defined by a simple two-step sequence of operations: computing a weighted sum and then passing the result through an activation function.
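The two-step sequence can be sketched as follows (a minimal illustration; the input and weight values are hypothetical, and the logistic and ReLU functions stand in for any choice of activation):

```python
import math

def logistic(z):
    # Logistic (sigmoid) activation: squashes z into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Rectified linear unit: max(0, z)
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    # Step 1: weighted sum of the inputs; Step 2: non-linear activation
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

print(neuron([1.0, 2.0], [0.5, -1.0], 0.25, relu))      # z = -1.25, so ReLU gives 0.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.25, math.tanh)) # tanh(-1.25), roughly -0.85
```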

Non-linear functions also appear elsewhere in the process of training a neural network, in the form of error functions.

A non-linear error function can be generated by computing the error between the predicted and the target output values as the weights of the model change. Its shape can be as simple as a parabola, but most often it is characterized by many local minima and saddle points. The gradient descent algorithm descends this non-linear error surface by computing the slope of the tangent line that touches the curve at a particular point: another important concept in calculus, which allows us to analyze complex curved functions by cutting them into many infinitesimal straight pieces arranged alongside one another.
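As a sketch of this idea (assuming a hypothetical parabola-shaped error function, with the tangent slope approximated by a finite difference):

```python
def error(w):
    # A hypothetical parabola-shaped error function with its minimum at w = 3
    return (w - 3.0) ** 2

def tangent_slope(f, w, h=1e-6):
    # Slope of the tangent line at w, approximated by a central finite difference
    return (f(w + h) - f(w - h)) / (2 * h)

w = 0.0
for _ in range(200):
    w -= 0.1 * tangent_slope(error, w)  # step downhill along the tangent

print(round(w, 4))  # converges towards the minimum at w = 3.0
```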

__Basics of Pre-Algebra and Algebra__

Algebra is one of the critical foundations of calculus:

Algebra is in simple terms, the language of calculus. You cannot perform calculus without being adept at algebra any more than you can write Japanese poetry without being proficient in Japanese.

There are several basic concepts from algebra that are useful for calculus, such as those concerning fractions, powers, square roots, and logarithms.

Let’s begin by revising the fundamentals for working with fractions.

- Division by Zero: The denominator of a fraction can never be equal to zero. For example, the result of a fraction such as 5/0 is undefined. The intuition here is that you can never add up to the value in the numerator using multiples of zero in the denominator.
- Reciprocal: The reciprocal of a fraction is its multiplicative inverse. In simpler terms, to find the reciprocal of a fraction, flip it upside down. Hence, the reciprocal of 3/4, for instance, becomes 4/3.
- Multiplication of Fractions: Multiplication between fractions is as straightforward as multiplying across the numerators, and multiplying across the denominators:

(a/b) * (c / d) = ac/bd

- Division of Fractions: This is very similar to multiplication, but with an extra step: the reciprocal of the second fraction is first found before multiplying. Hence, considering again two generic fractions:

(a / b) ÷ (c / d) = (a / b) * (d / c) = ad / cb

- Addition of Fractions: The addition of fractions requires that the fractions share a common denominator, which can be found by cross-multiplying:

(a / b) + (c / d) = (ad + cb) / bd

- Subtraction of Fractions: The subtraction of fractions follows a similar procedure to the addition of fractions:

(a / b) – (c / d) = (ad – cb) / bd

- Cancelling in Fractions: Fractions with an unbroken chain of multiplications across the entire numerator, as well as across the entire denominator, can be simplified by cancelling out any common terms that appear in both the numerator and the denominator:

a^{3}b^{2} / ac = a^{2}b^{2} / c
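Python's `fractions` module can be used to verify these rules with exact arithmetic, as a quick check:

```python
from fractions import Fraction

a, b = Fraction(3, 4), Fraction(2, 5)

# Reciprocal: flip the fraction upside down
assert 1 / a == Fraction(4, 3)

# Multiplication: multiply across numerators and across denominators
assert a * b == Fraction(3 * 2, 4 * 5)

# Division: multiply by the reciprocal of the second fraction
assert a / b == a * Fraction(5, 2)

# Addition and subtraction via cross-multiplication over a common denominator
assert a + b == Fraction(3 * 5 + 2 * 4, 4 * 5)
assert a - b == Fraction(3 * 5 - 2 * 4, 4 * 5)

print(a + b)  # 23/20
```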

The next important prerequisite for calculus concerns exponents, or powers as they are also commonly referred to. There are several rules to keep in mind when working with powers too.

- The Power of Zero: The result of any number (whether rational or irrational, negative or positive, except for zero itself) raised to the power of zero is equal to one:

x^{0} = 1

- Negative Powers: A base number raised to a negative power becomes a fraction, but does not change sign:

x^{-a} = 1 / x^{a}

- Fractional Powers: A base number raised to a fractional power can be converted into a root problem.

x^{a/b} = (^{b}√x)^{a} = ^{b}√x^{a}

- Addition of Powers: If two (or more) equivalent base terms are being multiplied together, then their powers may be added:

x^{a} * x^{b} = x^{(a + b)}

- Subtraction of Powers: Similarly, if two equivalent base terms are being divided, then their powers may be subtracted:

x^{a} / x^{b} = x^{(a – b)}

- Power of Powers: If a power is itself raised to a power, then the two powers may be multiplied together:

(x^{a})^{b} = x^{(ab)}

- Distribution of Powers: Whether the base numbers are being multiplied or divided, the power may be distributed to each variable. However, it cannot be distributed if the base numbers are being added or subtracted:

(xyz)^{a} = x^{a}y^{a}z^{a}

(x / y)^{a} = x^{a} / y^{a}
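These rules can be spot-checked in Python with a few concrete values (a sketch, not a proof):

```python
import math

x, y, a, b = 2.0, 3.0, 4, 2

assert x ** 0 == 1                                      # the power of zero
assert x ** -a == 1 / x ** a                            # negative power becomes a fraction
assert math.isclose(x ** (a / b), (x ** a) ** (1 / b))  # fractional power = b-th root of x^a
assert x ** a * x ** b == x ** (a + b)                  # addition of powers
assert x ** a / x ** b == x ** (a - b)                  # subtraction of powers
assert (x ** a) ** b == x ** (a * b)                    # power of powers
assert (x * y) ** a == x ** a * y ** a                  # distribution over multiplication

print("all power rules verified")
```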

Likewise, we have rules for working with roots and rules for working with logarithms.

Finally, knowing how to solve quadratic equations can also come in handy in calculus.

If the quadratic equation is factorable, then the simplest way to solve it is to express the sum of terms in product form. For instance, the following quadratic equation can be factored as follows:

x^{2} – 9 = (x + 3)(x – 3) = 0

Setting each factor to zero allows us to solve this equation, which in this case gives x = ±3.

Alternatively, the following quadratic formula can be used:

x = (-b ± √(b^{2} – 4ac)) / 2a

If we consider the same quadratic equation as above, then we would set the coefficient values to a = 1, b = 0, and c = -9, which would again give x = ±3 as our solution.
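A small sketch of the quadratic formula in code (assuming real roots, i.e. a non-negative discriminant):

```python
import math

def solve_quadratic(a, b, c):
    # x = (-b ± √(b^2 - 4ac)) / 2a, assuming the discriminant is non-negative
    discriminant = b ** 2 - 4 * a * c
    root = math.sqrt(discriminant)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

print(solve_quadratic(1, 0, -9))  # x^2 - 9 = 0  ->  (3.0, -3.0)
```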

__Basics of Trigonometry__

Trigonometry revolves around three main trigonometric functions, which are the sine, the cosine, and the tangent, and their reciprocals, which are the cosecant, the secant, and the cotangent, respectively.

When applied to a right-angled triangle, these three main functions enable us to calculate the lengths of the sides, or either of the other two acute angles of the triangle, depending on the information that we have available to start with. Specifically, for some angle, x, in the following 3-4-5 triangle:

The sine, cosine, and tangent functions only work with right-angled triangles, and hence can only be used to calculate acute angles that are smaller than 90 degrees. Nonetheless, if we work within the unit circle on the x-y coordinate plane, then we can apply trigonometry to all angles between 0 and 360 degrees.

The unit circle has its centre at the origin of the x-y coordinate plane, and a radius of one unit. Rotations around the unit circle are performed in a counterclockwise manner, starting from the positive x-axis. The cosine of the rotated angle is then given by the x-coordinate of the point that hits the unit circle, whereas the y-coordinate gives the sine of the rotated angle. It is also worth noting that the quadrants are symmetrical, and hence a point in one quadrant has symmetrical counterparts in the other three.

The graphed sine, cosine and tangent functions appear as follows:

All three functions are periodic, with the sine and cosine functions sharing the same shape albeit being displaced by 90 degrees from one another. The sine and cosine functions can, indeed, easily be sketched from the calculated x- and y-coordinates as one rotates around the unit circle. The tangent function can be sketched similarly, since for any angle 𝜃 this function can be defined by:

tan 𝜃 = sin 𝜃 / cos 𝜃 = y / x

The tangent function is undefined at ±90 degrees, since the cosine in the denominator returns a value of zero at these angles. Hence, we draw vertical asymptotes at these angles, which are imaginary lines that the curve approaches but never touches.
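A quick numerical check of this relationship on the unit circle (a sketch using an arbitrary angle of 60 degrees):

```python
import math

theta = math.radians(60)
x, y = math.cos(theta), math.sin(theta)  # point on the unit circle

# tan(theta) = sin(theta) / cos(theta) = y / x
assert math.isclose(math.tan(theta), y / x)

# Near ±90 degrees the cosine approaches zero, so the tangent grows without bound
print(abs(math.tan(math.radians(89.999))))  # a very large number
```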

One final note concerns the inverse of these trigonometric functions. Taking the sine function as an example, its inverse is denoted by sin^{-1}. This is not to be confused with the cosecant function, which is rather the reciprocal of sine, and hence not the same as its inverse.
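The distinction is easy to demonstrate numerically (a sketch using Python's `math` module, where the arcsine is `math.asin`):

```python
import math

theta = math.radians(30)
s = math.sin(theta)  # sin(30 degrees) = 0.5

# The inverse of sine (arcsine, sin^-1) undoes the sine function...
assert math.isclose(math.degrees(math.asin(s)), 30.0)

# ...whereas the cosecant is merely its reciprocal: 1 / sin
cosecant = 1 / s
assert math.isclose(cosecant, 2.0)
print(cosecant)  # roughly 2.0
```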

__Conclusion__

In this brief guide, you discovered the prerequisites for working with calculus.

Specifically, you learned that:

- Linear and non-linear functions are central to calculus and machine learning, and many calculus problems involve their use.
- Fundamental concepts from algebra and trigonometry provide the foundations for calculus, and will become especially important as we tackle more advanced calculus topics.