Neural Net - Getting down to the Basics


Let's start with something very basic - functions. Even though they are very fundamental mathematical entities, sometimes it really helps to break things down to the smallest units to understand the bigger concepts.

Now a function is like a machine. You feed it some inputs and it produces some output(s). What happens inside the machine depends on how the function is defined. A very simple example is:
y = f(x) = x^2
So an input of 2 will yield 4 when processed through this function.
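
To make this concrete, here's what that function looks like in Python (a minimal sketch; the name f just mirrors the maths above):

    def f(x):
        # y = x^2
        return x ** 2

    print(f(2))  # 4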

What would this function look like if we were to graph it? Take a look below:

[Graph of y = x^2: a parabola]

What you see is not a straight line, which makes this a non-linear function - that is, a function whose slope changes from point to point.

Now let's talk about composite functions. A composite function effectively takes the output of one function and passes it as input to another function. That would look something like this:
Function 1: f(x) = x^2
Function 2: g(x) = 2x
Composite Function: f(g(x)) = (2x)^2 = 4x^2
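
Here's the same composition as a quick Python sketch (the names f, g and composite just mirror the definitions above):

    def f(x):
        return x ** 2

    def g(x):
        return 2 * x

    def composite(x):
        # Feed the output of g into f: f(g(x)) = (2x)^2
        return f(g(x))

    print(composite(2))  # (2*2)^2 = 16
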
Next we need to understand derivatives. Simply put, the derivative is the rate of change of a function at a given point (the slope of the curve), or in other words, a measure of how much one quantity changes as another quantity changes.

[Figure: the derivative as the slope of the tangent line at a point]

Source: https://www.wyzant.com/resources/lessons/math/calculus/derivative_proofs/e_to_the_x

Why do we bother calculating derivatives? Because the derivative gives us the best linear approximation of a non-linear function in the neighbourhood of a point, and linear functions are much easier to deal with than complex ones :)
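
To see this in action, here's a hedged sketch of estimating a derivative numerically with a central difference (the step size h is an illustrative choice):

    def derivative(f, x, h=1e-6):
        # Approximate the slope of f at x using a small step h
        return (f(x + h) - f(x - h)) / (2 * h)

    print(derivative(lambda x: x ** 2, 3))  # ~6.0, since d/dx x^2 = 2x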

So we know composite functions and we know derivatives. How do we calculate the derivative of a composite function? We follow what's called the chain rule for that:
If F(x) = f(g(x)), then F'(x) = f'(g(x)) * g'(x)
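
Using the earlier example, f(x) = x^2 and g(x) = 2x, we get F(x) = (2x)^2 = 4x^2, and the chain rule gives F'(x) = 2(2x) * 2 = 8x. A quick numerical sanity check (reusing the central-difference sketch from above):

    def derivative(f, x, h=1e-6):
        # Central-difference estimate of the slope of f at x
        return (f(x + h) - f(x - h)) / (2 * h)

    F = lambda x: (2 * x) ** 2  # F(x) = f(g(x))

    # Chain rule predicts F'(3) = 8 * 3 = 24
    print(derivative(F, 3))  # ~24.0
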
This is about as much as you need to know at this point, but feel free to explore the topic further.

Now that we have that covered, let's talk about partial derivatives. So far we have been looking at examples of functions with only 1 variable. Let's look at another example now:
z = f(x,y) = x^2 + y^2
This would be a multivariable function, and here's how the graph would look:

[Graph of z = x^2 + y^2: a paraboloid]

Clearly it would be tricky to calculate the slope at a single point on this graph now, because there would be an infinite number of tangent lines running through it. The way we deal with that is by treating one of the two variables, say y, as a constant (essentially hiding a dimension) and then calculating the slope at the point, which gives us the partial derivative of z with respect to x at that point. We do the same again, this time keeping x constant, and calculate the slope at the point, which gives us the partial derivative of z with respect to y at that point.
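
For our example z = x^2 + y^2, treating y as a constant gives the partial derivative 2x, and treating x as a constant gives 2y. A minimal numerical sketch (the point (3, 2) and the step size h are illustrative choices):

    def z(x, y):
        return x ** 2 + y ** 2

    h = 1e-6

    # Partial derivative with respect to x: vary x, hold y fixed
    dz_dx = (z(3 + h, 2) - z(3 - h, 2)) / (2 * h)  # ~6.0 = 2*3
    # Partial derivative with respect to y: vary y, hold x fixed
    dz_dy = (z(3, 2 + h) - z(3, 2 - h)) / (2 * h)  # ~4.0 = 2*2

    print(dz_dx, dz_dy)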

Now let's start piecing it together.

Training a basic neural network takes two main steps:
  1. Feed Forward
  2. Backpropagation 
We'll break these down one at a time. 


[Figure: a sample feed-forward neural network]

Source: https://www.researchgate.net/figure/234055177_fig1_Figure-61-Sample-of-a-feed-forward-neural-network


Now, a neural network is nothing but a massive composite function, where each layer in the feed-forward mechanism is a single function whose inputs are the outputs of the previous layer multiplied by a set of weights. These weights (along with bias terms) are initialised with random values, and over the course of training they come to capture the relationship between the input and the output data. This is feed-forward propagation, conceptually. We'll dive into the details of these layers in the upcoming posts.
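
Here's a minimal sketch of a single feed-forward pass using NumPy (the layer sizes, sigmoid activation and random initialisation are illustrative assumptions, not a prescribed architecture):

    import numpy as np

    def sigmoid(x):
        # Squashes values into (0, 1); a common activation choice
        return 1 / (1 + np.exp(-x))

    rng = np.random.default_rng(0)

    # Randomly initialised weights and biases: 3 inputs -> 4 hidden units -> 1 output
    W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
    W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal(1)

    x = np.array([0.5, -1.0, 2.0])      # example input
    hidden = sigmoid(W1 @ x + b1)       # layer 1: weighted sum, then activation
    output = sigmoid(W2 @ hidden + b2)  # layer 2: composed on layer 1's output

    print(output)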


[Figure: back-propagation in a multilayer ANN with one hidden layer]

Source: https://www.researchgate.net/figure/241741756_fig2_Figure-2-Back-propagation-multilayer-ANN-with-one-hidden-layer


After the forward propagation comes back-propagation. Its purpose is to determine the partial derivatives of the error function with respect to each weight in the network, so those can then be used in gradient descent.

Okay, lots of new words there, but at its core backpropagation is just applying the chain rule over and over again through all possible paths in the network. The goal of training any NN is to find the gradient of the error with respect to each weight, so that we can update the weights incrementally using gradient descent.
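
As a hedged sketch of the idea on a single neuron with a squared-error function (the input, target, learning rate and sigmoid activation are all illustrative assumptions):

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    x, target, w, b, lr = 1.5, 1.0, 0.2, 0.1, 0.1

    pre = w * x + b                    # weighted input
    out = sigmoid(pre)                 # forward pass
    error = 0.5 * (out - target) ** 2

    # Chain rule: dE/dw = dE/dout * dout/dpre * dpre/dw
    dE_dout = out - target
    dout_dpre = out * (1 - out)        # derivative of the sigmoid
    dpre_dw = x
    grad_w = dE_dout * dout_dpre * dpre_dw

    w -= lr * grad_w                   # gradient descent update
    print(grad_w, w)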

This first post was just to cover the basics of the mathematical concepts that will come in handy as the topic gets more complex. I'll cover gradient descent and the calculations behind forward and backward propagations in the next post.

Hope this was helpful.
