Understanding Backpropagation and Activation Functions in Machine Learning

Published: 25 May 2024
on the channel: Stephen Blum

We're discussing backpropagation in machine learning. Another key aspect is the activation function, which is basically a normalization process. When you perform matrix multiplication on all the different floating point numbers in the matrix, the values in your output layer can start growing rapidly.
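To illustrate the blow-up (a minimal NumPy sketch, not code from the video, with made-up layer sizes), repeatedly multiplying by weight matrices with no activation in between makes the magnitudes grow quickly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up stack of 10 layers, each a 64x64 weight matrix.
weights = [rng.normal(scale=1.0, size=(64, 64)) for _ in range(10)]

x = rng.normal(size=64)
for i, W in enumerate(weights, start=1):
    x = W @ x  # plain matrix multiply, no activation in between
    print(f"layer {i}: max |value| = {np.abs(x).max():.2e}")
```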

We want to avoid this because it can cause hot spots or spikes where training can get stuck. We need smooth gradients, and activation functions that normalize the values are one way to achieve this. Common activation functions include tanh and sigmoid, but a popular one is ReLU due to its efficiency.
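For reference, these are the standard definitions of the activation functions mentioned, written as a small Python sketch (not code shown in the video):

```python
import numpy as np

def sigmoid(x):
    # Squashes values into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes values into the range (-1, +1).
    return np.tanh(x)

def relu(x):
    # Cheap to compute: just clamps negative values to zero.
    return np.maximum(0.0, x)
```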

In forward propagation, the activation is applied after the matrix is multiplied with the inputs. It's fairly simple. An activation function like tanh can be applied, and what it does is keep the numbers from blowing up to massive scales.

It confines them to the range between -1 and +1, making forward propagation over a large number of layers easier. The same concept is employed in backpropagation, but in reverse, using the derivative. To undo the effect of the activation function, we apply its derivative as the gradient flows back to the inputs and the weights.
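A minimal sketch of both directions for a single tanh layer, using hypothetical names W, b, and x that are not from the video:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))   # hypothetical weight matrix
b = np.zeros(4)                          # hypothetical bias
x = rng.normal(size=8)                   # hypothetical input

# Forward: matrix multiply, then squash with tanh into (-1, +1).
z = W @ x + b
a = np.tanh(z)

# Backward: multiply the gradient arriving from the next layer
# by the derivative of tanh, d/dz tanh(z) = 1 - tanh(z)^2.
grad_from_next_layer = rng.normal(size=4)   # stand-in for the upstream gradient
dz = grad_from_next_layer * (1.0 - a**2)    # the "one-liner" derivative step

# Gradients with respect to the weights and the input.
dW = np.outer(dz, x)
dx = W.T @ dz
```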

This is all done in a straightforward one-liner. Backpropagation also involves going backwards to identify the differences, or 'deltas', in the model. We compare the model's output with the target output and check how far off each layer in the model was.

Then we gradually adjust the weights, scaled by a learning rate. This process brings our outputs closer to our desired targets.
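Putting the pieces together, a single training step might look like this minimal sketch, assuming a simple output-minus-target delta and hypothetical names not taken from the video:

```python
import numpy as np

def train_step(W, b, x, target, learning_rate=0.01):
    # Forward pass: one tanh layer.
    z = W @ x + b
    output = np.tanh(z)

    # Delta: how far the model's output is from the target.
    delta = output - target

    # Backward pass: push the delta through the tanh derivative.
    dz = delta * (1.0 - output**2)
    dW = np.outer(dz, x)
    db = dz

    # Gradual optimization: take a small step against the gradient,
    # scaled by the learning rate.
    W -= learning_rate * dW
    b -= learning_rate * db
    return W, b
```

Calling a step like this repeatedly nudges the weights so the outputs drift toward the targets.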