11. Multilayer Perceptrons#

In Section 4, we introduced softmax regression (Section 4.1), implementing the algorithm from scratch (Section 4.4) and using high-level APIs (Section 4.5). This allowed us to train classifiers capable of recognizing 10 categories of clothing from low-resolution images. Along the way, we learned how to wrangle data, coerce our outputs into a valid probability distribution, apply an appropriate loss function, and minimize it with respect to our model’s parameters. Now that we have mastered these mechanics in the context of simple linear models, we can launch our exploration of deep neural networks, the comparatively rich class of models with which this book is primarily concerned.

11.1. Activation Functions#

Activation functions decide whether a neuron should be activated or not: the neuron computes the weighted sum of its inputs, adds a bias, and passes the result through the activation. Activation functions are differentiable operators that transform input signals into outputs, and most of them add nonlinearity. Because activation functions are fundamental to deep learning, let's briefly survey some common ones.
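To make this concrete, here is a minimal sketch in plain Julia of the computation a single layer performs: a weighted sum of the inputs plus a bias, followed by an elementwise nonlinearity. The dimensions and the choice of ReLU here are made up for illustration.

W = randn(4, 3)       # made-up weight matrix: 4 outputs, 3 inputs
b = randn(4)          # made-up bias vector
x = randn(3)          # a single input example
pre = W * x .+ b      # weighted sum of the inputs plus bias (pre-activation)
h = max.(pre, 0)      # elementwise activation (here ReLU) gives the layer output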

11.1.1. ReLU Function#

Informally, the ReLU function retains only positive elements and discards all negative elements by setting the corresponding activations to 0. To gain some intuition, we can plot the function. As you can see, the activation function is piecewise linear.

using CairoMakie
using Flux

x = -8.0:0.1:8.0
lines(x,relu;axis = (;xlabel = "x",ylabel="relu(x)"))
[Figure: plot of relu(x) against x.]
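As the plot suggests, ReLU simply takes the elementwise maximum of its argument and 0, that is, ReLU(x) = max(x, 0). A minimal hand-rolled version (the name my_relu is ours; Flux already provides relu) looks like this:

my_relu(x) = max(zero(x), x)   # ReLU(x) = max(x, 0)
my_relu(-2.0), my_relu(3.0)    # gives (0.0, 3.0), matching Flux's relu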

When the input is negative, the derivative of the ReLU function is 0, and when the input is positive, the derivative of the ReLU function is 1. Note that the ReLU function is not differentiable when the input takes a value of precisely 0. In these cases, we default to the left-hand-side derivative and say that the derivative is 0 when the input is 0. We can get away with this because the input may never actually be zero (mathematicians would say that the function is nondifferentiable only on a set of measure zero). There is an old adage that if subtle boundary conditions matter, we are probably doing (real) mathematics, not engineering. That conventional wisdom may apply here; in any case, we are not performing constrained optimization (Mangasarian, 1965; Rockafellar, 1970). The derivative of the ReLU function is plotted below.

using Zygote

lines(x,relu';axis = (;xlabel = "x",ylabel="grad of relu"))
[Figure: plot of the derivative of relu(x).]
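The convention described above can be written out explicitly. Here is a minimal hand-rolled derivative (the name drelu is ours) that returns 1 for positive inputs and 0 otherwise, including at exactly 0:

drelu(x) = x > 0 ? one(x) : zero(x)   # 1 for positive inputs, 0 otherwise (including at 0)
drelu(-1.0), drelu(0.0), drelu(2.0)   # gives (0.0, 0.0, 1.0)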

11.1.2. Sigmoid Function#

Below, we plot the sigmoid function. Note that when the input is close to 0, the sigmoid function approaches a linear transformation.

lines(x,sigmoid;axis = (;xlabel = "x",ylabel="sigmoid(x)"))
[Figure: plot of sigmoid(x) against x.]
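The sigmoid squashes any real-valued input into the interval (0, 1). A minimal hand-rolled version (the name my_sigmoid is ours; Flux already exports sigmoid):

my_sigmoid(x) = 1 / (1 + exp(-x))   # sigmoid(x) = 1 / (1 + e^(-x)), output in (0, 1)
my_sigmoid(0.0)                     # gives 0.5, the midpoint of the output range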

The derivative of the sigmoid function is plotted below. Note that when the input is 0, the derivative of the sigmoid function reaches a maximum of 0.25. As the input diverges from 0 in either direction, the derivative approaches 0.

lines(x,sigmoid';axis = (;xlabel = "x",ylabel="grad of sigmoid"))
[Figure: plot of the derivative of sigmoid(x).]
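The maximum of 0.25 follows from the identity sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)): at x = 0 the sigmoid equals 0.5, so the derivative equals 0.25. A small check, reusing the hand-rolled my_sigmoid from above:

dsigmoid(x) = my_sigmoid(x) * (1 - my_sigmoid(x))   # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
dsigmoid(0.0)                                       # gives 0.25, the maximum value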

11.1.3. Tanh Function#

We plot the tanh function below. Note that as the input nears 0, the tanh function approaches a linear transformation. Although the shape of the function is similar to that of the sigmoid function, the tanh function exhibits point symmetry about the origin of the coordinate system (Kalman and Kwasny, 1992).

lines(x,tanh;axis = (;xlabel = "x",ylabel="tanh(x)"))
[Figure: plot of tanh(x) against x.]
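Like the sigmoid, tanh squashes its input, but into the interval (-1, 1), and it is symmetric about the origin. A minimal hand-rolled version (the name my_tanh is ours; Julia's Base already provides tanh):

my_tanh(x) = (1 - exp(-2x)) / (1 + exp(-2x))   # tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))
                                               # equivalently, tanh(x) = 2 * sigmoid(2x) - 1
my_tanh(0.0), my_tanh(1.0) ≈ tanh(1.0)         # gives (0.0, true)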

The derivative of the tanh function is plotted below. As the input nears 0, the derivative of the tanh function approaches a maximum of 1. And as we saw with the sigmoid function, as the input moves away from 0 in either direction, the derivative of the tanh function approaches 0.

lines(x,tanh';axis = (;xlabel = "x",ylabel="grad of tanh"))
[Figure: plot of the derivative of tanh(x).]
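The maximum of 1 at the origin follows from the identity tanh'(x) = 1 - tanh(x)^2, since tanh(0) = 0. A small check (the name dtanh is ours):

dtanh(x) = 1 - tanh(x)^2   # tanh'(x) = 1 - tanh(x)^2
dtanh(0.0)                 # gives 1.0, the maximum value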