8. Concise Implementation of Linear Regression
Deep learning has witnessed a Cambrian explosion of sorts over the past decade. The sheer number of techniques, applications and algorithms by far surpasses the progress of previous decades. This is due to a fortuitous combination of multiple factors, one of which is the powerful free tools offered by a number of open source deep learning frameworks. Theano (Bergstra et al., 2010), DistBelief (Dean et al., 2012), and Caffe (Jia et al., 2014) arguably represent the first generation of such frameworks that found widespread adoption. In contrast to earlier (seminal) works like SN2 (Simulateur Neuristique) (Bottou and Le Cun, 1988), which provided a Lisp-like programming experience, modern frameworks offer automatic differentiation and the convenience of Julia. These frameworks allow us to automate and modularize the repetitive work of implementing gradient-based learning algorithms.
In practice, because data iterators, loss functions, optimizers, and neural network layers are so common, modern libraries implement these components for us as well. In this section, we will show you how to implement the linear regression model from Section 3.4 concisely by using high-level APIs of deep learning frameworks.
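8.1. Generating the Dataset
For this example, we work in low dimensions for brevity. The following code snippet generates 100 examples whose 2-dimensional features are drawn from a standard normal distribution. Each label is computed as y = wᵀx + b + ε, where the noise ε is drawn from a normal distribution with mean 0 and standard deviation 0.01.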
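using Distributions
# Generate labels y = Xw + b + noise, where the noise is drawn from Normal(0, 0.01).
# Features are returned as a (num_features × num_examples) matrix and labels as a
# (1 × num_examples) matrix, i.e., one example per column.
function synthetic_data(w::Vector{<:Real}, b::Real, num_example::Int)
    X = randn(Float32, (num_example, length(w)))    # standard normal features
    y = Float32.(X * w .+ b)                         # noise-free labels
    y += rand(Normal(0f0, 0.01f0), size(y))          # add Gaussian noise
    return X', reshape(y, (1, :))                    # transpose so each column is one example
end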
synthetic_data (generic function with 1 method)
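If you would rather avoid the Distributions dependency, Normal(0, σ) noise can also be produced by scaling standard normal draws. The sketch below is a hypothetical stdlib-only variant (synthetic_data_stdlib is our own name, not part of any package). It takes an explicit random number generator so that runs are reproducible, and it builds the features directly in the features × examples layout instead of transposing at the end.

```julia
using Random, Statistics

# Hypothetical stdlib-only variant of `synthetic_data` (not from any package).
# Normal(0, σ) noise is produced as σ .* randn(...), which has the same distribution.
function synthetic_data_stdlib(rng::AbstractRNG, w::Vector{<:Real}, b::Real, n::Int; σ=0.01f0)
    X = randn(rng, Float32, length(w), n)     # features × examples, one example per column
    y = Float32.(permutedims(w) * X .+ b)     # 1 × n noise-free labels
    y .+= σ .* randn(rng, Float32, 1, n)      # add Gaussian noise with standard deviation σ
    return X, y
end

true_w, true_b = [2, -3.4], 4.2
X, y = synthetic_data_stdlib(Xoshiro(42), true_w, true_b, 1000)
noise = vec(y .- Float32.(permutedims(true_w) * X .+ true_b))   # recover the added noise
println(size(X), " ", size(y), " noise std ≈ ", round(std(noise); digits=3))
```

Passing the RNG explicitly (here Xoshiro(42), an arbitrary seed) is a design choice: the original synthetic_data draws from the global RNG, so its printed values change from run to run.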
Later, we can check our estimated parameters against these ground truth values.
true_w = [2,-3.4]
true_b = 4.2
features,labels = synthetic_data(true_w,true_b,100)
(Float32[0.5049412 0.24637741 … 1.7447525 0.9225617; -0.9852763 -1.5925564 … 0.7616638 1.9118508], Float32[8.557541 10.108755 … 5.107828 -0.4439846])
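Let’s have a look at the first entry. Note that features stores one example per column (a 2 × 100 matrix), the layout that Lux layers expect, so the first example is features[:, 1].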
println("features:$(features[:,1])")
println("label:$(labels[1])")
features:Float32[0.5049412, -0.9852763]
label:8.557541
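8.2. Reading the Dataset
Rather than writing our own iterator, we can use the DataLoader from MLUtils.jl to read the data. We pass the features and labels as a tuple, set the minibatch size with the batchsize keyword, and set shuffle=true so that the examples are visited in a random order. To build some intuition, let’s inspect the first minibatch. Each minibatch of features has one row per input feature and one column per example, so its shape tells us both the dimensionality of the inputs and the minibatch size. Likewise, the minibatch of labels has a matching number of columns, given by batchsize.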
using MLUtils
train_loader = DataLoader((features,labels),batchsize=10,shuffle=true)
X,y = first(train_loader)
println("X shape:$(size(X))")
println("y shape:$(size(y))")
X shape:(2, 10)
y shape:(1, 10)
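To see roughly what a shuffled data loader does, here is a minimal sketch in plain Julia. The helper minibatches is hypothetical and is not how MLUtils implements DataLoader; it only illustrates the idea: shuffle the example (column) indices, then slice X and y with the same index chunks so that features and labels stay paired. This sketch keeps a final, smaller batch when the batch size does not divide the number of examples.

```julia
using Random

# Hypothetical sketch of a shuffled minibatch iterator over column-major data
# (each column of X and y is one example). Not the MLUtils implementation.
function minibatches(rng::AbstractRNG, X::AbstractMatrix, y::AbstractMatrix, batchsize::Int)
    idx = shuffle(rng, 1:size(X, 2))                  # random visiting order of examples
    # Slice features and labels with the same column indices to keep them paired
    return [(X[:, c], y[:, c]) for c in Iterators.partition(idx, batchsize)]
end

X = reshape(Float32.(1:20), 2, 10)   # 10 examples with 2 features; column j is [2j-1, 2j]
y = reshape(Float32.(1:10), 1, 10)   # label of column j is j
batches = minibatches(Xoshiro(0), X, y, 4)
println(length(batches), " ", size(first(batches)[1]), " ", size(last(batches)[1]))   # 3 (2, 4) (2, 2)
```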
8.3. Defining the Model
For standard operations, we can use a framework’s predefined layers, which allow us to focus on the layers used to construct the model rather than worrying about their implementation. Recall the architecture of a single-layer network as described in Fig. 3.1.2. The layer is called fully connected, since each of its inputs is connected to each of its outputs by means of a matrix-vector multiplication.
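In Lux, the fully connected layer is Dense. A Dense(2 => 1) layer has two inputs (one per feature) and one output, i.e., it consists of a single neuron. Lux models are stateless: Lux.setup initializes the parameters ps (here a 1 × 2 weight matrix and a length-1 bias vector) and the state st, which is empty for a Dense layer.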
using Lux,Random
rng = Xoshiro(0)
model = Dense(2=>1)
ps, st = Lux.setup(rng, model)
((weight = Float32[-0.034858026 -0.23098828], bias = Float32[-0.68244076]), NamedTuple())
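Because the model is stateless, its forward pass is just a function of the input and ps. As a sketch, assuming the default identity activation, Dense(2 => 1) applied to a minibatch X (features × examples) computes W * X .+ b, with W of size 1 × 2 and b of length 1. The helper dense_forward and the parameter values below are hypothetical, chosen so that the result is easy to check by hand.

```julia
# Hypothetical helper: the affine map a Dense(2 => 1) layer with the default
# identity activation applies to a minibatch X of size (features × examples).
dense_forward(W::AbstractMatrix, b::AbstractVector, X::AbstractMatrix) = W * X .+ b

W = Float32[1 2]          # 1 × 2 weight matrix, same shape as ps.weight
b = Float32[0.5]          # length-1 bias vector, same shape as ps.bias
X = Float32[1 3; 2 4]     # two examples stored as columns: [1, 2] and [3, 4]
ŷ = dense_forward(W, b, X)
# 1 × 2 result: 1*1 + 2*2 + 0.5 = 5.5 and 1*3 + 2*4 + 0.5 = 11.5
println(ŷ)
```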
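8.4. Defining the Loss Function
Lux’s MSELoss() computes the mean squared error: it squares the difference between each prediction and its label and, by default, averages these squared differences, which gives the average loss over the examples. Using the built-in loss is simpler (and less error-prone) than implementing our own. The loss object can be called on predictions and labels as mse(ŷ, y), and also as mse(model, ps, st, (x, y)), which is the form the training API uses below.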
const mse = MSELoss()
(::GenericLossFunction{typeof(Lux.LossFunctionImpl.l2_distance_loss), typeof(mean)}) (generic function with 2 methods)
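The printed type suggests that MSELoss() combines an elementwise squared difference (l2_distance_loss) with mean aggregation. A plain-Julia sketch of the same computation is a one-liner; mse_sketch is a hypothetical name used only for illustration.

```julia
using Statistics

# Hypothetical sketch of mean squared error: the average of the elementwise
# squared differences between predictions ŷ and labels y.
mse_sketch(ŷ, y) = mean(abs2, ŷ .- y)

ŷ = Float32[1 2 3]   # predictions for three examples (1 × 3)
y = Float32[1 1 1]   # labels (1 × 3)
println(mse_sketch(ŷ, y))   # (0 + 1 + 4) / 3
```

For these values the squared differences are 0, 1, and 4, so the mean is 5/3.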
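8.5. Defining the Optimization Algorithm
Descent from Optimisers.jl is the classic gradient descent optimizer: it updates each parameter θ as θ ← θ − η∇θL, where η is the learning rate. Since we compute the gradients on minibatches, this amounts to minibatch stochastic gradient descent. Calling Descent() without arguments uses the default learning rate η = 0.1, as the printed output shows.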
using Optimisers
opt = Descent()
Descent(0.1f0)
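Lux stores the parameters as a named tuple (weight and bias for our layer), and the optimizer rule is applied to every array in it. The sketch below applies the plain gradient descent rule θ ← θ − ηg field by field. The helper descent and the gradient values are hypothetical; this is not the Optimisers.jl API, which additionally tracks optimizer state through Optimisers.setup and Optimisers.update.

```julia
# Hypothetical helper (not the Optimisers.jl API): apply θ ← θ - η * g to every
# parameter array in a NamedTuple shaped like Lux's `ps`.
descent(ps::NamedTuple, grads::NamedTuple, η::Real) = map((θ, g) -> θ .- η .* g, ps, grads)

ps = (weight = Float32[1.0 -2.0], bias = Float32[0.5])
grads = (weight = Float32[0.5 -1.0], bias = Float32[1.0])   # made-up gradients
new_ps = descent(ps, grads, 0.1f0)   # η = 0.1, matching the default of Descent()
println(new_ps)                      # weight ≈ [0.95 -1.9], bias ≈ [0.4]
```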
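8.6. Training
You might have noticed that expressing our model through high-level APIs of a deep learning framework requires comparatively few lines of code. We did not have to allocate parameters individually, define our loss function, or implement gradient descent. Once we start working with much more complex models, the advantages of the high-level API will grow considerably.
The training loop makes num_epochs complete passes over the data. For each minibatch, Training.single_train_step! evaluates the model and the loss, computes the gradients with Zygote (selected by AutoZygote()), and updates the parameters held in train_state using the optimizer. After each epoch, we print the loss on the full dataset.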
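using Printf, Zygote
# Bundle the model, parameters, states, and optimizer into a training state
train_state = Training.TrainState(model, ps, st, opt)
num_epochs = 3
for epoch in 1:num_epochs
    for data in train_loader
        global train_state   # needed when this loop runs at the top level of a script
        # One step: forward pass, loss, gradients via Zygote, optimizer update
        (_, loss, _, train_state) = Training.single_train_step!(AutoZygote(), mse, data, train_state)
    end
    # Evaluate with the updated parameters and states held by the training state
    ps_now, st_now = train_state.parameters, train_state.states
    @printf "epoch %i, loss %f \n" epoch mse(first(model(features, ps_now, st_now)), labels)
end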
epoch 1, loss 0.408317
epoch 2, loss 0.006815
epoch 3, loss 0.000224
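To connect the Lux loop with the underlying algorithm, here is a stdlib-only sketch of the same minibatch stochastic gradient descent. One swap to note: instead of automatic differentiation with Zygote, it uses hand-derived gradients of the mean squared error, ∂L/∂W = (2/m) r Xᵀ and ∂L/∂b = (2/m) Σ r, where r = WX + b − y are the residuals of a minibatch of m examples. The helper train_linreg, the zero initialization, the seed, and the 1000-example dataset are our assumptions.

```julia
using Random, Statistics

# Stdlib-only sketch of minibatch SGD for linear regression. Gradients are
# derived by hand for the MSE loss (no automatic differentiation).
function train_linreg(rng::AbstractRNG, X, y; η=0.1f0, batchsize=10, num_epochs=3)
    W = zeros(Float32, 1, size(X, 1))   # 1 × d weights (zero init is our assumption)
    b = zeros(Float32, 1)               # length-1 bias
    n = size(X, 2)
    for epoch in 1:num_epochs
        # Reshuffle the example indices every epoch, then walk over minibatches
        for cols in Iterators.partition(shuffle(rng, 1:n), batchsize)
            Xb, yb = X[:, cols], y[:, cols]
            m = length(cols)
            r = W * Xb .+ b .- yb              # residuals, 1 × m
            gW = (2f0 / m) .* (r * Xb')        # ∂L/∂W = (2/m) r Xᵀ, 1 × d
            gb = [(2f0 / m) * sum(r)]          # ∂L/∂b = (2/m) Σ r
            W .-= η .* gW                      # gradient descent updates
            b .-= η .* gb
        end
        println("epoch $epoch, loss ", mean(abs2, W * X .+ b .- y))
    end
    return W, b
end

rng = Xoshiro(0)
true_w, true_b = [2, -3.4], 4.2
X = randn(rng, Float32, 2, 1000)
y = Float32.(permutedims(true_w) * X .+ true_b) .+ 0.01f0 .* randn(rng, Float32, 1, 1000)
W, b = train_linreg(rng, X, y)
```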
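Below, we compare the model parameters learned by training on finite data with the actual parameters that generated our dataset. Because the model is a single Dense layer, its trained parameters are the named tuple train_state.parameters, whose weight field is a 1 × 2 matrix and whose bias field is a length-1 vector. Note that our estimated parameters are close to their true counterparts.
weight = vec(train_state.parameters.weight)   # 1 × 2 matrix → length-2 vector
bias = first(train_state.parameters.bias)     # length-1 vector → scalar
println("error in estimating w:$(true_w - weight)")
println("error in estimating b:$(true_b - bias)")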
error in estimating w:[-0.004839897155761719, -0.0038477420806883877]
error in estimating b:0.010038566589355646
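As an additional reference point, linear regression also admits a closed-form least-squares solution. The sketch below uses a separate synthetic dataset generated the same way (the seed is an arbitrary assumption), appends a column of ones to the design matrix so that the bias is estimated together with the weights, and solves the least-squares problem with Julia’s backslash operator, which uses a QR factorization for non-square matrices.

```julia
using Random, LinearAlgebra

# Sketch: ordinary least squares for data generated like ours, as a reference
# point. The bias is estimated via an extra all-ones column in the design matrix.
rng = Xoshiro(1)                                 # arbitrary seed (assumption)
true_w, true_b = [2, -3.4], 4.2
X = randn(rng, Float32, 2, 100)                  # features × examples
y = vec(permutedims(true_w) * X .+ true_b) .+ 0.01 .* randn(rng, 100)
A = [X' ones(100)]                               # 100 × 3 design matrix
θ = A \ y                                        # minimizes ‖Aθ - y‖², θ = [w₁, w₂, b]
println("error in estimating w: ", true_w - θ[1:2])
println("error in estimating b: ", true_b - θ[3])
```

With noise of standard deviation 0.01, both this estimate and the gradient-based one should land close to the true parameters; the closed form is only practical for models this simple.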