18. Working with Sequences

18.1. Training

Before we focus our attention on text data, let’s first try sequence modeling on some continuous-valued synthetic data.

Here, our 1000 synthetic data points follow the sine function applied to 0.01 times the time step. To make the problem a little more interesting, we corrupt each sample with additive Gaussian noise. From this sequence we extract training examples, each consisting of features (the tau preceding values) and a label (the value that follows).
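Concretely, with tau = 4 each feature vector collects the four preceding observations, and the label is the value right after them. A minimal sketch on a toy sequence (the helper make_pairs below is illustrative only, not part of the code that follows):

```julia
# Illustrative only: turn a sequence into (feature, label) pairs with lag tau.
function make_pairs(seq, tau)
    features = [seq[i:i+tau-1] for i in 1:length(seq)-tau]  # windows seq[i], …, seq[i+tau-1]
    labels   = seq[tau+1:end]                               # the value right after each window
    return features, labels
end

f, l = make_pairs([10, 20, 30, 40, 50, 60], 4)
# f[1] == [10, 20, 30, 40] and l[1] == 50
```

Note that the first tau observations never serve as labels, so a sequence of length T yields T - tau such pairs.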

using CairoMakie

batch_size = 16   # mini-batch size for the data loader
T = 1000          # number of time steps
num_train = 600   # examples used for training
tau = 4           # number of past observations per feature vector

time = 1:T
x = sin.(0.01 .* time) .+ 0.2 .* randn(T)  # noisy sine wave

fg, ax = lines(time, x; axis = (; yticks = -1:0.5:1.5, xticks = 0:200:1000, xlabel = "time", ylabel = "x"))

We extract feature–label pairs from the sequence and keep the first 600 examples for training, covering roughly one period of the sine function; the remaining examples are held out for validation.

using IterTools, Flux

# Sliding windows of length tau (step 1); drop the last window, which has no label.
windows = collect(partition(x, tau, 1))[begin:end-1]
data = reduce(hcat, map(collect, windows))  # tau × (T - tau) matrix, one window per column

features = data[:, 1:num_train]      # first 600 windows
labels = x[tau+1:num_train+tau]      # the value following each window

vals = data[:, num_train+1:end]      # held-out windows for validation
vals_labels = x[num_train+tau+1:end]

train_loader = Flux.DataLoader((features,labels),batchsize = batch_size)
38-element DataLoader(::Tuple{Matrix{Float64}, Vector{Float64}}, batchsize=16)
  with first element:
  (4×16 Matrix{Float64}, 16-element Vector{Float64},)
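Each batch yielded by the loader is a tuple of a tau × batch_size feature matrix and a matching label vector, as the printout above shows. A self-contained sketch of the same mechanics on stand-in data (the arrays here are random placeholders, only the shapes match the text):

```julia
using Flux  # Flux re-exports DataLoader from MLUtils

X = randn(4, 600)  # stand-in features: tau = 4 rows, 600 columns
y = randn(600)     # stand-in labels, one per column of X
loader = Flux.DataLoader((X, y), batchsize = 16)

xb, yb = first(loader)
size(xb)        # (4, 16): one mini-batch of 16 feature columns
length(yb)      # 16 matching labels
length(loader)  # 38 batches, since ceil(600 / 16) == 38
```

Without `shuffle = true`, the loader walks the columns in order; the last batch is smaller (600 = 37 × 16 + 8) unless `partial = false` drops it.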

In this example our model is standard linear regression, implemented as a single Dense layer mapping the tau = 4 inputs to one output.

model = Dense(4 => 1)

loss(model, x, y) = Flux.mse(vec(model(x)), y)

train_loss = Float64[]
val_loss = Float64[]

opt_state = Flux.setup(Descent(0.01), model)
for epoch in 1:5
    Flux.train!(loss, model, train_loader, opt_state)  # one pass over the mini-batches
    push!(train_loss, loss(model, features, labels))
    push!(val_loss, loss(model, vals, vals_labels))
end
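Since Dense(4 => 1) is just the affine map w⊤x + b, its parameters can be read off directly from `weight` and `bias`. The sketch below checks that identity on a fresh, untrained layer (the values are random; only the shapes and the equivalence matter):

```julia
using Flux

m = Dense(4 => 1)        # one output, identity activation by default
x = randn(Float32, 4)

# A Dense layer with identity activation computes weight * x .+ bias.
manual = m.weight * x .+ m.bias
m(x) ≈ manual            # true
size(m.weight)           # (1, 4): one row of regression coefficients
```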

fg2, ax2 = lines(train_loss, label = "train_loss"; axis = (; yticks = 0.05:0.05:0.3, xticks = 0:5, xlabel = "epoch", ylabel = "loss"))
lines!(ax2, val_loss, label = "val_loss", linestyle = :dash)
axislegend(position = :rt)
fg2

18.2. WIP