Stochastic gradient descent
See also the numerical assignment on ODEs and gradient descent, and the SGD implementation in PyTorch.
Algorithm 1: Stochastic Gradient Descent (SGD) update
Require: Learning rate schedule $\epsilon_1, \epsilon_2, \dots$
Require: Initial parameter $\theta$
1: $k \leftarrow 1$
2: while stopping criterion not met do
3:     Sample a minibatch of $m$ examples $\{x^{(1)}, \dots, x^{(m)}\}$ from the training set, with corresponding targets $y^{(i)}$.
4:     Compute gradient estimate: $\hat{g} \leftarrow \frac{1}{m} \nabla_\theta \sum_i L\big(f(x^{(i)}; \theta),\, y^{(i)}\big)$
5:     Apply update: $\theta \leftarrow \theta - \epsilon_k \hat{g}$
6:     $k \leftarrow k + 1$
7: end while
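As a minimal sketch of this loop in PyTorch (the framework the notes point to above), the following mirrors Algorithm 1 step by step. The names `model`, `loss_fn`, `train_loader`, and `lr_schedule` are placeholder assumptions, not part of the original notes, and the fixed step budget stands in for an unspecified stopping criterion:

```python
import torch

def sgd_train(model, loss_fn, train_loader, lr_schedule, max_steps=1000):
    """Sketch of Algorithm 1; model/loss_fn/train_loader/lr_schedule are assumed placeholders."""
    k = 0
    done = False
    while not done:  # stopping criterion: here, a fixed step budget
        for x, y in train_loader:  # step 3: sample a minibatch of m examples with targets
            eps_k = lr_schedule(k)  # learning rate epsilon_k from the schedule
            # Step 4: gradient estimate g_hat = (1/m) * grad_theta sum_i L(f(x_i; theta), y_i);
            # a loss with mean reduction supplies the 1/m factor.
            loss = loss_fn(model(x), y)
            model.zero_grad()
            loss.backward()
            # Step 5: apply the update theta <- theta - eps_k * g_hat
            with torch.no_grad():
                for p in model.parameters():
                    if p.grad is not None:
                        p -= eps_k * p.grad
            k += 1  # step 6
            if k >= max_steps:
                done = True
                break
    return model
```

In practice one would use `torch.optim.SGD`, which implements the same update; the explicit parameter loop above is only to make the correspondence with the pseudocode visible.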