
See also the numerical assignment on ODEs and gradient descent (GD), and the SGD implementation in PyTorch.
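For reference, the link between the two topics (presumably the one that assignment draws): the plain GD update is exactly the explicit Euler discretization of the gradient-flow ODE,

$$
\dot{\theta}(t) = -\nabla_{\theta} L\bigl(\theta(t)\bigr)
\quad\xrightarrow{\ \text{explicit Euler, step } \epsilon\ }\quad
\theta_{k+1} = \theta_k - \epsilon\, \nabla_{\theta} L(\theta_k).
$$

SGD then replaces the full gradient with the minibatch estimate $\hat{g}$ defined in the listing below.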

"\\begin{algorithm}\n\\caption{Stochastic Gradient Descent (SGD) update}\n\\begin{algorithmic}\n\\Require Learning rate schedule $\\{\\epsilon_1, \\epsilon_2, \\dots\\}$\n\\Require Initial parameter $\\theta$\n\\State $k \\gets 1$\n\\While{stopping criterion not met}\n \\State Sample a minibatch of $m$ examples from the training set $\\{x^{(1)}, \\dots, x^{(m)}\\}$ with corresponding targets $y^{(i)}$.\n \\State Compute gradient estimate: $\\hat{g} \\gets \\frac{1}{m} \\nabla_{\\theta} \\sum_{i} L\\bigl(f(x^{(i)};\\theta), y^{(i)}\\bigr)$\n \\State Apply update: $\\theta \\gets \\theta - \\epsilon_k \\hat{g}$\n \\State $k \\gets k + 1$\n\\EndWhile\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 1: Stochastic Gradient Descent (SGD) update

Require: learning rate schedule $\{\epsilon_1, \epsilon_2, \dots\}$
Require: initial parameter $\theta$

1: $k \gets 1$
2: while stopping criterion not met do
3:     Sample a minibatch of $m$ examples $\{x^{(1)}, \dots, x^{(m)}\}$ from the training set, with corresponding targets $y^{(i)}$
4:     Compute the gradient estimate: $\hat{g} \gets \frac{1}{m} \nabla_{\theta} \sum_{i} L\bigl(f(x^{(i)}; \theta),\, y^{(i)}\bigr)$
5:     Apply the update: $\theta \gets \theta - \epsilon_k \hat{g}$
6:     $k \gets k + 1$
7: end while
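As a companion to the listing, here is a minimal PyTorch sketch of the same loop on a toy least-squares problem. The synthetic data, batch size `m = 32`, iteration budget, and decaying schedule for `eps_k` are all illustrative assumptions, not part of the original assignment; only the update rule itself comes from the algorithm above.

```python
import torch

torch.manual_seed(0)

# Synthetic data (illustrative): 1000 points from a noisy linear map.
X = torch.randn(1000, 3)
true_w = torch.tensor([[1.5], [-2.0], [0.5]])
y = X @ true_w + 0.1 * torch.randn(1000, 1)

theta = torch.zeros(3, 1, requires_grad=True)  # initial parameter theta

m = 32        # minibatch size (assumed)
eps0 = 0.1    # base learning rate (assumed)

k = 1
while k <= 500:  # fixed iteration budget as the stopping criterion
    # Step 3: sample a minibatch of m examples with their targets.
    idx = torch.randint(0, X.shape[0], (m,))
    x_b, y_b = X[idx], y[idx]

    # Step 4: gradient estimate g_hat = (1/m) grad_theta sum_i L(f(x_i; theta), y_i);
    # with squared-error loss L, the minibatch mean supplies the (1/m) factor.
    loss = ((x_b @ theta - y_b) ** 2).mean()
    loss.backward()

    # Step 5: apply theta <- theta - eps_k * g_hat, here with an
    # illustrative decaying schedule eps_k = eps0 / (1 + 0.01 k).
    eps_k = eps0 / (1.0 + 0.01 * k)
    with torch.no_grad():
        theta -= eps_k * theta.grad
        theta.grad.zero_()

    k += 1  # step 6

print(theta.squeeze())  # should be close to [1.5, -2.0, 0.5]
```

In practice the same update (without momentum) is what `torch.optim.SGD` performs, with the schedule typically handled by a `torch.optim.lr_scheduler`; the manual loop above just makes the correspondence with steps 3 to 6 of the listing explicit.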