See also SGD and ODEs

Nesterov momentum is based on "On the importance of initialization and momentum in deep learning" (Sutskever et al., 2013)

"\\begin{algorithm}\n\\caption{SGD}\n\\begin{algorithmic}\n\\State \\textbf{input:} $\\gamma$ (lr), $\\theta_0$ (params), $f(\\theta)$ (objective), $\\lambda$ (weight decay),\n\\State $\\mu$ (momentum), $\\tau$ (dampening), nesterov, maximize\n\\For{$t = 1$ to $...$}\n \\State $g_t \\gets \\nabla_\\theta f_t(\\theta_{t-1})$\n \\If{$\\lambda \\neq 0$}\n \\State $g_t \\gets g_t + \\lambda\\theta_{t-1}$\n \\EndIf\n \\If{$\\mu \\neq 0$}\n \\If{$t > 1$}\n \\State $b_t \\gets \\mu b_{t-1} + (1-\\tau)g_t$\n \\Else\n \\State $b_t \\gets g_t$\n \\EndIf\n \\If{$\\text{nesterov}$}\n \\State $g_t \\gets g_t + \\mu b_t$\n \\Else\n \\State $g_t \\gets b_t$\n \\EndIf\n \\EndIf\n \\If{$\\text{maximize}$}\n \\State $\\theta_t \\gets \\theta_{t-1} + \\gamma g_t$\n \\Else\n \\State $\\theta_t \\gets \\theta_{t-1} - \\gamma g_t$\n \\EndIf\n\\EndFor\n\\State \\textbf{return} $\\theta_t$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 5 SGD

1:input: γ\gamma (lr), θ0\theta_0 (params), f(θ)f(\theta) (objective), λ\lambda (weight decay),

2:μ\mu (momentum), τ\tau (dampening), nesterov, maximize

3:for t=1t = 1 to ...... do

4:gtθft(θt1)g_t \gets \nabla_\theta f_t(\theta_{t-1})

5:if λ0\lambda \neq 0 then

6:gtgt+λθt1g_t \gets g_t + \lambda\theta_{t-1}

7:end if

8:if μ0\mu \neq 0 then

9:if t>1t > 1 then

10:btμbt1+(1τ)gtb_t \gets \mu b_{t-1} + (1-\tau)g_t

11:else

12:btgtb_t \gets g_t

13:end if

14:if nesterov\text{nesterov} then

15:gtgt+μbtg_t \gets g_t + \mu b_t

16:else

17:gtbtg_t \gets b_t

18:end if

19:end if

20:if maximize\text{maximize} then

21:θtθt1+γgt\theta_t \gets \theta_{t-1} + \gamma g_t

22:else

23:θtθt1γgt\theta_t \gets \theta_{t-1} - \gamma g_t

24:end if

25:end for

26:return θt\theta_t
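Below is a minimal NumPy sketch of this update rule, assuming a single flat parameter vector; the name `sgd_step` and the explicit `state` dict holding the momentum buffer are illustrative, not a library API.

```python
import numpy as np

def sgd_step(theta, grad, state, lr, weight_decay=0.0, momentum=0.0,
             dampening=0.0, nesterov=False, maximize=False):
    """One SGD update following the pseudocode above.

    theta, grad : parameter vector and gradient of f at theta (np.ndarray)
    state       : dict holding the momentum buffer b between calls
    """
    g = grad.copy()
    if weight_decay != 0.0:
        g += weight_decay * theta                    # g_t <- g_t + lambda * theta_{t-1}
    if momentum != 0.0:
        b = state.get("b")
        if b is None:                                # t = 1: initialize buffer with g_t
            b = g.copy()
        else:                                        # t > 1: b_t <- mu*b_{t-1} + (1-tau)*g_t
            b = momentum * b + (1.0 - dampening) * g
        state["b"] = b
        g = g + momentum * b if nesterov else b      # Nesterov vs. heavy-ball direction
    return theta + lr * g if maximize else theta - lr * g

# toy usage: minimize f(theta) = 0.5*||theta||^2, whose gradient is theta itself
theta = np.array([1.0, -2.0, 3.0])
state = {}
for _ in range(100):
    theta = sgd_step(theta, grad=theta, state=state, lr=0.1, momentum=0.9, nesterov=True)
```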

Nesterov momentum

See also paper

idea:

  • first take a step in the direction of the accumulated momentum,
  • compute the gradient at the "lookahead" position,
  • make the update using this gradient.

definition

For a parameter vector $\theta$, the update can be expressed as

$$
\begin{aligned}
v_t &= \mu v_{t-1} + \nabla L(\theta_t + \mu v_{t-1}) \\
\theta_{t+1} &= \theta_t - \alpha v_t
\end{aligned}
$$
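A minimal sketch of this lookahead form, assuming `grad_L` is a callable returning $\nabla L$ at a point (the name and signature are illustrative):

```python
def nesterov_step(theta, v, grad_L, mu=0.9, alpha=0.01):
    """One Nesterov update in the lookahead form above."""
    lookahead = theta + mu * v           # position after the momentum step
    v_new = mu * v + grad_L(lookahead)   # v_t = mu*v_{t-1} + grad L(theta_t + mu*v_{t-1})
    theta_new = theta - alpha * v_new    # theta_{t+1} = theta_t - alpha*v_t
    return theta_new, v_new
```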

Nesterov's accelerated gradient achieves better worst-case convergence rates than plain gradient descent on convex problems:

| function type            | gradient descent                   | Nesterov AG                                |
| ------------------------ | ---------------------------------- | ------------------------------------------ |
| Smooth                   | $\Theta(\frac{1}{T})$              | $\Theta(\frac{1}{T^{2}})$                  |
| Smooth & Strongly Convex | $\Theta(\exp(-\frac{T}{\kappa}))$  | $\Theta(\exp(-\frac{T}{\sqrt{\kappa}}))$   |