See also SGD and ODEs

Nesterov momentum is based on the formula from "On the importance of initialization and momentum in deep learning" (Sutskever et al., 2013).
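For reference, a minimal sketch of how these options map onto PyTorch's optimizer interface (assuming a recent torch version where `maximize` and the other flags below are available; the parameter tensor and objective here are placeholders, not from the original notes):

```python
import torch

# Toy parameter with a dummy scalar objective; purely illustrative.
w = torch.randn(10, requires_grad=True)

# momentum=0.9 with nesterov=True selects the Nesterov variant of the
# update rule shown in the algorithm below; PyTorch requires momentum > 0
# and dampening == 0 when nesterov=True.
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9, nesterov=True, weight_decay=1e-4)

for step in range(3):
    opt.zero_grad()
    loss = (w ** 2).sum()   # stand-in for f_t(theta)
    loss.backward()
    opt.step()
```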

"\\begin{algorithm}\n\\caption{SGD}\n\\begin{algorithmic}\n\\State \\textbf{input:} $\\gamma$ (lr), $\\theta_0$ (params), $f(\\theta)$ (objective), $\\lambda$ (weight decay),\n\\State $\\mu$ (momentum), $\\tau$ (dampening), nesterov, maximize\n\\For{$t = 1$ to $...$}\n \\State $g_t \\gets \\nabla_\\theta f_t(\\theta_{t-1})$\n \\If{$\\lambda \\neq 0$}\n \\State $g_t \\gets g_t + \\lambda\\theta_{t-1}$\n \\EndIf\n \\If{$\\mu \\neq 0$}\n \\If{$t > 1$}\n \\State $b_t \\gets \\mu b_{t-1} + (1-\\tau)g_t$\n \\Else\n \\State $b_t \\gets g_t$\n \\EndIf\n \\If{$\\text{nesterov}$}\n \\State $g_t \\gets g_t + \\mu b_t$\n \\Else\n \\State $g_t \\gets b_t$\n \\EndIf\n \\EndIf\n \\If{$\\text{maximize}$}\n \\State $\\theta_t \\gets \\theta_{t-1} + \\gamma g_t$\n \\Else\n \\State $\\theta_t \\gets \\theta_{t-1} - \\gamma g_t$\n \\EndIf\n\\EndFor\n\\State \\textbf{return} $\\theta_t$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 5 SGD

1:input: γ\gamma (lr), θ0\theta_0 (params), f(θ)f(\theta) (objective), λ\lambda (weight decay),

2:μ\mu (momentum), τ\tau (dampening), nesterov, maximize

3:for t=1t = 1 to ...... do

4:gtθft(θt1)g_t \gets \nabla_\theta f_t(\theta_{t-1})

5:if λ0\lambda \neq 0 then

6:gtgt+λθt1g_t \gets g_t + \lambda\theta_{t-1}

7:end if

8:if μ0\mu \neq 0 then

9:if t>1t > 1 then

10:btμbt1+(1τ)gtb_t \gets \mu b_{t-1} + (1-\tau)g_t

11:else

12:btgtb_t \gets g_t

13:end if

14:if nesterov\text{nesterov} then

15:gtgt+μbtg_t \gets g_t + \mu b_t

16:else

17:gtbtg_t \gets b_t

18:end if

19:end if

20:if maximize\text{maximize} then

21:θtθt1+γgt\theta_t \gets \theta_{t-1} + \gamma g_t

22:else

23:θtθt1γgt\theta_t \gets \theta_{t-1} - \gamma g_t

24:end if

25:end for

26:return θt\theta_t
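The sketch below transcribes the pseudocode step by step for a single parameter tensor. It is an illustrative reimplementation, not PyTorch's actual `torch.optim.SGD` code; the names `sgd_step` and `buf` are my own.

```python
import torch

def sgd_step(param, grad, buf, *, lr, weight_decay=0.0, momentum=0.0,
             dampening=0.0, nesterov=False, maximize=False):
    """One update of the pseudocode above for a single tensor.

    param: theta_{t-1}; grad: g_t = grad of f_t at theta_{t-1};
    buf: momentum buffer b_{t-1}, or None on the first step (t = 1).
    Returns (theta_t, b_t). Illustrative sketch only.
    """
    g = grad.clone()
    if weight_decay != 0:
        g = g + weight_decay * param                   # g_t <- g_t + lambda * theta_{t-1}
    if momentum != 0:
        if buf is None:                                # t == 1: b_t <- g_t
            buf = g.clone()
        else:                                          # b_t <- mu * b_{t-1} + (1 - tau) * g_t
            buf = momentum * buf + (1 - dampening) * g
        if nesterov:                                   # g_t <- g_t + mu * b_t
            g = g + momentum * buf
        else:                                          # g_t <- b_t
            g = buf
    if maximize:                                       # gradient ascent
        new_param = param + lr * g
    else:                                              # gradient descent
        new_param = param - lr * g
    return new_param, buf
```

Note that in the Nesterov branch the buffer is updated with the (possibly dampened) gradient, and the applied step adds $\mu b_t$ on top of $g_t$; this differs subtly from the original look-ahead formulation in Sutskever et al.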