PyTorch

MultiMarginLoss

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 1D tensor of target class indices, $0 \le y \le x.\text{size}(1) - 1$):

For each mini-batch sample, the loss in terms of the 1D input $x$ and output $y$ is:

$$\text{loss}(x, y) = \frac{\sum_{i} \max(0, \text{margin} - x[y] + x[i])^p}{x.\text{size}(0)}, \quad \text{where } i \in \{0, \ldots, x.\text{size}(0) - 1\} \text{ and } i \neq y$$
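To make the per-sample formula concrete, here is a minimal pure-Python sketch of it (no PyTorch dependency). The function names are hypothetical, the defaults `margin = 1.0` and `p = 1` are assumptions, and averaging the per-sample losses over the mini-batch is also an assumption, corresponding to a mean reduction.

```python
from typing import Sequence


def multi_margin_loss_sample(x: Sequence[float], y: int,
                             margin: float = 1.0, p: int = 1) -> float:
    """Hinge loss for one sample: x holds one score per class, y is the target index."""
    num_classes = len(x)  # plays the role of x.size(0) in the formula
    if not 0 <= y < num_classes:
        raise ValueError("target index must satisfy 0 <= y <= num_classes - 1")
    total = 0.0
    for i in range(num_classes):
        if i == y:
            continue  # the sum skips the target class (i != y)
        total += max(0.0, margin - x[y] + x[i]) ** p
    return total / num_classes


def multi_margin_loss(batch: Sequence[Sequence[float]], targets: Sequence[int],
                      margin: float = 1.0, p: int = 1) -> float:
    """Mean of the per-sample losses over a 2D mini-batch (assumed mean reduction)."""
    losses = [multi_margin_loss_sample(x, y, margin, p) for x, y in zip(batch, targets)]
    return sum(losses) / len(losses)


# One sample with four class scores whose target is class 3.
# Expected value: (0.3 + 0.4 + 0.6) / 4 = 0.325, up to floating-point rounding.
print(multi_margin_loss([[0.1, 0.2, 0.4, 0.8]], [3]))
```

Only classes whose score comes within `margin` of the target score contribute; in the example every other class does, so all three terms are positive.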

SGD

Nesterov momentum is based on the paper "On the importance of initialization and momentum in deep learning".


Algorithm 1 SGD in PyTorch

 1: input: $\gamma$ (lr), $\theta_0$ (params), $f(\theta)$ (objective), $\lambda$ (weight decay),
 2:        $\mu$ (momentum), $\tau$ (dampening), nesterov, maximize
 3: for $t = 1$ to $...$ do
 4:     $g_t \gets \nabla_\theta f_t(\theta_{t-1})$
 5:     if $\lambda \neq 0$ then
 6:         $g_t \gets g_t + \lambda\theta_{t-1}$
 7:     end if
 8:     if $\mu \neq 0$ then
 9:         if $t > 1$ then
10:             $b_t \gets \mu b_{t-1} + (1-\tau)g_t$
11:         else
12:             $b_t \gets g_t$
13:         end if
14:         if $\text{nesterov}$ then
15:             $g_t \gets g_t + \mu b_t$
16:         else
17:             $g_t \gets b_t$
18:         end if
19:     end if
20:     if $\text{maximize}$ then
21:         $\theta_t \gets \theta_{t-1} + \gamma g_t$
22:     else
23:         $\theta_t \gets \theta_{t-1} - \gamma g_t$
24:     end if
25: end for
26: return $\theta_t$
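Algorithm 1 can be traced with a small pure-Python sketch that applies it to a flat list of scalar parameters. The class name `SGDSketch` is hypothetical and this is not PyTorch's own implementation; the constructor arguments simply mirror the labels in the listing (lr, momentum, dampening, weight decay, nesterov, maximize), and the first step ($t = 1$) is assumed to be detectable by the momentum buffer not existing yet.

```python
from typing import List, Optional


class SGDSketch:
    """Hypothetical plain-Python version of Algorithm 1 for a list of scalar parameters."""

    def __init__(self, params: List[float], lr: float, momentum: float = 0.0,
                 dampening: float = 0.0, weight_decay: float = 0.0,
                 nesterov: bool = False, maximize: bool = False) -> None:
        self.params = list(params)         # theta_{t-1}
        self.lr = lr                       # gamma
        self.momentum = momentum           # mu
        self.dampening = dampening         # tau
        self.weight_decay = weight_decay   # lambda
        self.nesterov = nesterov
        self.maximize = maximize
        self.buf: Optional[List[float]] = None  # b_{t-1}; None until the first momentum step

    def step(self, grads: List[float]) -> List[float]:
        g = list(grads)  # g_t, the gradient of f_t evaluated at theta_{t-1}
        if self.weight_decay != 0:
            # g_t <- g_t + lambda * theta_{t-1}
            g = [gi + self.weight_decay * th for gi, th in zip(g, self.params)]
        if self.momentum != 0:
            if self.buf is None:
                self.buf = list(g)  # t = 1: b_1 <- g_1 (no dampening)
            else:
                # b_t <- mu * b_{t-1} + (1 - tau) * g_t
                self.buf = [self.momentum * b + (1 - self.dampening) * gi
                            for b, gi in zip(self.buf, g)]
            if self.nesterov:
                g = [gi + self.momentum * b for gi, b in zip(g, self.buf)]  # g_t + mu * b_t
            else:
                g = list(self.buf)  # g_t <- b_t
        sign = 1.0 if self.maximize else -1.0  # ascend when maximizing, descend otherwise
        self.params = [th + sign * self.lr * gi for th, gi in zip(self.params, g)]
        return self.params


# Minimize f(theta) = theta^2, whose gradient is 2 * theta.
opt = SGDSketch([1.0], lr=0.1, momentum=0.9)
for _ in range(2):
    opt.step([2.0 * th for th in opt.params])
print(opt.params)  # close to [0.46]
```

In the usage example, the first step gives $\theta_1 = 1 - 0.1 \cdot 2 = 0.8$; the second step uses the buffer $b_2 = 0.9 \cdot 2 + 1.6 = 3.4$, so $\theta_2 = 0.8 - 0.34 = 0.46$, a larger step than plain SGD would take.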
