idea:
- first, take a step in the direction of the accumulated momentum,
- compute the gradient at this “lookahead” position,
- make the update using this gradient.
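The three steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the names `nag_step`, `lr`, `momentum`, and the quadratic test objective are assumptions introduced here, not part of the notes.

```python
def nag_step(theta, velocity, grad_fn, lr=0.1, momentum=0.9):
    """One Nesterov step on lists of floats; returns (new_theta, new_velocity)."""
    # 1. Step in the direction of the accumulated momentum (the lookahead point).
    lookahead = [t + momentum * v for t, v in zip(theta, velocity)]
    # 2. Compute the gradient at the lookahead position.
    grad = grad_fn(lookahead)
    # 3. Make the update using this gradient.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_theta = [t + nv for t, nv in zip(theta, new_velocity)]
    return new_theta, new_velocity


# Hypothetical objective for illustration: f(x) = 0.5 * (x0^2 + 10 * x1^2),
# whose gradient is (x0, 10 * x1) and whose minimum is at the origin.
def quadratic_grad(x):
    return [x[0], 10.0 * x[1]]


theta, velocity = [1.0, 1.0], [0.0, 0.0]
for _ in range(100):
    theta, velocity = nag_step(theta, velocity, quadratic_grad, lr=0.05, momentum=0.9)
print(theta)  # both coordinates should be close to 0
```

The only difference from classical momentum is step 2: the gradient is evaluated at the lookahead point instead of at the current parameters.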
definition
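For a parameter vector $\theta$, the update can be expressed as

$$
v_{t+1} = \mu v_t - \eta \nabla f(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1}
$$

where $v_t$ is the velocity (accumulated momentum), $\mu$ is the momentum coefficient, $\eta$ is the learning rate, and $\theta_t + \mu v_t$ is the lookahead position.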
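Achieves better convergence rates than gradient descent on convex problems (after $T$ iterations, with condition number $\kappa$):
function type | gradient descent | Nesterov AG |
---|---|---|
Smooth | $O(1/T)$ | $O(1/T^2)$ |
Smooth & Strongly Convex | $O(\exp(-T/\kappa))$ | $O(\exp(-T/\sqrt{\kappa}))$ |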
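The gap between the two methods can be checked numerically on a small strongly convex problem. The sketch below is an illustration under assumptions: the quadratic objective, the step size `1 / L_SMOOTH`, and the momentum value `(sqrt(kappa) - 1) / (sqrt(kappa) + 1)` are a commonly used setting chosen here, not values taken from the notes, and all function names are hypothetical.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * (x0^2 + 100 * x1^2).
# Smoothness constant 100, strong convexity constant 1, so kappa = 100.
L_SMOOTH, M_STRONG = 100.0, 1.0
KAPPA = L_SMOOTH / M_STRONG


def f(x):
    return 0.5 * (M_STRONG * x[0] ** 2 + L_SMOOTH * x[1] ** 2)


def grad(x):
    return [M_STRONG * x[0], L_SMOOTH * x[1]]


def run_gd(x0, steps, lr):
    """Plain gradient descent; returns the final objective value."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return f(x)


def run_nag(x0, steps, lr, momentum):
    """Nesterov accelerated gradient; returns the final objective value."""
    x, v = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        # Gradient is taken at the lookahead point x + momentum * v.
        g = grad([xi + momentum * vi for xi, vi in zip(x, v)])
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return f(x)


steps = 200
lr = 1.0 / L_SMOOTH
momentum = (math.sqrt(KAPPA) - 1.0) / (math.sqrt(KAPPA) + 1.0)
gd_loss = run_gd([1.0, 1.0], steps, lr)
nag_loss = run_nag([1.0, 1.0], steps, lr, momentum)
print(f"GD:  {gd_loss:.3e}")
print(f"NAG: {nag_loss:.3e}")
```

With the same step size and iteration budget, the accelerated run should end at a much smaller objective value, which is consistent with its contraction depending on $\sqrt{\kappa}$ rather than $\kappa$.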
optimal assignments for parameters