SGD with Momentum.
SGD with Nesterov Momentum.
Adagrad and RMSProp.
Adam.
Adam with Nesterov Momentum (Nadam).
Lookahead.
Weight Decay (see the update-rule sketch after this list).
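The update rules behind the topics above each fit in a few lines. Below is a minimal NumPy sketch of the momentum, adaptive, Lookahead, and decoupled weight-decay updates; the function names, signatures, and default hyperparameters are assumptions chosen for readability, not a reference implementation.

```python
import numpy as np

def sgd_momentum(w, g, v, lr=0.01, beta=0.9):
    """SGD with (heavy-ball) momentum: v <- beta*v + g, then w <- w - lr*v."""
    v = beta * v + g
    return w - lr * v, v

def sgd_nesterov(w, g_lookahead, v, lr=0.01, beta=0.9):
    """Nesterov momentum: same velocity update, but the gradient g_lookahead is
    evaluated at the look-ahead point w - lr*beta*v rather than at w."""
    v = beta * v + g_lookahead
    return w - lr * v, v

def adagrad(w, g, s, lr=0.01, eps=1e-8):
    """Adagrad: per-parameter step sizes shrink with the accumulated squared gradients."""
    s = s + g**2
    return w - lr * g / (np.sqrt(s) + eps), s

def rmsprop(w, g, s, lr=0.001, rho=0.9, eps=1e-8):
    """RMSProp: like Adagrad, but with an exponential moving average of squared gradients."""
    s = rho * s + (1 - rho) * g**2
    return w - lr * g / (np.sqrt(s) + eps), s

def adam(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.0):
    """Adam with bias correction (t is the 1-based step count).
    wd > 0 adds decoupled weight decay, i.e. the AdamW-style variant."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

def lookahead(slow_w, fast_w, alpha=0.5):
    """Lookahead: after k steps of any fast optimizer, pull the slow weights toward
    the fast weights and restart the fast weights from there."""
    slow_w = slow_w + alpha * (fast_w - slow_w)
    return slow_w, slow_w.copy()
```

Each function returns the updated parameters together with its state, so a training loop simply threads (w, state) through successive calls.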
[SAM] P. Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, "Sharpness-Aware Minimization for Efficiently Improving Generalization", arXiv preprint arXiv:2010.01412, 2020. [Video] [Code]
[ASAM] J. Kwon, J. Kim, H. Park, and I. K. Choi, "ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks", International Conference on Machine Learning, 2021.
[ESAM] J. Du, H. Yan, J. Feng, J. T. Zhou, L. Zhen, R. S. M. Goh, and V. Y. F. Tan, "Efficient Sharpness-aware Minimization for Improved Training of Neural Networks", International Conference on Learning Representations, 2022.
[LookSAM] Y. Liu, S. Mai, X. Chen, C.-J. Hsieh, and Y. You, "Towards Efficient and Scalable Sharpness-Aware Minimization", IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[SAF] J. Du, D. Zhou, J. Feng, V. Y. F. Tan, and J. T. Zhou, "Sharpness-Aware Training for Free", Advances in Neural Information Processing Systems, 2022.
[LARS] Y. You, I. Gitman, and B. Ginsburg, "Large Batch Training of Convolutional Networks" (layer-wise adaptive rate scaling, LARS), arXiv preprint arXiv:1708.03888, 2017. (A minimal SAM/LARS sketch follows these entries.)
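To make the two-step structure of [SAM] concrete, here is a minimal sketch of one SAM update together with the LARS layer-wise trust ratio; grad_fn, base_update, and the default hyperparameters are placeholders assumed for illustration, and [ASAM]'s modification is only noted in a comment.

```python
import numpy as np

def sam_step(w, grad_fn, base_update, rho=0.05, eps=1e-12):
    """One Sharpness-Aware Minimization step [SAM].

    grad_fn(w)        -> gradient of the training loss at w
    base_update(w, g) -> one step of a base optimizer (e.g. SGD with momentum)
    rho               -> radius of the neighborhood used to measure sharpness
    """
    g = grad_fn(w)
    # Ascent step: approximate worst-case perturbation inside an L2 ball of radius rho.
    # ([ASAM] instead scales this perturbation component-wise by |w| so that it is
    # invariant to parameter re-scaling.)
    e = rho * g / (np.linalg.norm(g) + eps)
    # Descent step: apply the base optimizer with the gradient taken at the perturbed point.
    return base_update(w, grad_fn(w + e))

def lars_local_lr(w, g, trust_coef=0.001, eps=1e-12):
    """LARS: layer-wise learning-rate scale trust_coef * ||w|| / ||g|| for one layer."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return trust_coef * w_norm / (g_norm + eps)
```

Note that sam_step costs two gradient evaluations per update; reducing that overhead is the motivation behind [ESAM], [LookSAM], and [SAF].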
References:
https://viblo.asia/p/sam-giai-thuat-toi-uu-dang-dan-duoc-ung-dung-rong-rai-aNj4vXNxL6r (Vietnamese blog post: "SAM, an optimization algorithm that is gradually becoming widely used")