Optimization
From BoostCourse, by 최성준 (Korea University, Department of Artificial Intelligence)
1. Gradient Descent Methods
A. Stochastic gradient descent vs Mini-batch gradient descent vs Batch gradient descent
- stochastic GD updates with a single sample, mini-batch GD with a subset of the data, batch GD with the whole dataset at once
- large-batch methods tend to converge to sharp minimizers
- small-batch methods consistently converge to flat minimizers
>> flat minimizers tend to generalize better, so small-batch methods are preferable to large-batch methods
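- a minimal NumPy sketch (my illustration, not lecture code): one least-squares update loop covers all three variants, where batch_size=1 gives stochastic GD, batch_size=len(X) gives batch GD, and anything in between gives mini-batch GD

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, batch_size=32, epochs=10):
    """Least-squares regression; only batch_size distinguishes the three variants."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))                  # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # gradient of the MSE loss
            w -= lr * grad
    return w
```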
B. Momentum
- momentum accumulates the gradients of past steps to determine the direction to go.
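- a sketch of the update rule, assuming grad is the current gradient and beta the momentum coefficient (variable names are mine):

```python
def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """v is an exponentially decayed accumulation of past gradients."""
    v = beta * v + grad   # accumulate past gradients
    w = w - lr * v        # move along the accumulated direction
    return w, v
```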
C. Nesterov Accelerated Gradient
- similar to momentum, but it first moves along the accumulated gradient (lookahead) and then calculates the gradient at that point.
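- in the sketch below (grad_fn is a hypothetical callable returning the gradient at a given point), the only change from momentum is where the gradient is evaluated:

```python
def nag_step(w, v, grad_fn, lr=0.01, beta=0.9):
    lookahead = w - lr * beta * v       # move first along the accumulated direction...
    v = beta * v + grad_fn(lookahead)   # ...then calculate the gradient there
    w = w - lr * v
    return w, v
```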
D. Adagrad
- adapts the learning rate per parameter, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones.
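- a sketch of the per-parameter scaling; G (name is mine) is the running sum of squared gradients:

```python
import numpy as np

def adagrad_step(w, G, grad, lr=0.01, eps=1e-8):
    G = G + grad ** 2                        # frequently updated parameters get a large G
    w = w - lr * grad / (np.sqrt(G) + eps)   # ...and therefore smaller effective steps
    return w, G
```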
E. Adadelta
- extends Adagrad to reduce its monotonically decreasing learning rate by restricting the accumulation window of past squared gradients (an exponential moving average instead of the full sum).
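- a sketch following the standard Adadelta formulation (not lecture code); note there is no learning rate, the step is scaled by an EMA of past updates instead:

```python
import numpy as np

def adadelta_step(w, Eg2, Edw2, grad, rho=0.9, eps=1e-6):
    """Eg2: EMA of squared gradients, Edw2: EMA of squared updates."""
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2
    dw = -np.sqrt(Edw2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edw2 = rho * Edw2 + (1 - rho) * dw ** 2
    return w + dw, Eg2, Edw2
```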
F. RMSprop
- like Adadelta, replaces Adagrad's sum of squared gradients with an exponential moving average, but keeps an explicit stepsize (learning rate).
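- a sketch of the update rule:

```python
import numpy as np

def rmsprop_step(w, Eg2, grad, lr=0.001, rho=0.9, eps=1e-8):
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2    # EMA of squared gradients
    w = w - lr * grad / (np.sqrt(Eg2) + eps)   # explicit stepsize lr remains
    return w, Eg2
```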
G. Adam
- Adaptive Moment Estimation leverages both an exponential moving average of past gradients and of past squared gradients (Momentum + RMSprop)
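- a sketch of the standard Adam update (t is the 1-based step count, needed for bias correction):

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop: EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)              # correct the bias from zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```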
2. Regularization
A. Early Stopping
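- stop training when the validation loss stops improving; a minimal sketch with a patience counter (train_epoch and validate are hypothetical callables):

```python
def train_with_early_stopping(train_epoch, validate, max_epochs=100, patience=5):
    best_loss, wait = float("inf"), 0
    for _ in range(max_epochs):
        train_epoch()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss, wait = val_loss, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:            # no improvement for `patience` epochs: stop
                break
    return best_loss
```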
B. Parameter Norm Penalty
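- e.g. an L2 (weight decay) term added to the loss; a minimal sketch:

```python
import numpy as np

def l2_penalized_loss(loss, params, lam=1e-4):
    """total = data loss + lam/2 * sum of squared parameters."""
    return loss + lam / 2 * sum(np.sum(p ** 2) for p in params)
```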
C. Data Augmentation
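- a tiny sketch of a label-preserving augmentation (random horizontal flip of an HxWxC image):

```python
import numpy as np

def random_flip(image, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    return image[:, ::-1] if rng.random() < 0.5 else image   # flip left-right half the time
```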
D. Label Smoothing
- constructs soft targets instead of hard one-hot labels
- related mixing-based techniques: Mixup (blend two images and their labels), Cutout (mask out a random patch of an image), CutMix (paste a patch from another image and mix the labels in proportion to the patch area)
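- minimal sketches of soft targets and Mixup (one-hot labels assumed):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Replace hard 0/1 targets with (1 - eps) and eps / K."""
    k = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / k

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their labels with the same random ratio."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```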
E. Dropout
- in each forward pass, randomly set some neurons to zero
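- an inverted-dropout sketch: surviving activations are rescaled by 1/(1-p) so the expected value is unchanged, and nothing is dropped at test time:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    if not training:
        return x                       # use the full network at test time
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= p    # keep each unit with probability 1 - p
    return x * mask / (1 - p)
```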
F. Batch Normalization
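- normalizes each feature over the mini-batch, then applies a learnable scale (gamma) and shift (beta); a training-time sketch (running statistics for inference omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature statistics over the batch
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta              # learnable scale and shift
```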