    Optimization
    BoostCourse · 2023. 1. 11. 03:28

    From the BoostCourse lecture by 최성준 (Department of Artificial Intelligence, Korea University)


    1. Gradient Descent Methods

    A. Stochastic gradient descent vs Mini-batch gradient descent vs Batch gradient descent

    - large-batch methods tend to converge to sharp minimizers

    - small-batch methods consistently converge to flat minimizers

    >> flat minimizers tend to generalize better to unseen data, so small-batch methods are generally preferable to large-batch methods (the three variants are sketched below)
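
    A minimal NumPy sketch of the three variants, assuming a hypothetical gradient function grad(w, X, y) and dataset (X, y); only the amount of data used per update differs.

        import numpy as np

        def gd_epoch(w, X, y, grad, lr=0.01, batch_size=32):
            # batch_size = 1       -> stochastic gradient descent
            # batch_size = len(X)  -> (full) batch gradient descent
            # anything in between  -> mini-batch gradient descent
            idx = np.random.permutation(len(X))
            for start in range(0, len(X), batch_size):
                batch = idx[start:start + batch_size]
                w = w - lr * grad(w, X[batch], y[batch])  # one gradient step
            return w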

    B. Momentum

    - momentum accumulates the gradients of past steps to determine the direction of the current update (sketched below).
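
    A minimal sketch of one momentum update, with hypothetical names w (parameters), v (velocity), g (current gradient):

        def momentum_step(w, v, g, lr=0.01, beta=0.9):
            # v accumulates an exponentially decaying sum of past gradients
            v = beta * v + g
            w = w - lr * v  # step in the accumulated direction
            return w, v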

    C. Nesterov Accelerated Gradient

    - similar to momentum, but it first moves along the accumulated gradient to a lookahead point and then computes the gradient there ("move first, then calculate"; sketched below).
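
    A sketch of one Nesterov update using the same hypothetical names as above; the gradient is evaluated at the lookahead point reached by the accumulated velocity:

        def nag_step(w, v, grad, lr=0.01, beta=0.9):
            g = grad(w - lr * beta * v)  # "move" first, then "calculate"
            v = beta * v + g
            w = w - lr * v
            return w, v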

    D. Adagrad

    - adapts the learning rate per parameter, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones (sketched below).
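
    A sketch of one Adagrad update; G is a hypothetical name for the per-parameter sum of squared gradients:

        import numpy as np

        def adagrad_step(w, G, g, lr=0.01, eps=1e-8):
            G = G + g * g                        # grows monotonically over training
            w = w - lr * g / (np.sqrt(G) + eps)  # large G -> small effective step
            return w, G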

    E. Adadelta

    - extends Adagrad to counter its monotonically decreasing learning rate by restricting the accumulation to a window (an exponential moving average) instead of the full history (sketched below).
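
    A sketch of one Adadelta update; Eg2 and Edw2 are hypothetical names for the exponential moving averages of squared gradients and squared updates. Note the absence of an explicit learning rate:

        import numpy as np

        def adadelta_step(w, Eg2, Edw2, g, rho=0.95, eps=1e-6):
            Eg2 = rho * Eg2 + (1 - rho) * g * g          # windowed average of g^2
            dw = -np.sqrt(Edw2 + eps) / np.sqrt(Eg2 + eps) * g
            Edw2 = rho * Edw2 + (1 - rho) * dw * dw      # windowed average of dw^2
            w = w + dw
            return w, Eg2, Edw2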

    F. RMSprop

    - extends Adagrad by replacing the full sum of squared gradients with an exponential moving average, scaled by an explicit stepsize (sketched below).
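
    A sketch of one RMSprop update; Eg2 is again a hypothetical name for the moving average of squared gradients, and lr is the explicit stepsize:

        import numpy as np

        def rmsprop_step(w, Eg2, g, lr=0.001, rho=0.9, eps=1e-8):
            Eg2 = rho * Eg2 + (1 - rho) * g * g    # forget old squared gradients
            w = w - lr * g / (np.sqrt(Eg2) + eps)  # Adagrad-style scaling plus a stepsize
            return w, Eg2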

    G. Adam

    - Adaptive Moment Estimation keeps moving averages of both the gradients and the squared gradients, effectively combining Momentum and RMSprop (sketched below).
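
    A sketch of one Adam update with hypothetical state names m (first moment) and v (second moment); t is the 1-based step count used for bias correction:

        import numpy as np

        def adam_step(w, m, v, g, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
            m = b1 * m + (1 - b1) * g       # Momentum-style average of gradients
            v = b2 * v + (1 - b2) * g * g   # RMSprop-style average of squared gradients
            m_hat = m / (1 - b1 ** t)       # bias correction for the
            v_hat = v / (1 - b2 ** t)       # zero-initialized moving averages
            w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
            return w, m, v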

     

    2. Regularization

    A. Early Stopping

    B. Parameter Norm Penalty

    C. Data Augmentation

    D. Label Smoothing

    - related techniques that mix or occlude training examples and soften their labels: Mixup (blend two inputs and their labels), Cutout (remove a random patch), CutMix (paste a patch from another image and mix the labels proportionally); Mixup is sketched below.
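
    A minimal sketch of Mixup on one batch, assuming NumPy arrays x (inputs) and one-hot labels y; the Beta parameter alpha is illustrative:

        import numpy as np

        def mixup_batch(x, y, alpha=0.2):
            lam = np.random.beta(alpha, alpha)     # mixing ratio in [0, 1]
            perm = np.random.permutation(len(x))
            x_mix = lam * x + (1 - lam) * x[perm]  # blend pairs of inputs
            y_mix = lam * y + (1 - lam) * y[perm]  # blend (soften) the labels
            return x_mix, y_mix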

    E. Dropout

    - in each forward pass, randomly set some neurons to zero so the network cannot rely on any single unit (an inverted-dropout sketch follows below)
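
    A sketch of inverted dropout at training time, assuming an activation array a and keep probability p:

        import numpy as np

        def dropout(a, p=0.5, training=True):
            if not training:
                return a                               # identity at inference
            mask = (np.random.rand(*a.shape) < p) / p  # keep with prob. p, rescale by 1/p
            return a * mask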

    F. Batch Normalization

     

     
