  • RNN, LSTM, and GRU
    NAVER AI TECH 2023. 3. 29. 13:12

    RNN

    Equation

    $h_t = f_W(h_{t-1}, x_t)$

    $h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)$

    $y_t = W_{hy}h_t$

    if binary classification $\rightarrow$ sigmoid($y_t$)

    if multiclass classification $\rightarrow$ softmax($y_t$)
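    A minimal sketch of one RNN step following the equations above (NumPy, with hypothetical dimensions; not the lecture's code):

    import numpy as np

    # Hypothetical sizes for illustration
    input_dim, hidden_dim, num_classes = 8, 16, 3

    rng = np.random.default_rng(0)
    W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
    W_xh = 0.1 * rng.normal(size=(hidden_dim, input_dim))
    W_hy = 0.1 * rng.normal(size=(num_classes, hidden_dim))

    def rnn_step(h_prev, x_t):
        h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        y_t = W_hy @ h_t                            # y_t = W_hy h_t
        return h_t, y_t

    h = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(5, input_dim)):     # a length-5 input sequence
        h, y = rnn_step(h, x_t)

    probs = np.exp(y - y.max())                     # softmax(y_t) for multiclass
    probs /= probs.sum()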

     

    Structure

    http://karpathy.github.io/2015/05/21/rnn-effectiveness/

     

    Weakness

    vanishing/exploding gradient

    http://colah.github.io/posts/2015-08-Understanding-LSTMs/
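    A quick sketch (not from the lecture) of why this happens: in BPTT the gradient is multiplied by $W_{hh}^T$ at every step, so its norm vanishes or explodes depending on the largest singular value of $W_{hh}$.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_dim, T = 16, 50

    for scale in (0.5, 1.5):     # singular values below / above 1
        # Orthogonal matrix scaled so every singular value equals `scale`
        W_hh = scale * np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))[0]
        grad = np.ones(hidden_dim)
        for _ in range(T):           # backprop through T time steps
            grad = W_hh.T @ grad     # (tanh' factor omitted for simplicity)
        print(scale, np.linalg.norm(grad))   # ~1e-15 (vanishing) vs ~1e9 (exploding)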

     

    Practice

    (Lecture 3 practice) Basic RNN exercise_조민우_T5200
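    The notebook itself is not reproduced here; as a rough stand-in, a minimal torch.nn.RNN usage sketch with hypothetical sizes:

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
    head = nn.Linear(16, 3)              # y_t = W_hy h_t

    x = torch.randn(4, 10, 8)            # (batch, seq_len, input_size)
    out, h_n = rnn(x)                    # out: (4, 10, 16), h_n: (1, 4, 16)
    logits = head(out[:, -1])            # classify from the last hidden state
    probs = logits.softmax(dim=-1)       # multiclass classification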

     

     

    LSTM

    Equation

    • i: Input gate, whether to write to cell

    • f: Forget gate, whether to erase cell

    • o: Output gate, how much to reveal cell

    • g: Gate gate, how much to write to cell

     

    $\left(\begin{array}{l}i \\ f \\ o \\ g\end{array}\right)=\left(\begin{array}{c}\sigma \\ \sigma \\ \sigma \\ \tanh \end{array}\right) W\left(\begin{array}{c}h_{t-1} \\ x_t\end{array}\right)$

     

    Forget gate:

    $f_t=\sigma\left(W_f \cdot\left[h_{t-1}, x_t\right]+b_f\right)$

     

    Input gate & Gate gate:

    $i_t=\sigma\left(W_i \cdot\left[h_{t-1}, x_t\right]+b_i\right)$

    $\tilde{C}_t=\tanh \left(W_C \cdot\left[h_{t-1}, x_t\right]+b_C\right)$

    $C_t=f_t \cdot C_{t-1}+i_t \cdot \tilde{C}_t$

     

    Output gate:

    $o_t=\sigma\left(W_o\left[h_{t-1}, x_t\right]+b_o\right)$

    $h_t=o_t \cdot \tanh \left(C_t\right)$
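    Putting the four gates together, a minimal single-step LSTM sketch that follows the equations above (NumPy, hypothetical dimensions; not the lecture's code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        # W: (4*hidden_dim, hidden_dim + input_dim), b: (4*hidden_dim,)
        z = W @ np.concatenate([h_prev, x_t]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
        g = np.tanh(g)                                 # gate gate (candidate C~_t)
        c_t = f * c_prev + i * g                       # C_t = f_t * C_{t-1} + i_t * C~_t
        h_t = o * np.tanh(c_t)                         # h_t = o_t * tanh(C_t)
        return h_t, c_t

    input_dim, hidden_dim = 8, 16                      # hypothetical sizes
    rng = np.random.default_rng(0)
    W = 0.1 * rng.normal(size=(4 * hidden_dim, hidden_dim + input_dim))
    b = np.zeros(4 * hidden_dim)

    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    for x_t in rng.normal(size=(5, input_dim)):        # a length-5 input sequence
        h, c = lstm_step(x_t, h, c, W, b)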

     

    Structure

    key: the cell state carries long-term information, while the hidden state carries the information used for the current output

    http://colah.github.io/posts/2015-08-Understanding-LSTMs/

     

    GRU

    Equation

    key: the GRU merges the LSTM's cell state and hidden state into a single hidden state

     

    Update gate (similar to the LSTM's input gate):

    $z_t =\sigma\left(W_z \cdot\left[h_{t-1}, x_t\right]\right)$

     

    Reset gate and candidate hidden state:

    $r_t =\sigma\left(W_r \cdot\left[h_{t-1}, x_t\right]\right)$

    $\tilde{h_t} =\tanh \left(W \cdot\left[r_t \cdot h_{t-1}, x_t\right]\right)$

     

    update gate = forget gate + input gate (one gate $z_t$ does both jobs)

    $\left(1-z_t\right) \cdot h_{t-1}$ $\rightarrow$ acts like the forget gate; $z_t \cdot \tilde{h_t}$ $\rightarrow$ acts like the input gate

    $h_t =\left(1-z_t\right) \cdot h_{t-1}+z_t \cdot \tilde{h_t}$
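    A matching single-step GRU sketch following the equations above (NumPy, hypothetical dimensions; not the lecture's code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step(x_t, h_prev, W_z, W_r, W):
        hx = np.concatenate([h_prev, x_t])
        z_t = sigmoid(W_z @ hx)                                      # update gate
        r_t = sigmoid(W_r @ hx)                                      # reset gate
        h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))   # candidate state
        return (1 - z_t) * h_prev + z_t * h_tilde                    # h_t

    input_dim, hidden_dim = 8, 16                                    # hypothetical sizes
    rng = np.random.default_rng(0)
    W_z, W_r, W = (0.1 * rng.normal(size=(hidden_dim, hidden_dim + input_dim))
                   for _ in range(3))

    h = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(5, input_dim)):                      # a length-5 input sequence
        h = gru_step(x_t, h, W_z, W_r, W)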

     

    Structure

    http://colah.github.io/posts/2015-08-Understanding-LSTMs/

     

    Strength(LSTM & GRU)

    Because BPTT passes through an additive (+) operation in the state update, LSTM and GRU are relatively free from the vanishing/exploding gradient problem.
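    One way to see this (a sketch of the argument, not from the lecture notes): differentiating the LSTM cell-state update $C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$ along the cell-state path gives

    $\frac{\partial C_t}{\partial C_{t-1}} \approx f_t$ (plus indirect terms through the gates),

    so the gradient flowing through the cell state is scaled element-wise by the forget gate rather than repeatedly multiplied by the same $W_{hh}$ and the $\tanh$ derivative as in the vanilla RNN; with $f_t \approx 1$ it can survive many time steps. The GRU's $h_t = (1-z_t) \cdot h_{t-1} + z_t \cdot \tilde{h_t}$ has the same additive structure.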

     

    Practice

    (Lecture 4 practice) LSTM, GRU exercise_조민우_T5200

    ** How to count the number of LSTM parameters (attached) **
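    The attached note is not reproduced here, but the count follows from the equations above: each of the four gates has a weight matrix over $[h_{t-1}, x_t]$ plus a bias, so #params $= 4 \times (\text{hidden} \times (\text{hidden} + \text{input}) + \text{hidden})$. A quick check against torch.nn.LSTM with hypothetical sizes (note PyTorch keeps two bias vectors per gate, b_ih and b_hh, hence $2 \times \text{hidden}$ below):

    import torch.nn as nn

    D, H = 8, 16                                   # hypothetical input / hidden sizes
    lstm = nn.LSTM(input_size=D, hidden_size=H, num_layers=1)

    n_params = sum(p.numel() for p in lstm.parameters())
    assert n_params == 4 * (H * (D + H) + 2 * H)   # 2*H because of b_ih and b_hh
    print(n_params)                                # 1664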

     

     

     
