RNN, LSTM, and GRU
NAVER AI TECH 2023. 3. 29. 13:12
RNN
Equation
$h_t = f_w(h_{t-1}, x_t)$
$h_t = tanh(W_{hh}h_{t-1} + W_{xh}x_t)$
$y_t = W_{hy}h_t$
if binary classification $\rightarrow$ sigmoid($y_t$)
if Multiclass classification $\rightarrow$ softmax($y_t$)
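The recurrence above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's implementation: the function names (`rnn_step`, `softmax`, `sigmoid`), the sizes, and the random weights are all assumptions made for the example, and biases are omitted as in the equations.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, W_hy):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

def sigmoid(y):
    """Output activation for binary classification."""
    return 1.0 / (1.0 + np.exp(-y))

def softmax(y):
    """Output activation for multiclass classification (shifted for stability)."""
    e = np.exp(y - np.max(y))
    return e / e.sum()

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden_size, input_size, num_classes = 4, 3, 5
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hy = rng.normal(scale=0.1, size=(num_classes, hidden_size))

h = np.zeros(hidden_size)                # initial hidden state h_0
xs = rng.normal(size=(6, input_size))    # a sequence of 6 input vectors
for x_t in xs:
    h, y = rnn_step(h, x_t, W_hh, W_xh, W_hy)  # the same weights are reused at every step

probs = softmax(y)                       # class probabilities at the last step
```

The same three weight matrices are shared across all time steps; only `h` changes as the sequence is consumed.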
Structure
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Weakness
vanishing/exploding gradient
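A small numeric sketch of this weakness, under a simplifying assumption: the recurrence is reduced to the scalar linear form $h_t = w \cdot h_{t-1}$ (no tanh, no input), so the gradient of $h_T$ with respect to $h_0$ is just $w^T$. The function name `linear_rnn_gradient` is hypothetical. With tanh included, each factor would be multiplied by a derivative no larger than 1, which only makes vanishing more likely.

```python
def linear_rnn_gradient(w, steps):
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # each step contributes one factor d h_t / d h_{t-1} = w
    return grad

for w in (0.5, 1.0, 1.5):
    print(f"w={w}: gradient after 50 steps = {linear_rnn_gradient(w, 50):.3e}")
```

A recurrent weight below 1 in magnitude shrinks the gradient toward zero over many steps, while a weight above 1 makes it grow without bound.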
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Practice
(Lecture 3 Practice) Basic RNN Practice_조민우_T5200
LSTM
Equation
• i: Input gate, whether to write to cell
• f: Forget gate, whether to erase cell
• o: Output gate, how much to reveal cell
• g: Gate gate, how much to write to cell
$\left(\begin{array}{l}i \\ f \\ o \\ g\end{array}\right)=\left(\begin{array}{c}\sigma \\ \sigma \\ \sigma \\ \tanh \end{array}\right) W\left(\begin{array}{c}h_{t-1} \\ x_t\end{array}\right)$
Forget gate:
$f_t=\sigma\left(W_f \cdot\left[h_{t-1}, x_t\right]+b_f\right)$
Input gate & Gate gate:
$i_t=\sigma\left(W_i \cdot\left[h_{t-1}, x_t\right]+b_i\right)$
$\tilde{C}_t=\tanh \left(W_C \cdot\left[h_{t-1}, x_t\right]+b_C\right)$
$C_t=f_t \cdot C_{t-1}+i_t \cdot \tilde{C}_t$
Output gate:
$o_t=\sigma\left(W_o \cdot\left[h_{t-1}, x_t\right]+b_o\right)$
$h_t=o_t \cdot \tanh \left(C_t\right)$
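The gate equations above can be sketched as a single NumPy step. This is an illustrative sketch under stated assumptions: the four weight blocks are stacked into one matrix `W` in the order i, f, o, g (as in the stacked form of the equation), the biases are stacked the same way into `b`, and the names `lstm_step`, `W`, `b`, along with the sizes and random values, are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(h_prev, c_prev, x_t, W, b):
    """One LSTM step.

    W has shape (4*H, H+D) and b has shape (4*H,), with rows ordered i, f, o, g.
    """
    H = h_prev.shape[0]
    a = W @ np.concatenate([h_prev, x_t]) + b   # all four pre-activations at once
    i = sigmoid(a[0:H])          # input gate: whether to write to the cell
    f = sigmoid(a[H:2 * H])      # forget gate: whether to erase the cell
    o = sigmoid(a[2 * H:3 * H])  # output gate: how much of the cell to reveal
    g = np.tanh(a[3 * H:4 * H])  # candidate values (the "gate gate", C~_t)
    c_t = f * c_prev + i * g     # element-wise: C_t = f_t * C_{t-1} + i_t * C~_t
    h_t = o * np.tanh(c_t)       # h_t = o_t * tanh(C_t)
    return h_t, c_t

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(6, D)):
    h, c = lstm_step(h, c, x_t, W, b)
```

Note that the products between gates and states are element-wise, which is how the $\cdot$ in the cell-state and hidden-state equations is read here.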
Structure
key: the cell state holds long-term information, while the hidden state holds the information needed for the current output
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
GRU
Equation
key: the single hidden state in GRU plays the role of both the cell state and the hidden state in LSTM
update gate $z_t$: similar to the input gate
$z_t =\sigma\left(W_z \cdot\left[h_{t-1}, x_t\right]\right)$
some other changes: reset gate $r_t$ and candidate hidden state $\tilde{h_t}$
$r_t =\sigma\left(W_r \cdot\left[h_{t-1}, x_t\right]\right)$
$\tilde{h_t} =\tanh \left(W \cdot\left[r_t \cdot h_{t-1}, x_t\right]\right)$
update gate = forget gate + input gate
$\left(1-z_t\right) \cdot h_{t-1}$ $\rightarrow$ similar to forget gate
$h_t =\left(1-z_t\right) \cdot h_{t-1}+z_t \cdot \tilde{h_t}$
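The GRU equations above can be sketched in NumPy in the same way. As in the equations, biases are omitted. The names `gru_step`, `W_z`, `W_r`, `W_h` (standing in for the unlabeled $W$ of the candidate state), and the sizes and random weights are assumptions made for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU step; each weight matrix has shape (H, H+D)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)   # update gate z_t
    r = sigmoid(W_r @ hx)   # reset gate r_t
    # Candidate state: the reset gate scales h_{t-1} before it is mixed with x_t.
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    # Interpolate: (1 - z_t) keeps the old state, z_t writes the candidate.
    return (1.0 - z) * h_prev + z * h_tilde

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
H, D = 4, 3
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(H, H + D)) for _ in range(3))

h = np.zeros(H)
for x_t in rng.normal(size=(6, D)):
    h = gru_step(h, x_t, W_z, W_r, W_h)
```

Because one gate $z_t$ controls both how much of the old state is kept and how much of the candidate is written, the GRU needs no separate cell state.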
Structure
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Strength (LSTM & GRU)
Because the cell state update (LSTM) and the hidden state update (GRU) contain an addition (+ operation), BPTT is relatively free from the vanishing/exploding gradient problem
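As a rough sketch of why the addition helps, compare the one-step Jacobians along the recurrent path. The LSTM and GRU lines keep only the direct additive path and treat the gates as constants, which is a simplifying assumption; the full gradients also contain terms that pass through the gates.
RNN: $\frac{\partial h_t}{\partial h_{t-1}}=\operatorname{diag}\left(1-h_t^2\right) W_{hh}$
LSTM: $\frac{\partial C_t}{\partial C_{t-1}} \approx \operatorname{diag}\left(f_t\right)$
GRU: $\frac{\partial h_t}{\partial h_{t-1}} \approx \operatorname{diag}\left(1-z_t\right)$
Over many steps the RNN gradient is repeatedly multiplied by the same $W_{hh}$ and by a tanh derivative no larger than 1, whereas the LSTM and GRU paths are multiplied by gate values that the network can learn to keep close to 1.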
Practice
(Lecture 4 Practice) LSTM, GRU Practice_조민우_T5200