Why RNNs and LSTMs Use tanh


족제비다아 2021. 5. 25. 10:51

https://stats.stackexchange.com/questions/444923/activation-function-between-lstm-layers

Activation function between LSTM layers


While studying RNNs, I came to understand why tanh is used as the activation function rather than sigmoid.

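tanh holds up better against the vanishing gradient problem: its derivative lies between 0 and 1 and reaches 1 at the origin, whereas the sigmoid's derivative never exceeds 0.25. But that made me wonder, 'Then why not just use ReLU?'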
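To make the gradient comparison concrete, here is a minimal sketch that evaluates both derivatives on a grid of inputs. The grid, the helper names, and the choice of 10 time steps are my own illustrative assumptions, and the last line only bounds the activation's contribution to the backpropagated gradient (the recurrent weights are ignored).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    t = math.tanh(x)
    return 1.0 - t * t

# Illustrative input grid from -5.0 to 5.0 in steps of 0.1 (an assumption).
xs = [i / 10.0 for i in range(-50, 51)]

max_sigmoid_grad = max(sigmoid_grad(x) for x in xs)  # peak is at x = 0
max_tanh_grad = max(tanh_grad(x) for x in xs)        # peak is at x = 0

# Best case for the activation factor alone after 10 time steps (assumed count).
steps = 10
sigmoid_bound = max_sigmoid_grad ** steps
tanh_bound = max_tanh_grad ** steps

print("max sigmoid'(x):", max_sigmoid_grad)
print("max tanh'(x):   ", max_tanh_grad)
print("best-case activation factor over", steps, "steps:", sigmoid_bound, "vs", tanh_bound)
```

Even in the best case, the sigmoid factor shrinks geometrically with the number of time steps, while tanh can pass the gradient through at full strength near zero. Away from zero both derivatives decay, so tanh reduces the problem rather than removing it.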

Conclusion

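Unlike a CNN, an RNN feeds the value from the previous step back into the current one. ReLU is unbounded above, so as the previous values grow, the overall output can diverge. A model that reuses past values recursively therefore needs an activation that keeps them in a bounded range, and tanh, which lets gradients backpropagate better than sigmoid, is reported to give good results in that role.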
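The divergence argument can be sketched with a toy scalar recurrence, h_t = f(w * h_{t-1} + x). This is not a real RNN layer: the recurrent weight `w = 1.5`, the constant input `x = 1.0`, and the 50 steps are values I picked for illustration, chosen so that the recurrent weight is larger than 1.

```python
import math

def relu(x):
    return max(0.0, x)

def run_recurrence(activation, w=1.5, x=1.0, steps=50):
    """Iterate h_t = activation(w * h_{t-1} + x) starting from h_0 = 0.

    w, x and steps are hypothetical values chosen only for this demo.
    """
    h = 0.0
    history = []
    for _ in range(steps):
        h = activation(w * h + x)
        history.append(h)
    return history

relu_states = run_recurrence(relu)
tanh_states = run_recurrence(math.tanh)

print("ReLU state after 50 steps:", relu_states[-1])
print("tanh state after 50 steps:", tanh_states[-1])
print("largest |tanh state| seen: ", max(abs(h) for h in tanh_states))
```

With these assumed values the ReLU state grows geometrically because nothing caps it, while the tanh state stays inside (-1, 1) no matter how many steps are run. A recurrent weight below 1 would not blow up under ReLU, so the sketch shows a risk rather than an inevitability.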

https://limitsinx.tistory.com/62

[Understanding Deep Learning Through Code 2-11] - RNN (Recurrent Neural Network) / LSTM (Long Short-Term Memory)
