Vanishing gradient problem

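The vanishing gradient problem is a difficulty in machine learning that arises when artificial neural networks are trained with gradient descent and backpropagation. In each training iteration, every weight receives an update proportional to the partial derivative of the error function with respect to that weight. In some cases the gradient becomes vanishingly small, so the weights are no longer updated effectively, and in the worst case the network may stop training altogether. As an example of the cause, a traditional activation function such as the hyperbolic tangent has a derivative in the range $(0, 1]$, and backpropagation computes gradients by the chain rule. In an $n$-layer network, the effect is that $n$ of these small numbers are multiplied together to compute the gradient of the "front" (earliest) layers, so the gradient (error signal) decreases exponentially with $n$ and the front layers train very slowly.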
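The multiplication described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every weight equals 1 and that every layer sees the same pre-activation value of 1.0, so only the activation-function part of the chain-rule product is shown. The names `tanh_derivative` and `gradient_factor` are hypothetical helpers, not part of any library.

```python
import math


def tanh_derivative(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2, which lies in (0, 1]."""
    return 1.0 - math.tanh(x) ** 2


def gradient_factor(pre_activations: list[float]) -> float:
    """Product of the per-layer tanh derivatives picked up by the chain rule.

    Assumption for illustration: all weights are 1, so this is only the
    activation-function part of the backpropagated gradient.
    """
    factor = 1.0
    for z in pre_activations:
        factor *= tanh_derivative(z)
    return factor


if __name__ == "__main__":
    # Hypothetical setup: each layer has pre-activation 1.0, so each layer
    # contributes a factor of tanh'(1.0), roughly 0.42.
    for n in (1, 5, 10, 20):
        print(n, gradient_factor([1.0] * n))
```

Under these assumptions the printed factor shrinks geometrically as the depth grows, which is the exponential decay the paragraph describes. In a real network the weights also enter the product, so the actual rate depends on both the weights and the activations.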

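Backpropagation allowed researchers to train supervised deep artificial neural networks from scratch, initially with little success. In 1991, Sepp Hochreiter's diploma thesis [1][2] formally identified the "vanishing gradient problem" as the reason for this failure. The problem affects not only many-layered feedforward networks [3] but also recurrent networks [4]. Recurrent networks are trained by unfolding them into very deep feedforward networks, in which a new layer is created for each time step of the input sequence processed by the network.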
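To make the unfolding concrete, here is a small sketch of a scalar recurrent network, $h_t = \tanh(w\,h_{t-1} + x_t)$, unrolled over its input sequence. The scalar form, the function name `unrolled_gradient`, and the chosen weight are assumptions made for illustration only. Each time step acts as one more layer and contributes one more factor $w \cdot (1 - h_t^2)$ to the derivative of the final state with respect to the initial state.

```python
import math


def unrolled_gradient(w: float, inputs: list[float], h0: float = 0.0) -> float:
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    The loop unrolls the network through time: one iteration per time step,
    each multiplying in the local derivative w * (1 - h_t^2).
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)
    return grad


if __name__ == "__main__":
    # Hypothetical recurrent weight of 0.5 and an all-zero input sequence,
    # so every time step contributes a factor of exactly 0.5.
    for steps in (1, 10, 50):
        print(steps, unrolled_gradient(0.5, [0.0] * steps))
```

With these illustrative values the gradient halves at every time step, so long sequences behave like very deep networks and the influence of early inputs on the final state fades quickly.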

When the activation functions in use have derivatives that can take on larger values, one risks encountering the related exploding gradient problem.
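The contrast between the two problems can be sketched by multiplying a fixed per-layer derivative across many layers. The values 0.25 and 1.5 below are hypothetical per-layer factors chosen only to show one case below 1 and one above 1, and `backprop_factor` is an illustrative helper rather than a library function.

```python
def backprop_factor(layer_derivatives: list[float]) -> float:
    """Multiply the per-layer derivatives, as the chain rule does in backpropagation."""
    factor = 1.0
    for d in layer_derivatives:
        factor *= d
    return factor


if __name__ == "__main__":
    depth = 10
    # Per-layer factor below 1: the product shrinks towards zero (vanishing).
    print("vanishing:", backprop_factor([0.25] * depth))
    # Per-layer factor above 1: the product grows without bound (exploding).
    print("exploding:", backprop_factor([1.5] * depth))
```

In this simplified picture the same repeated product drives both behaviours: which one occurs depends on whether the typical per-layer factor is smaller or larger than 1.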

Solutions

Multi-level hierarchy

Long short-term memory

Faster hardware

Residual networks (ResNets)

Other activation functions

Other

References

[1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut f. Informatik, Technische Univ. Munich, 1991.
[2] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
[3] Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav. Deep learning for computational chemistry. Journal of Computational Chemistry. 2017-06-15, 38 (16): 1291–1307. PMID 28272810. arXiv:1701.04503. doi:10.1002/jcc.24764.
[4] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua. On the difficulty of training Recurrent Neural Networks. 2012-11-21. arXiv:1211.5063 [cs.LG].
