Deep Learning: Chapter 10
10 Sequence Modeling: Recurrent and Recursive Nets
10.1 Unfolding Computational Graphs
\[h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)\]

10.2 Recurrent Neural Networks
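A minimal NumPy sketch of unfolding this recurrence with the classical tanh update, the same function \(f\) and parameters \(\theta\) being reused at every step (the parameter names `W`, `U`, `b` are illustrative):

```python
import numpy as np

def rnn_forward(x_seq, h0, W, U, b):
    """Unfold h^(t) = tanh(W h^(t-1) + U x^(t) + b) over a sequence.

    x_seq: (T, input_dim), h0: (hidden_dim,),
    W: (hidden_dim, hidden_dim), U: (hidden_dim, input_dim), b: (hidden_dim,).
    Returns all hidden states, shape (T, hidden_dim).
    """
    h = h0
    states = []
    for x_t in x_seq:                      # the same f and theta at every time step
        h = np.tanh(W @ h + U @ x_t + b)
        states.append(h)
    return np.stack(states)

# example: 5 time steps, 3-dim inputs, 4-dim hidden state
rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
h_states = rnn_forward(rng.normal(size=(T, d_in)), np.zeros(d_h),
                       0.1 * rng.normal(size=(d_h, d_h)),
                       0.1 * rng.normal(size=(d_h, d_in)),
                       np.zeros(d_h))
print(h_states.shape)  # (5, 4)
```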
back-propagation through time (BPTT)
10.2.1 Teacher Forcing and Networks with Output Recurrence
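A hedged sketch of teacher forcing for a model whose output at step t-1 feeds back as an input at step t. Names are illustrative (an output-to-hidden matrix `R`, softmax readout, one-hot targets); the only difference between training with teacher forcing and free-running generation is which previous output gets fed back in:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(h_prev, y_prev, x_t, params):
    """One step of an output-recurrent model: the previous output is an input."""
    W, U, R, V, b, c = params
    h = np.tanh(W @ h_prev + U @ x_t + R @ y_prev + b)
    return h, softmax(V @ h + c)

def run(x_seq, params, targets=None):
    """With targets given (training), feed the ground-truth previous output
    (teacher forcing); otherwise feed the model's own previous prediction."""
    W, U, R, V, b, c = params
    h, y_prev = np.zeros(W.shape[0]), np.zeros(c.shape[0])
    outputs = []
    for t, x_t in enumerate(x_seq):
        h, y = step(h, y_prev, x_t, params)
        outputs.append(y)
        y_prev = targets[t] if targets is not None else y   # the only difference
    return np.stack(outputs)
```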
10.2.2 Computing the Gradient in a Recurrent Neural Network
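The gradient is obtained by applying ordinary back-propagation to the unrolled graph. A minimal BPTT sketch, assuming for simplicity that the loss is applied only to the final hidden state and that its gradient `dL_dhT` is given (names follow the forward sketch above and are illustrative):

```python
import numpy as np

def bptt(x_seq, h0, W, U, b, dL_dhT):
    """BPTT for h^(t) = tanh(W h^(t-1) + U x^(t) + b).
    Walks the unrolled graph backwards and accumulates one gradient
    contribution per time step, since the parameters are shared across steps."""
    # forward pass: keep every hidden state for reuse in the backward pass
    hs = [h0]
    for x_t in x_seq:
        hs.append(np.tanh(W @ hs[-1] + U @ x_t + b))

    dW, dU, db = np.zeros_like(W), np.zeros_like(U), np.zeros_like(b)
    dh = dL_dhT                                   # gradient flowing into h^(T)
    for t in range(len(x_seq) - 1, -1, -1):
        da = dh * (1.0 - hs[t + 1] ** 2)          # back through tanh
        dW += np.outer(da, hs[t])                 # shared parameters: accumulate
        dU += np.outer(da, x_seq[t])
        db += da
        dh = W.T @ da                             # pass the gradient back one step
    return dW, dU, db
```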
10.5 Deep Recurrent Networks
Three blocks, each of which can itself be made deep (see the sketch after this list):
- from the input to the hidden state,
- from the previous hidden state to the next hidden state, and
- from the hidden state to the output
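A rough sketch of a deep RNN step in which each of the three blocks is replaced by a small multi-layer transformation; the block compositions and the summation of the input and recurrent paths are one illustrative choice, not the book's only variant:

```python
import numpy as np

def mlp(x, layers):
    """A small feed-forward stack used for one block; layers is a list of (W, b)."""
    for W, b in layers:
        x = np.tanh(W @ x + b)
    return x

def deep_rnn_step(h_prev, x_t, input_block, recurrent_block, output_block):
    """Deep RNN step: input-to-hidden, hidden-to-hidden, and hidden-to-output
    are each deep transformations rather than single affine layers."""
    h = np.tanh(mlp(x_t, input_block) + mlp(h_prev, recurrent_block))
    y = mlp(h, output_block)
    return h, y
```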
10.6 Recursive Neural Networks
10.7 The Challenge of Long-Term Dependencies
Gradients propagated over many stages tend to either vanish (most of the time) or explode (rarely, but with severe damage to optimization).
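A quick numerical illustration of why: back-propagating through many steps multiplies the gradient repeatedly by (transposes of) the same recurrent Jacobian, so its norm shrinks or grows roughly like the spectral radius raised to the number of steps. The specific matrices and scales here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 10, 100
W = rng.normal(size=(d, d)) / np.sqrt(d)
for scale in (0.5, 1.5):                                # spectral radius < 1 vs. > 1
    Ws = scale * W / np.max(np.abs(np.linalg.eigvals(W)))   # set the spectral radius
    g = np.ones(d)                                      # stand-in for a back-propagated gradient
    for _ in range(T):
        g = Ws.T @ g                                    # repeated Jacobian-transpose products
    print(scale, np.linalg.norm(g))                     # near zero for 0.5, astronomically large for 1.5
```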
10.8 Echo State Networks
10.9 Leaky Units and Other Strategies for Multiple Time Scales
10.9.1 Adding Skip Connections through Time
10.9.2 Leaky Units and a Spectrum of Different Time Scales
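A leaky unit keeps a running average of its past state, \(h^{(t)} = \alpha\, h^{(t-1)} + (1-\alpha)\, v^{(t)}\): units with \(\alpha\) near 1 remember over long time scales, units with \(\alpha\) near 0 discard the past rapidly. A tiny sketch (the particular \(\alpha\) values and the impulse input are illustrative):

```python
import numpy as np

def leaky_update(h_prev, v_t, alpha):
    """Linear self-connection with weight alpha: a per-unit time scale."""
    return alpha * h_prev + (1.0 - alpha) * v_t

alpha = np.array([0.99, 0.9, 0.5])     # one time scale per unit
h = np.zeros(3)
for t in range(50):
    v = np.ones(3) if t == 0 else np.zeros(3)   # a single impulse at t = 0
    h = leaky_update(h, v, alpha)
print(h)   # the alpha=0.99 unit retains a far larger trace of the impulse than the others
```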
10.9.3 Removing Connections
10.10 The Long Short-Term Memory and Other Gated RNNs
10.10.1 LSTM
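A hedged NumPy sketch of one LSTM step, with forget, input, and output gates and a separate cell state (the weight names and the dict layout are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: the cell state c carries information through a self-loop
    whose weight is controlled by the forget gate, avoiding repeated squashing."""
    f = sigmoid(p['Wf'] @ x_t + p['Uf'] @ h_prev + p['bf'])   # forget gate
    i = sigmoid(p['Wi'] @ x_t + p['Ui'] @ h_prev + p['bi'])   # input gate
    o = sigmoid(p['Wo'] @ x_t + p['Uo'] @ h_prev + p['bo'])   # output gate
    g = np.tanh(p['Wg'] @ x_t + p['Ug'] @ h_prev + p['bg'])   # candidate update
    c = f * c_prev + i * g                                    # gated cell state
    h = o * np.tanh(c)                                        # gated output
    return h, c
```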
10.10.2 Other Gated RNNs
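The main alternative is the gated recurrent unit (GRU), where a single update gate plays the roles of both the forget and input gates. A sketch under the same illustrative naming as the LSTM above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU step: the update gate z interpolates between the old state and a
    candidate state computed after the reset gate r has gated the old state."""
    z = sigmoid(p['Wz'] @ x_t + p['Uz'] @ h_prev + p['bz'])   # update gate
    r = sigmoid(p['Wr'] @ x_t + p['Ur'] @ h_prev + p['br'])   # reset gate
    h_tilde = np.tanh(p['Wh'] @ x_t + p['Uh'] @ (r * h_prev) + p['bh'])
    return (1.0 - z) * h_prev + z * h_tilde
```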
10.11 Optimization for Long-Term Dependencies
10.11.1 Clipping Gradients
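One common form clips the norm of the whole gradient just before the parameter update, keeping the direction but bounding the step size. A minimal sketch (the threshold value is illustrative):

```python
import numpy as np

def clip_by_global_norm(grads, threshold=1.0):
    """If the overall gradient norm exceeds the threshold, rescale:
    g <- g * threshold / ||g||."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > threshold:
        grads = [g * (threshold / total_norm) for g in grads]
    return grads
```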
10.11.2 Regularizing to Encourage Information Flow
10.12 Explicit Memory
working memory
memory network
neural Turing machine https://arxiv.org/abs/1410.5401
It is difficult to optimize functions that produce exact, integer addresses. To alleviate this problem, NTMs actually read from or write to many memory cells simultaneously. To read, they take a weighted average of many cells. To write, they modify multiple cells by different amounts. The coefficients for these operations are chosen to be focused on a small number of cells.
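A minimal sketch of such soft (differentiable) addressing: a softmax weighting over all cells, derived here from content similarity as one illustrative choice; reading is a weighted average of cell vectors, and writing nudges every cell in proportion to its weight. Function names and the erase/add split are illustrative, loosely following the NTM paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def address(memory, key, sharpness=10.0):
    """Content-based weights: cosine similarity of each cell to a key,
    sharpened and normalized so the weights focus on a few cells."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(sharpness * sims)

def read(memory, w):
    """Soft read: a weighted average of all cell vectors."""
    return w @ memory

def write(memory, w, erase, add):
    """Soft write: each cell is modified by an amount proportional to its weight."""
    return memory * (1.0 - np.outer(w, erase)) + np.outer(w, add)

# memory with 8 cells, each holding a 4-dim vector
M = np.zeros((8, 4))
w = address(M + 1e-3, key=np.ones(4))      # nearly uniform weights over empty memory
M = write(M, w, erase=np.zeros(4), add=np.array([1.0, 2.0, 3.0, 4.0]))
print(read(M, w))
```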
These memory cells are typically augmented to contain a vector rather than a single scalar. There are two reasons:
- it amortizes the increased cost of accessing a memory cell, since the same addressing coefficients retrieve a whole vector of values, and
- it allows for content-based addressing, where the weight used to read a cell is a function of that cell's contents.