10 Sequence Modeling: Recurrent and Recursive Nets

10.1 Unfolding Computational Graphs

$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$
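The recurrence above can be unfolded by applying the same function with the same parameters at every time step. A minimal sketch, assuming a scalar tanh cell for `f` and made-up parameter values (neither the cell form nor the names `unfold` and `theta` come from the notes):

```python
import math

def f(h_prev, x, theta):
    """One transition; a scalar tanh cell chosen only for illustration."""
    w, u, b = theta
    return math.tanh(w * h_prev + u * x + b)

def unfold(xs, h0, theta):
    """Apply the same f, with the same theta, once per input in the sequence."""
    hs = []
    h = h0
    for x in xs:
        h = f(h, x, theta)
        hs.append(h)
    return hs

theta = (0.5, 1.0, 0.0)   # hypothetical shared parameters (w, u, b)
xs = [1.0, 0.0, -1.0]     # hypothetical input sequence
hs = unfold(xs, h0=0.0, theta=theta)

# Writing the unfolded graph out by hand gives the same final state:
# h3 = f(f(f(h0, x1), x2), x3)
h3_by_hand = f(f(f(0.0, xs[0], theta), xs[1], theta), xs[2], theta)
```

The loop and the nested call compute the same thing; the loop is just the folded (recurrent) form of the unfolded graph.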

10.2 Recurrent Neural Networks

back-propagation through time (BPTT)

10.5 Deep Recurrent Networks

The computation in most RNNs can be decomposed into three blocks of parameters and associated transformations:

• from the input to the hidden state,
• from the previous hidden state to the next hidden state, and
• from the hidden state to the output
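The three blocks listed above can be sketched as one step of a plain RNN. This is only an illustration: the matrix names `U`, `W`, `V`, the tanh nonlinearity, and all values are assumptions, and each block here is a single linear map, whereas a deep RNN would make one or more of the blocks deeper.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def rnn_step(x, h_prev, U, W, V):
    a_in = matvec(U, x)         # block 1: input -> hidden
    a_rec = matvec(W, h_prev)   # block 2: previous hidden -> next hidden
    h = [math.tanh(p + q) for p, q in zip(a_in, a_rec)]
    o = matvec(V, h)            # block 3: hidden -> output
    return h, o

# Hypothetical parameters: 2 inputs, 2 hidden units, 1 output.
U = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.5, 0.0], [0.0, 0.5]]
V = [[1.0, 1.0]]

h, o = rnn_step([1.0, -1.0], [0.0, 0.0], U, W, V)
```

Keeping the three blocks as separate arguments makes it easy to see where extra layers would be added when deepening any one of them.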

10.7 The Challenge of Long-Term Dependencies

Gradients propagated over many time steps tend to either vanish or explode.
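A small worked example of why this happens, assuming the simplest possible case: a scalar linear recurrence $h^{(t)} = w\,h^{(t-1)}$ with no input and no nonlinearity. The chain rule then contributes one factor of $w$ per step, so the gradient of the final state with respect to the initial state is $w^T$. The function name and the values of `w` are illustrative only.

```python
def grad_through_time(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w   # chain rule: one factor of w per time step
    return grad

vanishing = grad_through_time(0.9, 100)   # |w| < 1: shrinks toward zero
exploding = grad_through_time(1.1, 100)   # |w| > 1: grows without bound
```

After 100 steps the first gradient is on the order of 1e-5 and the second on the order of 1e4, even though the two weights differ by only 0.2.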

10.11 Optimization for Long-Term Dependencies

10.12 Explicit Memory

working memory

memory network

neural Turing machine (NTM): https://arxiv.org/abs/1410.5401

It is difficult to optimize functions that produce exact, integer addresses. To alleviate this problem, NTMs actually read from or write to many memory cells simultaneously. To read, they take a weighted average of many cells. To write, they modify multiple cells by different amounts. The coefficients for these operations are chosen to be focused on a small number of cells.
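The soft read and write described above can be sketched as follows. This is a simplified illustration, not the update rule from the NTM paper: the softmax over hypothetical address scores is one way to obtain focused coefficients, and the write is a plain interpolation toward the new value rather than the paper's separate erase and add steps.

```python
import math

def softmax(scores):
    """Turn raw scores into positive coefficients that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_read(memory, weights):
    """Weighted average over all cells; each cell is a vector."""
    dim = len(memory[0])
    return [sum(w * cell[d] for w, cell in zip(weights, memory))
            for d in range(dim)]

def soft_write(memory, weights, value):
    """Move every cell toward `value` by an amount set by its weight."""
    return [[c + w * (v - c) for c, v in zip(cell, value)]
            for w, cell in zip(weights, memory)]

memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three 2-dimensional cells
weights = softmax([8.0, 0.0, 0.0])              # sharply focused on cell 0

r = soft_read(memory, weights)
new_memory = soft_write(memory, weights, [0.0, 0.0])
```

Because every cell takes part with a nonzero coefficient, both operations are differentiable with respect to the scores, while the sharp focus keeps the result close to what an exact integer address would give.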

These memory cells are typically augmented to contain a vector rather than a single scalar. There are two reasons:

• accessing a memory cell has become more expensive, so each access should return more than a single scalar.