Thursday, April 12, 2007

Learning to Forget: Continual Prediction with LSTM

Another excellent article on the subject of LSTMs, written by Gers, Schmidhuber & Cummins, addresses the problem of saturation of a memory cell's CEC and how it can be combated using forget gates.
Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
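To make the idea concrete for our own notes, here is a minimal sketch (plain Python; the gate and cell-input activations are taken as given numbers rather than computed from weights, which is our simplification) of the standard CEC update versus the forget-gate version the paper proposes:

```python
def cec_update_standard(c_prev, g_in, i_gate):
    # Standard LSTM CEC: the old state carries over with weight 1.0,
    # so on a continual input stream it can only accumulate and drift.
    return c_prev + i_gate * g_in

def cec_update_forget(c_prev, g_in, i_gate, f_gate):
    # Extended LSTM: the forget gate scales the old state, so the cell
    # can learn to decay it or reset it entirely (f_gate -> 0).
    return f_gate * c_prev + i_gate * g_in

# Toy illustration of the drift problem: feed a constant positive input.
c_std, c_fgt = 0.0, 0.0
for t in range(1000):
    g_in, i_gate, f_gate = 0.5, 1.0, 0.9
    c_std = cec_update_standard(c_std, g_in, i_gate)        # grows ~ 0.5 * t
    c_fgt = cec_update_forget(c_fgt, g_in, i_gate, f_gate)  # settles near 5.0
print(c_std, c_fgt)
```

With the forget gate held below 1, the state settles at a finite value instead of growing without bound; in the paper the gate is learned, so the cell decides for itself when to hold and when to release its state.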

There's plenty of math included, which will be very helpful for our own implementation. The article also describes the topology of the network used in the test setup; the articles we'd read previously were not very clear about exactly which units connect to the input and output gates.
The seven input units are fully connected to a hidden layer consisting of four memory blocks with two cells each (8 cells and 12 gates in total). The cell outputs are fully connected to the cell inputs, all gates, and the seven output units. The output units have additional "shortcut" connections from the input units.
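As a sanity check on that connectivity, here is a rough sketch of the weight matrices it implies (a sketch only; the initialization range and variable names are our assumptions, not taken from the paper):

```python
import numpy as np

N_IN, N_OUT = 7, 7
N_BLOCKS, CELLS_PER_BLOCK = 4, 2
N_CELLS = N_BLOCKS * CELLS_PER_BLOCK   # 8 cells
N_GATES = 3 * N_BLOCKS                 # input, output and forget gate per block = 12

rng = np.random.default_rng(0)
def weights(rows, cols):
    # Small random initialization; the exact range is an assumption.
    return rng.uniform(-0.1, 0.1, size=(rows, cols))

# Input units feed the cell inputs and all gates of the hidden layer.
W_in_to_cells = weights(N_CELLS, N_IN)
W_in_to_gates = weights(N_GATES, N_IN)

# Cell outputs recur to the cell inputs and all gates,
# and also drive the output units.
W_cells_to_cells = weights(N_CELLS, N_CELLS)
W_cells_to_gates = weights(N_GATES, N_CELLS)
W_cells_to_out   = weights(N_OUT, N_CELLS)

# "Shortcut" connections straight from the input units to the output units.
W_in_to_out = weights(N_OUT, N_IN)
```

Counting the matrices this way should make it easier to verify our own implementation against the paper's setup.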

Click here for the full text

- Jasper
