Thursday, April 12, 2007

Welcome!

Welcome to our blog!

We are Timo de Vries and Jasper A. Visser, students at the Leiden Institute of Advanced Computer Science (LIACS). As part of the Speech Recognition seminar, we are implementing a neural network solution for classifying phonemes in speech input.

Specifically, we have chosen to use a Long Short-Term Memory (LSTM) recurrent network, an architecture that has been shown to perform well on this task. Over the coming months, we will be updating this blog with articles we have found useful, progress updates on our project, and other relevant information.

- Jasper

Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures

For the Speech Recognition seminar, we have read several articles about phoneme recognition. The article above, by Graves and Schmidhuber, gave us the idea to build a Long Short-Term Memory network that can be trained to classify phonemes accurately. From the abstract:
In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
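
To make the idea of framewise classification with a bidirectional network concrete, here is a minimal sketch in Python/NumPy. It uses a plain tanh recurrent cell as a stand-in for the LSTM blocks, and the function and weight names (step, bidirectional_framewise, W_in_f, and so on) are our own illustration, not taken from the paper.

import numpy as np

def step(W_in, W_rec, x_t, h_prev):
    """One recurrent step: combine the current frame with the previous state."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev)

def bidirectional_framewise(X, W_in_f, W_rec_f, W_in_b, W_rec_b, W_out):
    """Classify every frame of X (shape: time x features) using both
    past context (forward pass) and future context (backward pass)."""
    T, _ = X.shape
    H = W_rec_f.shape[0]
    h_f = np.zeros((T, H))            # forward hidden states
    h_b = np.zeros((T, H))            # backward hidden states
    h = np.zeros(H)
    for t in range(T):                # left-to-right over the utterance
        h = step(W_in_f, W_rec_f, X[t], h)
        h_f[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):      # right-to-left over the utterance
        h = step(W_in_b, W_rec_b, X[t], h)
        h_b[t] = h
    # Per frame: concatenate both directions and score the phoneme classes.
    logits = np.concatenate([h_f, h_b], axis=1) @ W_out
    # Softmax over phoneme classes for each frame.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)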

The article can be found here.

- Timo

Learning to Forget: Continual Prediction with LSTM

Another excellent article on the subject of LSTMs, written by Gers, Schmidhuber & Cummins, addresses the problem of saturation of the constant error carousel (CEC) of a memory cell and how it can be combated using forget gates. From the abstract:
Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
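
The key change is that the CEC state gets an extra multiplicative gate: instead of only accumulating new input, the old state is first scaled by a forget gate, so the cell can learn to decay or reset itself. Below is a minimal Python/NumPy sketch of one time step of a layer of such memory cells; the function and weight names are our own, not the paper's notation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, y_prev, s_prev, W_i, W_f, W_o, W_c):
    """One time step of a layer of memory cells with input, forget and
    output gates (weight names are ours, not the paper's notation)."""
    z = np.concatenate([x_t, y_prev])   # current input + recurrent cell outputs
    i = sigmoid(W_i @ z)                # input gate: how much new content to write
    f = sigmoid(W_f @ z)                # forget gate: how much old state to keep
    o = sigmoid(W_o @ z)                # output gate: how much state to expose
    g = np.tanh(W_c @ z)                # candidate cell input
    # The CEC update: without f the state could only accumulate on a
    # continual input stream; with f the cell can learn to decay or reset it.
    s = f * s_prev + i * g
    y = o * np.tanh(s)                  # cell output
    return y, s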

There's plenty of math included, which will be very helpful for our own implementation. The article also describes the topology of the network used in the test setup; the articles we read previously were not very clear about exactly which units connect to the input and output gates.
The seven input units are fully connected to a hidden layer consisting of four memory blocks with 2 cells each (8 cells and 12 gates in total). The cell outputs are fully connected to the cell inputs, all gates, and the seven output units. The output units have additional "shortcut" connections from the input units.
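
To make those numbers concrete, this is roughly how the connectivity translates into weight-matrix shapes (a sketch with our own variable names, assuming each block has one input, one output and one forget gate, which is what the count of twelve gates implies).

import numpy as np

n_in, n_out = 7, 7                     # seven input units, seven output units
n_blocks, cells_per_block = 4, 2
n_cells = n_blocks * cells_per_block   # 8 memory cells
n_gates = n_blocks * 3                 # input + output + forget gate per block = 12

# Cell inputs and gates see the external inputs plus the recurrent
# cell outputs from the previous time step.
W_cell_in = np.zeros((n_cells, n_in + n_cells))
W_gates   = np.zeros((n_gates, n_in + n_cells))

# Output units see the cell outputs plus the "shortcut" connections
# coming straight from the input units.
W_output  = np.zeros((n_out, n_cells + n_in))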

Click here for the full text.

- Jasper