Friday, April 13, 2007

Design choices

As mentioned previously, we'll be building an LSTM that learns to recognize and classify phonemes in a speech signal. The network will be trained, retrained and tested on a corpus of annotated wave files. There is no reason why the same network should not be able to perform on a real-time speech signal at some later stage, but that is not the focus of our project.

Our professor, Dr. Erwin M. Bakker, has acquired the TIMIT corpus, a large body of annotated speech recordings. We'll be using a well-rounded subset of this corpus as our training data, and a smaller, non-overlapping subset as our test set. Some recent articles have suggested that LSTMs can easily be retrained on different sets of speech data. If time allows, we will run some experiments with retraining.

Our intention is to use MATLAB for preprocessing and feature extraction on the wave input files, and to store the resulting MFCCs (mel-frequency cepstral coefficients) to file. The LSTM itself will be written in C++. It will take the feature file and the corresponding annotation as input. During training, the annotation will be used to train the network; during testing, it will be used to measure the network's performance.

Over the weekend, we will be preparing a presentation that details this design.

- Jasper