We've divided the project into the following tasks:
- Select files to use from TIMIT
- Convert WAV files
- Matlab code for extracting features from WAV files
- Matlab code for writing features into feature files
- C++ code for reading feature files
- C++ code for reading TIMIT annotation files
- C++ code object skeleton for BLSTM
- C++ code for forward propagation
- C++ code for back propagation
- C++ code for BLSTM trainer
- C++ code for BLSTM tester
- C++ code for serializing BLSTM
- Training the network on selected files
- Testing the network on selected files
- Write report
The main problem areas at the moment are back propagation and selecting the WAV files to use.
You may wonder what could possibly be so difficult about selecting files. Well, the audio files in the TIMIT corpus are not regular WAV files: they are stored in the NIST format. Our Matlab code was written for PCM WAV files. We've found a good tool, called sox, that converts the NIST files into usable PCM files. Conversion mostly works: the majority of converted files sound the way we expect them to. A few files, with no pattern we can see, come out corrupt after conversion and contain nothing but garbled noise. We can't tell whether the conversion itself causes this or whether the original files are already damaged, since we have no way to listen to the originals.
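One check that might help separate "bad input" from "bad conversion" is to look at the first bytes of each file before and after running sox. The following is only a minimal sketch, not part of our project code: it assumes a NIST (SPHERE) file starts with the ASCII tag `NIST_1A` and a PCM WAV file starts with `RIFF`, four size bytes, then `WAVE`. The names `AudioContainer`, `classifyHeader` and `classifyFile` are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical names, for illustration only.
enum class AudioContainer { NistSphere, RiffWave, Unknown };

// Classify a file from its leading bytes.
// Assumption: SPHERE headers begin with "NIST_1A";
// RIFF WAV files begin with "RIFF" <4 size bytes> "WAVE".
AudioContainer classifyHeader(const std::string& head) {
    if (head.compare(0, 7, "NIST_1A") == 0) {
        return AudioContainer::NistSphere;
    }
    if (head.size() >= 12 &&
        head.compare(0, 4, "RIFF") == 0 &&
        head.compare(8, 4, "WAVE") == 0) {
        return AudioContainer::RiffWave;
    }
    return AudioContainer::Unknown;
}

// Read the first 12 bytes of a file and classify them.
// Returns Unknown if the file cannot be opened.
AudioContainer classifyFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return AudioContainer::Unknown;
    }
    char buf[12] = {};
    in.read(buf, sizeof buf);
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    return classifyHeader(std::string(buf, got));
}
```

Running `classifyFile` over the originals and the converted files would at least show whether a garbled file had an unexpected header to begin with. It says nothing about the sample data itself, so a file with a valid header can still be corrupt.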