Thursday, June 7, 2007

Task Division

About time for a status update on the project. My apologies for not posting regularly. We've been doing a lot of coding, and the small victories/defeats you encounter while coding don't usually seem worth posting about.

We've divided the project into the following tasks:
  • Select files to use from TIMIT
  • Convert WAV files
  • Matlab code for extracting features from WAV files
  • Matlab code for writing features into feature files
  • C++ code for reading feature files
  • C++ code for reading TIMIT annotation files
  • C++ code object skeleton for BLSTM
  • C++ code for forward propagation
  • C++ code for back propagation
  • C++ code for BLSTM trainer
  • C++ code for BLSTM tester
  • C++ code for serializing BLSTM
  • Training the network on selected files
  • Testing the network on selected files
  • Write report
I've indicated the status of each task by color. Red tasks are those with which we've encountered a problem that we currently do not know how to solve. Green tasks are complete, barring potential cross-task bugs. Yellow tasks are complete, but cannot be tested because they depend on unfinished tasks. All other tasks are open, usually sketched out on paper, but not yet (fully) implemented.

The main problem areas at the moment are back propagation and selecting the WAV files to be used.

You may wonder what could possibly be so difficult about selecting files. Well, the TIMIT corpus does not use regular WAV files: its files are in the NIST format. Our Matlab code was written for PCM WAV files. We found a good tool, sox, that converts the NIST files into usable PCM files. Conversion seems to work fine, and the majority of converted files sound the way we expect. A few files, seemingly picked at random, come out corrupt: nothing but garbled noise. Since we are unable to listen to the originals, we can't be sure whether the conversion causes this or whether the source files themselves are at fault.
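One way to inspect the originals without listening to them is to read their headers. The sketch below is not part of our code base. It assumes the TIMIT files follow the usual NIST SPHERE layout: a plain-text header that starts with "NIST_1A", then a line with the header size in bytes (assumed to be 1024 here), then lines of the form "name -type value", closed by "end_head". The function names parseSphereHeader and readSphereHeader are made up for this illustration. Comparing fields such as sample_byte_format, sample_n_bytes and sample_count between a file that converts cleanly and one that turns into noise might show whether the originals differ. That is only a guess at where to look, not a confirmed cause.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse the text of a NIST SPHERE header into name -> value pairs.
// Assumed layout: "NIST_1A", the header size, then "name -type value"
// lines, terminated by "end_head". The type token (-i, -s5, ...) is dropped.
std::map<std::string, std::string> parseSphereHeader(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    if (!std::getline(in, line) || line.compare(0, 7, "NIST_1A") != 0)
        throw std::runtime_error("not a NIST SPHERE header");

    std::map<std::string, std::string> fields;

    // Second line: header size in bytes, padded with spaces.
    if (!std::getline(in, line))
        throw std::runtime_error("missing header size line");
    std::istringstream sizeLine(line);
    std::string size;
    sizeLine >> size;
    fields["header_size"] = size;

    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string name, type, value;
        if (!(ls >> name)) continue;      // skip blank lines
        if (name == "end_head") break;    // padding follows, stop here
        if (!(ls >> type)) continue;      // malformed line, ignore it
        std::getline(ls >> std::ws, value);
        fields[name] = value;
    }
    return fields;
}

// Read the first 1024 bytes of a file and parse them as a SPHERE header.
// Assumption: the header fits in 1024 bytes.
std::map<std::string, std::string> readSphereHeader(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file) throw std::runtime_error("cannot open " + path);
    std::string buffer(1024, '\0');
    file.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    buffer.resize(static_cast<std::size_t>(file.gcount()));
    return parseSphereHeader(buffer);
}
```

Calling readSphereHeader on a good file and on a corrupt one and printing both maps side by side should make any difference in the header fields easy to spot. If the headers turn out to be identical, the problem is more likely in the sample data or in the conversion step.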
