Thursday, June 7, 2007

Serialization

Short status update: serialization of the network has been implemented and tested.
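For readers wondering what serializing a network can look like, here is a minimal sketch rather than our actual code. It assumes the trainable parameters fit in one flat vector of doubles; the names NetworkWeights, serialize, deserialize and roundTripsExactly are made up for this example. It writes a count followed by the raw values, and the resulting file is only readable on a machine with the same byte order and double format.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical container: all trainable weights in one flat vector.
struct NetworkWeights {
    std::vector<double> values;
};

// Write the number of weights, then the raw weight bytes.
void serialize(const NetworkWeights& net, std::ostream& out) {
    const std::uint64_t count = net.values.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(net.values.data()),
              static_cast<std::streamsize>(count * sizeof(double)));
    if (!out) throw std::runtime_error("failed to write weights");
}

// Read back what serialize() wrote: first the count, then the values.
NetworkWeights deserialize(std::istream& in) {
    std::uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));
    if (!in) throw std::runtime_error("failed to read weight count");

    NetworkWeights net;
    net.values.resize(static_cast<std::size_t>(count));
    in.read(reinterpret_cast<char*>(net.values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    if (!in) throw std::runtime_error("failed to read weight values");
    return net;
}

// Round-trip test: serialize into memory, read back, compare.
bool roundTripsExactly(const NetworkWeights& net) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    serialize(net, buffer);
    const NetworkWeights restored = deserialize(buffer);
    assert(restored.values.size() == net.values.size());  // count header must match
    return restored.values == net.values;
}
```

A round trip like roundTripsExactly is one simple way to test this kind of code: because the bytes are copied verbatim, the restored weights should compare equal to the originals. To write to disk, the same functions can be given a std::ofstream or std::ifstream opened with std::ios::binary.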

Remaining tasks:
  • Select files to use from TIMIT
  • Convert WAV files
  • Matlab code for extracting features from WAV files
  • Matlab code for writing features into feature files
  • C++ code for reading feature files
  • C++ code for reading TIMIT annotation files
  • C++ code object skeleton for BLSTM
  • C++ code for forward propagation
  • C++ code for back propagation
  • C++ code for BLSTM trainer
  • C++ code for BLSTM tester
  • C++ code for serializing BLSTM
  • Training the network on selected files
  • Testing the network on selected files
  • Write report

Lower Case Sigma

Our main obstacle in the project is still implementing back propagation. We've identified a few articles that describe the process best and provide the formulas we need to reproduce in our code. Currently, we're working with Long Short-Term Memory (Hochreiter & Schmidhuber, 1997). The article is extremely long and covers a lot of design choices for the LSTM. Most importantly, its appendix presents all the formulas we need in one clear package.

We've come to the point that we understand how back propagation through time unrolls the network, and how each time step affects the time step before it. We were able to implement back propagation for the regular neurons in the network, the CEC, and the output gates. Unfortunately, we do not understand the formulas used for calculating the error/delta for the input gates and the forget gates (A.23 through A.26).

These formulas feature a symbol, the lower case sigma, that we're unsure how to read. As far as we've been able to determine, the lower case sigma is only used in math to denote the standard deviation, and standard deviation does not seem appropriate in these formulas.

Task Division

About time for a status update on the project. My apologies for not posting regularly. We've been doing a lot of coding, and the small victories/defeats you encounter while coding don't usually seem worth posting about.

We've divided the project into the following tasks:
  • Select files to use from TIMIT
  • Convert WAV files
  • Matlab code for extracting features from WAV files
  • Matlab code for writing features into feature files
  • C++ code for reading feature files
  • C++ code for reading TIMIT annotation files
  • C++ code object skeleton for BLSTM
  • C++ code for forward propagation
  • C++ code for back propagation
  • C++ code for BLSTM trainer
  • C++ code for BLSTM tester
  • C++ code for serializing BLSTM
  • Training the network on selected files
  • Testing the network on selected files
  • Write report
I've indicated the status of each task by color. Red tasks are those with which we've encountered a problem that we currently do not know how to solve. Green tasks are complete, barring potential cross-task bugs. Yellow tasks are complete, but cannot be tested because they depend on unfinished tasks. All other tasks are open, usually sketched out on paper, but not yet (fully) implemented.

The main problem areas at the moment are back propagation and selecting the WAV files to be used.

You may wonder what could possibly be so difficult about selecting files. Well, the TIMIT corpus does not consist of regular WAV files: its audio is stored in the NIST format. Our Matlab code was tuned to use PCM WAV files. We've found a good tool, called sox, that converts the NIST files into usable PCM files. Conversion seems to work fine, and the majority of converted files sound like we expect them to sound. A few files, seemingly picked at random, are corrupt after conversion and contain only garbled noise. We can't be sure whether the problem is caused by the conversion or by the file being converted, since we are unable to listen to the originals.
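Even without being able to play the originals, their headers can be inspected. The sketch below is not part of our code base. It assumes the usual NIST SPHERE layout: a plain-text header whose first line is NIST_1A, followed by a header size line, then one "name type value" line per field, closed by end_head. The names SphereHeader, readSphereHeader and fieldOr are made up for this example. Comparing fields such as sample_coding and sample_byte_format between a file that converts cleanly and one that turns into noise might show whether the two differ in encoding or byte order.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Header fields as name -> value; the type tag (-i, -r, -sN) is dropped.
using SphereHeader = std::map<std::string, std::string>;

// Parse the text header at the start of a NIST SPHERE file.
SphereHeader readSphereHeader(std::istream& in) {
    std::string line;
    if (!std::getline(in, line) || line != "NIST_1A")
        throw std::runtime_error("not a NIST SPHERE file");
    if (!std::getline(in, line))  // header size line, e.g. "   1024"; not used here
        throw std::runtime_error("truncated SPHERE header");

    SphereHeader fields;
    while (std::getline(in, line)) {
        if (line.rfind("end_head", 0) == 0)  // line starts with end_head
            return fields;
        std::istringstream parts(line);
        std::string name, type, value;
        if (!(parts >> name >> type))
            continue;  // skip blank or malformed lines
        std::getline(parts >> std::ws, value);  // rest of the line is the value
        fields[name] = value;
    }
    throw std::runtime_error("SPHERE header has no end_head marker");
}

// Look up a field, returning a fallback when the header does not contain it.
std::string fieldOr(const SphereHeader& header, const std::string& name,
                    const std::string& fallback) {
    const auto it = header.find(name);
    return it == header.end() ? fallback : it->second;
}
```

To use it on a real file, open a std::ifstream with std::ios::binary, pass it to readSphereHeader, and print the fields of a good and a bad file side by side.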