The BLSTM features two LSTM subnetworks: one reads the downstream context of the current frame and the other reads the upstream context. Each context consists of a given number of frames. Using both downstream and upstream data allows the BLSTM to exploit information both forward and backward in time. The outputs of the two context LSTMs, together with the current frame itself, are then fed into a regular feed-forward network.
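To make this structure concrete, here is a minimal pure-Python sketch of how the input to the feed-forward network could be assembled. It rests on several assumptions of ours: "upstream" is taken to mean the k frames before the current one and "downstream" the k frames after it; the downstream LSTM reads its frames in reverse so that it finishes next to the current frame; each context LSTM contributes only its final cell outputs; and `TinyLSTM`, `blstm_features` and `feed_forward` are hypothetical helpers without biases or peephole connections.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class TinyLSTM:
    """Hypothetical minimal LSTM layer (no biases, no peepholes)."""

    def __init__(self, n_in, n_cells, rng):
        width = n_in + n_cells  # gates see the frame and the previous cell outputs
        def init():
            return [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_cells)]
        self.Wi, self.Wf, self.Wo, self.Wc = init(), init(), init(), init()
        self.n_cells = n_cells

    def run(self, frames):
        """Read the frames in the given order and return the final cell outputs."""
        # Assumption: with no context frames (sequence edge) the output stays all zeros.
        h = [0.0] * self.n_cells
        c = [0.0] * self.n_cells
        for x in frames:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.Wi, z)]   # input gate
            f = [sigmoid(a) for a in matvec(self.Wf, z)]   # forget gate
            o = [sigmoid(a) for a in matvec(self.Wo, z)]   # output gate
            g = [math.tanh(a) for a in matvec(self.Wc, z)] # cell input
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

def blstm_features(frames, t, k, up_lstm, down_lstm):
    """Input vector for the feed-forward net at frame t (k context frames per side)."""
    upstream = frames[max(0, t - k):t]          # assumed: frames before t, read forward
    downstream = frames[t + 1:t + 1 + k][::-1]  # assumed: frames after t, read backward
    return up_lstm.run(upstream) + down_lstm.run(downstream) + list(frames[t])

def feed_forward(W1, W2, v):
    """One tanh hidden layer followed by a softmax output layer."""
    hidden = [math.tanh(a) for a in matvec(W1, v)]
    scores = matvec(W2, hidden)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
n_feat, n_cells, n_hidden, n_classes, k = 3, 4, 5, 2, 2
frames = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(6)]
up_lstm = TinyLSTM(n_feat, n_cells, rng)
down_lstm = TinyLSTM(n_feat, n_cells, rng)
n_ff_in = 2 * n_cells + n_feat  # both context outputs plus the current frame
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_ff_in)] for _ in range(n_hidden)]
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_classes)]
for t in range(len(frames)):
    probs = feed_forward(W1, W2, blstm_features(frames, t, k, up_lstm, down_lstm))
    print(t, [round(p, 3) for p in probs])
```

The feed-forward net only ever sees a fixed-size vector, so it can stay exactly as it is while the LSTM subnetworks are filled in.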
We've fully implemented the feed-forward network and laid down the skeleton for the LSTM subnetworks.
It took some reading to figure out the topology of the LSTM subnetworks. We believe the article that is the main focus of this project (here) implies that every gate is connected to all input nodes and to the output of each memory cell in the same layer. Similar articles make contradictory statements, but we have opted to follow this interpretation.
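Under our reading of the topology, each gate of each memory cell sees the full input vector plus the previous outputs of every cell in the layer. A minimal sketch of one forward step with that connectivity follows; the weight layout, the bias term and the names `make_layer` and `lstm_step` are our own assumptions, and peephole connections are left out.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

GATES = ("input", "forget", "output", "cell")

def make_layer(n_in, n_cells, seed=0):
    """Hypothetical weight layout: for every gate, row j holds cell j's weights over
    [all inputs..., outputs of all cells in this layer at t-1..., bias]."""
    rng = random.Random(seed)
    width = n_in + n_cells + 1
    return {name: [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(n_cells)]
            for name in GATES}

def lstm_step(layer, x, h_prev, c_prev):
    """One forward step; returns (cell outputs, cell states, gate activations)."""
    z = list(x) + list(h_prev) + [1.0]  # inputs, recurrent cell outputs, bias
    net = {name: [sum(w * v for w, v in zip(row, z)) for row in layer[name]]
           for name in GATES}
    i = [sigmoid(a) for a in net["input"]]
    f = [sigmoid(a) for a in net["forget"]]
    o = [sigmoid(a) for a in net["output"]]
    cell_in = [math.tanh(a) for a in net["cell"]]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, cell_in)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c, {"input": i, "forget": f, "output": o}

layer = make_layer(n_in=3, n_cells=2)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([0.5, -1.0, 0.2], [0.1, 0.3, -0.4]):
    h, c, gates = lstm_step(layer, x, h, c)
    print(h)
```

Because every row spans all cell outputs, changing the previous output of a single cell shifts the gates of every cell in the layer, which is exactly the full connectivity described above.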
At present, we are trying to figure out how backpropagation (and, to a lesser degree, forward propagation) works in the LSTM. Graves et al. had this to say about it:
Starting at time t1, propagate the output errors backwards through the unfolded net, using the standard BPTT equations for a softmax output layer and the crossentropy error function.
There are a number of things we're not sure about at this point. Mainly, we don't know exactly what "the unfolded net" refers to. We assume it has something to do with storing the activations of all nodes at all time steps during forward propagation.
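As far as we can tell, "unfolding" a recurrent net usually means treating it as a deep feed-forward net with one copy of the network per time step, all copies sharing the same weights; in practice that amounts to keeping every time step's activations from the forward pass. The sketch below shows that, plus the error signal the backward pass starts from: for a softmax output layer with cross-entropy error, the derivative with respect to each softmax input is simply y_k - t_k. The one-unit recurrent `step` and the two-class output scores are made-up stand-ins, not the LSTM itself.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Error for one time step; target is the index of the correct class."""
    return -math.log(probs[target])

def unfold_forward(step, h0, inputs):
    """Run the recurrent step over the whole sequence and keep every state.

    The returned list plays the role of the unfolded net: one stored set of
    activations per time step, all produced by the same step function (same weights).
    """
    states = [h0]
    for x in inputs:
        states.append(step(states[-1], x))
    return states

def output_deltas(scores_per_step, targets):
    """Softmax + cross-entropy error at the softmax inputs: y_k - t_k per step."""
    deltas = []
    for scores, target in zip(scores_per_step, targets):
        y = softmax(scores)
        deltas.append([yk - (1.0 if kk == target else 0.0) for kk, yk in enumerate(y)])
    return deltas

def step(h, x):
    # Hypothetical one-unit recurrent step with fixed weights.
    return math.tanh(0.8 * h + 0.5 * x)

states = unfold_forward(step, 0.0, [1.0, -0.5, 0.25])
scores = [[s, -s] for s in states[1:]]  # hypothetical two-class output layer
print(output_deltas(scores, [0, 1, 0]))
```

These per-step deltas are what the backward pass would then push back through the stored states, from the last step to the first.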
The workings of BPTT (backpropagation through time) are also far from clear to us. There are plenty of articles on the subject online, but we have yet to find one that describes the algorithm in understandable terms. Articles that show some promise of helping us get to grips with it are:
http://svr-www.eng.cam.ac.uk/~ajr/rnn4csr94/node14.html
http://page.mi.fu-berlin.de/~rojas/neural/chapter/K7.pdf
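To get a feel for BPTT itself, here is a minimal sketch on a much simpler network than the LSTM: a single tanh recurrent unit with a two-class softmax output and cross-entropy error (this substitution is ours). The backward pass walks the stored time steps from last to first; at each step the output error and the error arriving from the following step are combined, pushed back through the tanh, and accumulated into the shared weights. The parameter names `w_x`, `w_h` and `v` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(params, xs, targets):
    """Unfolded forward pass of a one-unit tanh RNN with a softmax output.
    Returns the total cross-entropy and the stored activations."""
    w_x, w_h, v = params["w_x"], params["w_h"], params["v"]
    hs, ys = [0.0], []  # hs[0] is the initial state h_0
    loss = 0.0
    for x, target in zip(xs, targets):
        h = math.tanh(w_x * x + w_h * hs[-1])
        y = softmax([vj * h for vj in v])
        hs.append(h)
        ys.append(y)
        loss -= math.log(y[target])
    return loss, hs, ys

def bptt(params, xs, targets):
    """Backpropagation through time: walk the stored steps from last to first."""
    _, hs, ys = forward(params, xs, targets)
    w_h, v = params["w_h"], params["v"]
    grads = {"w_x": 0.0, "w_h": 0.0, "v": [0.0] * len(v)}
    dh_next = 0.0  # error flowing back from step t+1 into h_t
    for t in reversed(range(len(xs))):
        h, h_prev = hs[t + 1], hs[t]
        # softmax + cross-entropy: error at the output inputs is y - target
        delta = [yj - (1.0 if j == targets[t] else 0.0) for j, yj in enumerate(ys[t])]
        for j in range(len(v)):
            grads["v"][j] += delta[j] * h
        dh = sum(dj * vj for dj, vj in zip(delta, v)) + dh_next
        da = dh * (1.0 - h * h)  # back through the tanh
        grads["w_x"] += da * xs[t]
        grads["w_h"] += da * h_prev
        dh_next = da * w_h       # hand the error to step t-1
    return grads

params = {"w_x": 0.7, "w_h": -0.4, "v": [1.5, -0.5]}
xs, targets = [1.0, -0.5, 0.3, 0.9], [0, 1, 1, 0]
print(bptt(params, xs, targets))
```

Comparing these gradients against finite differences of the loss returned by `forward` is a cheap way to check a BPTT implementation before moving on to the full LSTM.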
1 comment:
I wrote this to help people understand the details of backprop through time:
http://nicodjimenez.github.io/2014/08/08/lstm.html