Still, our main obstacle in the project is implementing backpropagation. We've identified a few articles that describe the process well and provide the formulas we need to reproduce it in our code. Currently, we're working with Long Short-Term Memory (Hochreiter & Schmidhuber, 1997). The article is extremely long and covers many design choices for the LSTM, but most importantly, its appendix presents all the formulas we'd need in one clear package.
We've reached the point where we understand how backpropagation through time unrolls the network, and how the error at each time step affects the time step before it. We were able to implement backpropagation for the regular neurons in the network, the CEC (constant error carousel), and the output gates. Unfortunately, we do not understand the formulas used to calculate the error (delta) for the input gates and the forget gates (A.23 through A.26).
These formulas feature a symbol, the lowercase sigma (σ), that we're unsure how to read. As far as we're able to determine, lowercase sigma is only used in math to denote the standard deviation, and a standard deviation does not seem appropriate in this formula.
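For comparison, here is a minimal sketch of how the input- and forget-gate deltas come out when we derive them ourselves with full backpropagation through time. It assumes a single scalar cell with the now-common LSTM equations (input, forget and output gates plus a tanh cell input), a squared-error loss on every output, and hypothetical parameter names (`wi`, `ui`, `bi`, ...). It does not use the paper's notation or its truncated gradient, so it is not a transcription of A.23 through A.26. A finite-difference check is included so the analytic gradients can be verified.

```python
import math
import random

# Minimal scalar LSTM (one cell, one input) with input, forget and output gates.
# Assumption: modern formulation with a forget gate and full (untruncated) BPTT,
# not the truncated gradient described in the paper's appendix.

GATES = ("i", "f", "o", "g")  # input gate, forget gate, output gate, cell input


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, xs):
    """Unroll the cell over the sequence xs and cache every intermediate value."""
    h, c = 0.0, 0.0
    cache = []
    for x in xs:
        pre = {k: params["w" + k] * x + params["u" + k] * h + params["b" + k] for k in GATES}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = math.tanh(pre["g"])
        c_prev, h_prev = c, h
        c = f * c_prev + i * g  # CEC update, scaled by the forget gate
        h = o * math.tanh(c)    # cell output, scaled by the output gate
        cache.append(dict(x=x, h_prev=h_prev, c_prev=c_prev, i=i, f=f, o=o, g=g, c=c, h=h))
    return cache


def loss(params, xs, ys):
    cache = forward(params, xs)
    return 0.5 * sum((step["h"] - y) ** 2 for step, y in zip(cache, ys))


def backward(params, xs, ys):
    """Backpropagation through time: walk the unrolled steps from last to first."""
    cache = forward(params, xs)
    grads = {name: 0.0 for name in params}
    dh_next, dc_next = 0.0, 0.0  # error flowing back from step t+1
    for step, y in zip(reversed(cache), reversed(ys)):
        dh = (step["h"] - y) + dh_next
        tanh_c = math.tanh(step["c"])
        d_o = dh * tanh_c
        dc = dh * step["o"] * (1.0 - tanh_c ** 2) + dc_next
        d_i = dc * step["g"]       # error at the input gate's output
        d_f = dc * step["c_prev"]  # error at the forget gate's output
        d_g = dc * step["i"]
        # Deltas at the pre-activations (multiply by each squashing derivative).
        delta = {
            "i": d_i * step["i"] * (1.0 - step["i"]),
            "f": d_f * step["f"] * (1.0 - step["f"]),
            "o": d_o * step["o"] * (1.0 - step["o"]),
            "g": d_g * (1.0 - step["g"] ** 2),
        }
        for k in GATES:
            grads["w" + k] += delta[k] * step["x"]
            grads["u" + k] += delta[k] * step["h_prev"]
            grads["b" + k] += delta[k]
        # Pass the error back to step t-1 through the recurrent weights and the CEC.
        dh_next = sum(delta[k] * params["u" + k] for k in GATES)
        dc_next = dc * step["f"]
    return grads


def numeric_grad(params, xs, ys, name, eps=1e-6):
    """Central finite difference, used only to check backward()."""
    plus = dict(params, **{name: params[name] + eps})
    minus = dict(params, **{name: params[name] - eps})
    return (loss(plus, xs, ys) - loss(minus, xs, ys)) / (2 * eps)


if __name__ == "__main__":
    random.seed(0)
    params = {p + k: random.uniform(-0.5, 0.5) for p in "wub" for k in GATES}
    xs, ys = [0.5, -1.0, 0.3, 0.8], [0.1, -0.2, 0.4, 0.0]
    grads = backward(params, xs, ys)
    for name in sorted(params):
        num = numeric_grad(params, xs, ys, name)
        print(f"{name}: analytic={grads[name]:+.6f} numeric={num:+.6f}")
```

Running it prints each analytic gradient next to its numerical estimate; if the two columns agree, the deltas `d_i` and `d_f` are consistent with the forward pass. Note that the forget gate's error depends on the previous cell state `c_prev`, while the input gate's error depends on the cell input `g`.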