LSTM Network

A long short-term memory (LSTM) neural network is a type of recurrent neural network (RNN) capable of learning long-term dependencies. The difference between a classical RNN and an LSTM network lies in the architecture of its cell. Instead of a single layer, an LSTM cell has four:

  • σ 1): forget gate
  • σ 2): input gate
  • tanh: tanh layer
  • σ 3): output gate
LSTM cell architecture

An LSTM cell has two main paths, called the long-term state and the short-term state. The short-term path begins with the previous short-term state, combined with the new input data. The first layer, σ 1), then calculates which parts of the long-term state will be kept and which will be thrown away, which is why this layer is called the forget gate.
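The forget gate can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names h_prev, x, c_prev, W_f and b_f are assumptions introduced here, the gate input is assumed to be the previous short-term state concatenated with the new input (the usual textbook formulation), and all numbers are made up.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(h_prev, x, W_f, b_f):
    """Return one keep-factor in (0, 1) per long-term state entry."""
    v = h_prev + x  # list concatenation: previous short-term state, then input
    return [sigmoid(sum(w * a for w, a in zip(row, v)) + b)
            for row, b in zip(W_f, b_f)]

# Toy setup: two state entries, one input feature (hypothetical values).
h_prev = [0.0, 0.0]
x = [1.0]
c_prev = [2.0, -3.0]
W_f = [[0.0, 0.0, 10.0],    # large positive pre-activation: keep this entry
       [0.0, 0.0, -10.0]]   # large negative pre-activation: forget this entry
b_f = [0.0, 0.0]

f = forget_gate(h_prev, x, W_f, b_f)
# Element-wise product: entries with f near 1 survive, entries with f near 0 vanish.
c_kept = [f_k * c_k for f_k, c_k in zip(f, c_prev)]
print(f, c_kept)
```

In this toy setup the first entry of the long-term state is kept almost unchanged, while the second is scaled down to nearly zero.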

The next step decides what new information is going to be stored in the cell state. This step has two parts. First, a sigmoid layer called the input gate decides which values will be updated. Next, a tanh layer creates a vector of new candidate values that could be added to the state. The two parts are then combined and used to update the state.
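The update step can be sketched as follows. This is an illustrative sketch under assumptions: the helper dense and the names W_i, b_i, W_g, b_g are introduced here, f stands for the forget gate's output, and the combination used is the standard one (gated old state plus gated candidate values).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, v, activation):
    """One fully connected layer: activation(W @ v + b), written with lists."""
    return [activation(sum(w * a for w, a in zip(row, v)) + b_k)
            for row, b_k in zip(W, b)]

def update_long_term_state(c_prev, f, h_prev, x, W_i, b_i, W_g, b_g):
    v = h_prev + x                       # concatenated gate input
    i = dense(W_i, b_i, v, sigmoid)      # input gate: which entries to update
    g = dense(W_g, b_g, v, math.tanh)    # candidate values in (-1, 1)
    # Keep what the forget gate lets through, then add the gated candidates.
    return [f_k * c_k + i_k * g_k
            for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]

# Toy example with one state entry and one input feature (hypothetical values).
c_new = update_long_term_state(
    c_prev=[1.0], f=[1.0], h_prev=[0.0], x=[1.0],
    W_i=[[0.0, 0.0]], b_i=[0.0],   # input gate evaluates to sigmoid(0) = 0.5
    W_g=[[0.0, 1.0]], b_g=[0.0],   # candidate evaluates to tanh(1.0)
)
print(c_new)
```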

The last layer, the output gate, decides which parts of the long-term state are going to be output.

As a result, the cell produces one new output, calculated from its long-term state, as well as the updated long-term and short-term states for the next calculation.
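Putting the four layers together, one step of a cell might look like the sketch below. The function lstm_step, the parameter dictionary p and its keys are hypothetical names chosen for this example. It also assumes the common convention that the cell's output is the same vector as the new short-term state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, v, activation):
    """One fully connected layer: activation(W @ v + b), written with lists."""
    return [activation(sum(w * a for w, a in zip(row, v)) + b_k)
            for row, b_k in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One time step. Returns (h, c): new short-term and long-term state."""
    v = h_prev + x                                   # concatenated gate input
    f = dense(p["W_f"], p["b_f"], v, sigmoid)        # forget gate
    i = dense(p["W_i"], p["b_i"], v, sigmoid)        # input gate
    g = dense(p["W_g"], p["b_g"], v, math.tanh)      # tanh layer (candidates)
    o = dense(p["W_o"], p["b_o"], v, sigmoid)        # output gate
    c = [f_k * c_k + i_k * g_k
         for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # The output gate filters a squashed copy of the long-term state.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

# All-zero parameters: every sigmoid gate is 0.5 and the candidate is 0.
zeros = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_g", "W_o")}
zeros.update({name: [0.0] for name in ("b_f", "b_i", "b_g", "b_o")})

h, c = lstm_step(x=[1.0], h_prev=[0.0], c_prev=[2.0], p=zeros)
print(h, c)
```

With all-zero parameters the long-term state is simply halved, which makes the data flow easy to check by hand.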

Unrolled LSTM network
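Unrolling means that the same cell, with the same weights, is applied once per time step, and the two states are passed from one step to the next. The sketch below shows this with a scalar cell (one input feature, one state entry) to keep it short. The names scalar_lstm_step, run_unrolled and params are assumptions made for this example, and the parameter values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scalar_lstm_step(x, h_prev, c_prev, params):
    """One step of an LSTM cell whose input and states are single numbers."""
    def pre(name):
        w_h, w_x, b = params[name]       # weights for h_prev, x, and a bias
        return w_h * h_prev + w_x * x + b
    f = sigmoid(pre("f"))                # forget gate
    i = sigmoid(pre("i"))                # input gate
    g = math.tanh(pre("g"))              # candidate value
    o = sigmoid(pre("o"))                # output gate
    c = f * c_prev + i * g               # new long-term state
    h = o * math.tanh(c)                 # new short-term state (also the output)
    return h, c

def run_unrolled(sequence, params, h0=0.0, c0=0.0):
    """Apply the same cell to every element, carrying both states forward."""
    h, c = h0, c0
    outputs = []
    for x in sequence:
        h, c = scalar_lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs, h, c

# Hypothetical parameters: gates fixed at 0.5, candidate pushed close to 1.
params = {"f": (0.0, 0.0, 0.0), "i": (0.0, 0.0, 0.0),
          "g": (0.0, 0.0, 10.0), "o": (0.0, 0.0, 0.0)}
outputs, h, c = run_unrolled([0.1, 0.2, 0.3], params)
print(outputs, h, c)
```

Because the weights are shared across steps, the sequence length does not change the number of parameters, only the number of times the cell is applied.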

LSTM networks are widely used in deep learning. They are particularly useful in applications such as time series prediction, speech recognition, language translation, and robot control.
