One of the simplest intuitive explanations of how an LSTM works that I have come across is as follows:

An LSTM network has the following three aspects that differentiate it from a standard neuron in a recurrent neural network.

1. It has control over deciding when to let the input enter the neuron.

2. It has control over deciding when to remember what was computed in the previous time step.

3. It has control over deciding when to let the output pass on to the next time step.

The beauty of the LSTM is that it makes all of these decisions on the fly, as a function of what it is currently seeing. So if you take a look at the following diagram:

[Diagram: an LSTM cell with its input, forget, and output gates]
The input signal x(t) at the current time step, together with the previous hidden state h(t-1), drives all three of the above decisions: the input gate decides point 1, the forget gate decides point 2, and the output gate decides point 3. This is loosely inspired by how our brains work, which can handle sudden context switches based on the input.
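To make those three decisions concrete, here is a minimal NumPy sketch of a single LSTM step. The weight layout (a dict W with keys "i", "f", "o", "g") is my own invention for illustration, and bias terms and batch dimensions are omitted for brevity; note that each gate is computed from both the current input and the previous hidden state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    # Every gate sees the current input x_t AND the previous hidden
    # state h_prev, concatenated into one vector (biases omitted).
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W["i"] @ z)   # input gate: how much new input to let in (point 1)
    f = sigmoid(W["f"] @ z)   # forget gate: how much old memory to keep (point 2)
    o = sigmoid(W["o"] @ z)   # output gate: how much to expose downstream (point 3)
    g = np.tanh(W["g"] @ z)   # candidate update to the memory cell
    c_t = f * c_prev + i * g  # new memory cell state
    h_t = o * np.tanh(c_t)    # new hidden state passed to the next time step
    return h_t, c_t

# Tiny usage example with random weights (shapes only, not trained values).
n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hid, n_in + n_hid)) for k in "ifog"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W)
```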


I think of the GRU as a slightly simplified variant of the LSTM, designed to adaptively capture dependencies over different time scales. It has no separate memory cell, and its hidden activation is a linear interpolation between the previous hidden activation and a candidate activation. Its diagram is as follows:

[Diagram: a GRU cell]
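Under the same simplifying assumptions (made-up weight names, no bias terms), a rough sketch of one GRU step might look like the following; the last line is the linear interpolation mentioned above, with the update gate choosing how much of the candidate activation to mix in. Conventions vary between papers as to which side of the interpolation the update gate weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W):
    z_in = np.concatenate([x_t, h_prev])
    u = sigmoid(W["u"] @ z_in)  # update gate: how much of the candidate to use
    r = sigmoid(W["r"] @ z_in)  # reset gate: how much of the past to consult
    # Candidate activation, computed from the input and the reset-scaled past.
    h_cand = np.tanh(W["h"] @ np.concatenate([x_t, r * h_prev]))
    # New hidden state: a linear interpolation between the previous hidden
    # activation and the candidate. There is no separate memory cell.
    h_t = (1 - u) * h_prev + u * h_cand
    return h_t
```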
I am interested in hearing other kinds of explanations.
