Jan 29, 2024
Even I'm confused in that, but as far as I understand, you can think of it in this way, LSTM controls the gates (size of them, allowing a certain info to pass) and that decides the flow of information, but in Mamba, the gates are fixed, but the information itself is transformed to pass through a certain gate. Now why the latter is better, I don't know.