Jan 24, 2024
I do believe that chain of thought and graph of thought do it in some ways. But even I had the same idea, to create an architecture that does exactly this. In mamba, they achieve this by creating a hidden state which somehow compress the information.