Infini-Attention: Infinite Context for LLMs
Increasing the context window of LLMs has been a long struggle. Over the years we have invented plenty of techniques, but progress has been slow and tedious. To work around the memory problem, we even engineered RAG pipelines that act as a kind of external, semi-context window for LLMs. The context window is like short-term memory for an LLM: the bigger the window, the more context we can fit into it, and the better and more nuanced the answers can be. So, in today’s blog, we are going to look at how Google DeepMind built an infinite context window for LLMs.