MemGPT: The Memory Limitations of AI Systems and a Clever Technological Workaround

24.10.23, by Ines Almeida


Artificial intelligence systems that can hold natural conversations and analyze documents have transformative business potential. However, today's AI - specifically large language models (LLMs) like Claude 2 and GPT-4 - has a major limitation: a model can only hold a fixed amount of text in its working memory at once. Everything beyond that limit is forgotten, which restricts its ability to sustain coherent, long-term interactions or make connections across lengthy documents.


One might assume the solution is just to build LLMs with bigger memories. But LLMs face sharply diminishing returns and ballooning computational costs from naively expanding memory. After reviewing these tradeoffs, researchers at UC Berkeley devised an innovative workaround drawing inspiration from operating systems. Their system, MemGPT, applies OS principles like virtual memory and process management to unlock more powerful applications of LLMs - all while staying within their inherent memory limits.


The Core Challenge of LLM Memory Limits


LLMs use an algorithm called self-attention to analyze incoming text and predict upcoming words, much as humans intuitively continue a thought or conversation. This grants LLMs their impressive language skills. However, self-attention requires the LLM to look across all of the context it has received so far, and that context window has a fixed size limit: once it is full, older information must be dropped before new text can be processed.


For perspective, Claude 2 can handle about 100,000 tokens (roughly 75,000 words of English text) at once. That may sound generous - a 10,000-word business report fits comfortably. But a running conversation can exceed the limit after just a few hours of steady chit-chat, and tasks like sifting through complex legal document sets routinely involve millions of tokens.


LLMs have a fixed memory capacity because the self-attention algorithm scales quadratically with context length: doubling the context window makes the LLM's computations roughly 4x more intensive. Expanding memory quickly becomes computationally infeasible, even for large tech companies.
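
To make the scaling concrete, here is a minimal back-of-the-envelope sketch (our own illustration, not taken from the paper; attention_comparisons is a hypothetical helper) of how the number of pairwise attention comparisons grows with context length:

```python
# Back-of-the-envelope illustration of self-attention's quadratic cost.
# Each token attends to every other token in the window, so the total
# number of pairwise comparisons grows with the square of the length.

def attention_comparisons(context_length: int) -> int:
    """Pairwise token comparisons for one attention pass (~ n^2)."""
    return context_length ** 2

for n in (25_000, 50_000, 100_000, 200_000):
    print(f"{n:>7} tokens -> {attention_comparisons(n):.2e} comparisons")

# Each doubling of context length quadruples the work:
#   25000 tokens -> 6.25e+08 comparisons
#   50000 tokens -> 2.50e+09 comparisons
#  100000 tokens -> 1.00e+10 comparisons
#  200000 tokens -> 4.00e+10 comparisons
```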

This limited memory span is not a flaw of specific systems like Claude; it is an inherent constraint of all modern LLM architectures. Naively expanding memory was not a viable solution path, so more creative approaches would be needed.


The Insights Behind MemGPT's Operating System-Inspired Design


UC Berkeley researchers drew inspiration from operating systems like Windows, which routinely run applications that work with far more data than fits into available RAM. They asked: how can OS techniques give an LLM the illusion of infinite memory?


The result was MemGPT, which implements two key principles:

  1. A hierarchy of memory resources - MemGPT divides memory into a small, fast "main context" like RAM and a large, slow "external context" like disk storage. Information must be explicitly transferred between them.
  2. Process management - MemGPT handles control flow between memory, the LLM, and users akin to how an OS arbitrates between concurrent processes.


Together, these let MemGPT page potentially unlimited amounts of information in and out of the LLM's limited context window as needed, accomplishing tasks that require unbounded memory over multiple processing cycles - as the sketch below illustrates.
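
To make the two-tier design concrete, here is a minimal sketch under our own assumptions - the TieredMemory class, its word-count budget, and keyword-based recall are illustrative simplifications, not the paper's implementation, in which the LLM itself issues function calls to manage its memory:

```python
# Minimal sketch of a MemGPT-style two-tier memory hierarchy.
# Assumption: the token budget is approximated by counting words; a real
# system counts model tokens and lets the LLM decide what to evict/recall.

class TieredMemory:
    def __init__(self, main_budget: int = 100):
        self.main_budget = main_budget          # "RAM": the context window
        self.main_context: list[str] = []
        self.external_context: list[str] = []   # "disk": unbounded archive

    def _used(self) -> int:
        return sum(len(m.split()) for m in self.main_context)

    def add(self, message: str) -> None:
        """Add a message, evicting the oldest ones to external storage."""
        self.main_context.append(message)
        while self._used() > self.main_budget and len(self.main_context) > 1:
            evicted = self.main_context.pop(0)      # page out
            self.external_context.append(evicted)   # archived, never lost

    def recall(self, query: str) -> list[str]:
        """Page archived messages that mention the query back into view."""
        return [m for m in self.external_context
                if query.lower() in m.lower()]

memory = TieredMemory(main_budget=20)
memory.add("User: my daughter Anna starts school in September")
memory.add("Assistant: that's exciting! Which school?")
memory.add("User: anyway, let's talk about the quarterly report")
print(memory.recall("Anna"))  # retrieves the evicted fact from the archive
```

The key property is that eviction never destroys information: it merely moves it to the slower tier, where a later query can page it back in.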


Just as clever OS architectures let applications work with more data than fits in RAM, MemGPT's design gives fixed-context LLMs the illusion of infinite memory.


Conversational AI That Can Reference Years of Dialogue


A major application of LLMs is powering conversational assistants and social bots. MemGPT demonstrates substantially improved consistency and personalization in these applications:

      • Consistency - By querying its external memory of prior interactions, MemGPT can coherently maintain facts, preferences, and history, even when referring back to dialogues from months or years ago.
      • Personalization - MemGPT can spontaneously draw on comprehensive knowledge about the user, such as callback jokes referencing stories told weeks earlier, to forge greater rapport. A rough sketch of this recall-and-compose flow follows below.
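
As a rough illustration (again our own simplification with hypothetical helper names, not the published system's API), a long-running assistant might assemble each prompt from fixed persona instructions, facts recalled from the long-term archive, and whatever recent dialogue still fits in the window:

```python
# Sketch of prompt assembly for a long-running conversational agent.
# assemble_prompt is a hypothetical helper; the real MemGPT system
# manages this flow via function calls issued by the LLM itself.

def assemble_prompt(persona: str,
                    recalled_facts: list[str],
                    recent_turns: list[str],
                    user_message: str) -> str:
    """Compose the LLM's working context from tiered memory sources."""
    sections = [
        f"[PERSONA]\n{persona}",
        "[RECALLED FROM LONG-TERM MEMORY]\n" + "\n".join(recalled_facts),
        "[RECENT CONVERSATION]\n" + "\n".join(recent_turns),
        f"[NEW MESSAGE]\nUser: {user_message}",
    ]
    return "\n\n".join(sections)

prompt = assemble_prompt(
    persona="You are a friendly, consistent assistant.",
    recalled_facts=["User's daughter Anna started school in September."],
    recent_turns=["User: hi again!", "Assistant: welcome back!"],
    user_message="How should I plan the school-run mornings?",
)
print(prompt)
```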


Analyzing Large Collections of Documents


MemGPT also excels at tasks like:

  • Question answering using a massive multi-document corpus like Wikipedia or a company knowledge base.
  • Extracting key facts and relationships by synthesizing relevant excerpts across thousands of pages.
  • Performing multi-hop reasoning that spans fragmented information distributed across documents, by paging excerpts through the context window (see the sketch below).
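
To sketch how paging enables analysis of corpora far larger than any context window (again a simplification under our own assumptions - chunk and paged_search are hypothetical names, and the keyword test stands in for an LLM call), a system can stream chunks through the window and carry only compact findings forward:

```python
# Sketch of paged document analysis: a corpus far larger than the context
# window is processed in window-sized chunks, and only compact findings
# are carried forward between processing cycles.

def chunk(text: str, window_words: int) -> list[str]:
    """Split a document into pieces that fit the context window."""
    words = text.split()
    return [" ".join(words[i:i + window_words])
            for i in range(0, len(words), window_words)]

def paged_search(corpus: list[str], keyword: str,
                 window_words: int = 50) -> list[str]:
    """Scan every chunk of every document, keeping matching excerpts.

    A real system would replace the keyword test with an LLM call that
    reads the chunk plus the findings accumulated so far.
    """
    findings: list[str] = []
    for doc in corpus:
        for piece in chunk(doc, window_words):
            if keyword.lower() in piece.lower():
                findings.append(piece)  # carry forward a compact result
    return findings

corpus = ["... thousands of pages of contracts ...",
          "Clause 14: the indemnity cap is EUR 2 million ..."]
print(paged_search(corpus, "indemnity"))
```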


These capabilities could greatly amplify the utility of LLMs for knowledge management applications.


Takeaways for Business Leaders


MemGPT provides two key lessons for applying LLMs:

  1. Look beyond scaling model size, and consider architectural innovations to push capabilities forward within intrinsic limits.
  2. Draw inspiration from solutions in fields like systems architecture - LLM memory management has parallels to longstanding CS problems.


MemGPT shows that, rather than getting caught up in an arms race over ever-larger context windows, a clever memory architecture can unlock substantially more powerful applications without requiring unrealistic context sizes. Techniques like this, which work within practical constraints, will be key to delivering business value from AI.



Sources:

MemGPT: Towards LLMs as Operating Systems, UC Berkeley


