In recent years, artificial intelligence has taken great leaps forward thanks to large language models (LLMs) - AI systems trained on massive amounts of text data that can understand language and generate human-like text. Companies like Google, Microsoft, and startups like OpenAI and Anthropic have invested heavily in developing ever-larger LLMs with billions or even trillions of parameters.
However, once these giant LLMs are trained, companies face a dilemma - whether to "fine-tune" the model by further training it on specific tasks, or keep the model "frozen" without any changes. Fine-tuning allows the LLM to specialize and achieve state-of-the-art performance on specialized tasks. But it comes at a high cost - computationally expensive retraining, reduced versatility, and forgetting of previous capabilities.
In their research paper, the team at AI21 Labs demonstrates that frozen LLMs have untapped potential that can match or exceed fine-tuning approaches, without these downsides. They present three new techniques for effectively "standing on the shoulders" of frozen giants:
1. Input-Dependent Prompt Tuning
Large language models are adept at understanding natural language, but they don't automatically know how to perform specific tasks like answering questions or summarizing text. However, their capabilities can be unlocked using prompt tuning.
The key idea behind prompt tuning is that providing the right prompt text before the input steers the language model towards the desired task. It's like giving the model instructions on how to process the upcoming input.
For example, if we want the language model to answer questions based on a passage of text, we can prepend the input with a prompt like:
"Answer the following question based only on the passage below:"
[Text Passage]
[Question]
This tunes the model to approach the upcoming input as a question-answering task. The prompt acts like an adapter, steering the versatile model toward useful behaviors without any training or fine-tuning.
So prompt tuning means optimizing these instruction prompts - often as learned "soft" embedding vectors rather than literal words - for each task to get the best performance from the frozen language model. It's like learning how to communicate with and direct the model most effectively.
The key innovation from AI21 Labs was making prompt tuning input-dependent. Rather than using one static prompt per task, they trained a small neural network to generate custom prompts tailored to each specific input.
This input-dependent prompting allowed a single frozen language model to master over 100 diverse tasks, from question answering to summarization to sentiment analysis, matching extensive fine-tuning without degradation.
The prompts serve as lightweight yet powerful steering instructions that can specialize a frozen model on the fly based on the input. It's like having a dynamic adapter that configures the model differently for each unique situation.
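The idea above can be sketched in a few lines of code. This is a minimal, illustrative toy - the dimensions, the mean-pooling step, and the random generator weights are stand-ins; in AI21's approach the prompt generator is a small trained network whose outputs are embedding vectors consumed by the (untouched) frozen LLM.

```python
import random

D_MODEL = 8       # embedding width of the frozen LLM (illustrative)
PROMPT_LEN = 3    # number of soft-prompt vectors generated per input

random.seed(0)
# Trainable generator weights - the frozen LLM's weights are never touched.
W = [[random.uniform(-0.1, 0.1) for _ in range(PROMPT_LEN * D_MODEL)]
     for _ in range(D_MODEL)]

def generate_prompt(input_embeddings):
    """Map a mean-pooled input representation to PROMPT_LEN prompt vectors."""
    n = len(input_embeddings)
    pooled = [sum(tok[d] for tok in input_embeddings) / n for d in range(D_MODEL)]
    flat = [sum(pooled[i] * W[i][j] for i in range(D_MODEL))
            for j in range(PROMPT_LEN * D_MODEL)]
    return [flat[k * D_MODEL:(k + 1) * D_MODEL] for k in range(PROMPT_LEN)]

def prepend_prompt(input_embeddings):
    """The frozen LLM sees [generated prompt ; original input tokens]."""
    return generate_prompt(input_embeddings) + list(input_embeddings)

# Different inputs produce different prompts: the steering is input-dependent,
# unlike a single static prompt per task.
x1 = [[1.0] * D_MODEL for _ in range(5)]
x2 = [[-1.0] * D_MODEL for _ in range(5)]
assert generate_prompt(x1) != generate_prompt(x2)
print(len(prepend_prompt(x1)))  # 8 rows: 3 prompt vectors + 5 input tokens
```

The key design point is that only the small generator's weights are trained; everything downstream of `prepend_prompt` is the frozen giant.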
2. Huge Frozen Readers for Question Answering
In open-domain question answering, the AI system must answer questions by finding relevant information from a massive collection of text passages, like Wikipedia.
Typically, these systems use a smaller "reader" model to read through the relevant passages and figure out the answer. That's because even the largest language models can only process a limited amount of text at once.
But smaller reader models have less knowledge and reasoning ability than giant language models with billions or trillions of parameters. So they don't fully unlock the potential of these frozen giants.
AI21 Labs tackled this by adding a "re-ranking" stage that distills the most important information from the passages into a condensed form that fits within the giant frozen language model's input limit.
This allowed their 17 billion parameter model to read enough of the relevant context to match specialized reader models that were extensively fine-tuned for question answering.
In essence, the smaller re-ranking model acts like a search engine, retrieving and condensing the most useful knowledge to fit the limitations of the frozen giant.
This gives the huge frozen model access to all the relevant information it needs to apply its powerful reasoning abilities. The giants' knowledge and capabilities can be tapped without fine-tuning that risks degrading other skills.
It demonstrates how frozen language models have untapped potential that can be unlocked with the right surrounding components, like the re-ranking stage here. Their true capabilities can be accessed without resorting to extensive fine-tuning.
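The retrieve-then-re-rank pipeline described above can be sketched as follows. The word-overlap scoring function and the token budget here are illustrative stand-ins; in the paper the re-ranker is a learned model and the reader is the frozen 17-billion-parameter LLM.

```python
def rerank_passages(question, passages, score_fn, token_budget):
    """Order passages by relevance, then keep as many as fit the budget -
    the condensed context handed to the frozen reader."""
    ranked = sorted(passages, key=lambda p: score_fn(question, p), reverse=True)
    context, used = [], 0
    for passage in ranked:
        n_tokens = len(passage.split())  # crude token count (illustrative)
        if used + n_tokens > token_budget:
            break
        context.append(passage)
        used += n_tokens
    return context

def overlap_score(question, passage):
    """Toy relevance score: word overlap with the question (illustrative)."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

passages = [
    "The Eiffel Tower is located in Paris France",
    "Bananas are rich in potassium",
    "Paris is the capital of France and home to the Eiffel Tower",
]
context = rerank_passages("Where is the Eiffel Tower located", passages,
                          overlap_score, token_budget=20)
print(context)  # the two Eiffel Tower passages; the banana passage is dropped
```

Only the condensed `context` reaches the frozen giant, which then applies its full reasoning ability to exactly the information it needs.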
3. Recursive Application of a Single LLM
Typically, large language models are used to process an input query just once before generating an output response. The model reads the input, does its internal reasoning, and returns a single output.
But AI21 Labs found that recursively applying the model on its own outputs can actually improve performance. Essentially, the model refines and enhances its initial output by processing it again.
It's like having the model double-check its own work and refine its initial response. Humans often re-read what we initially wrote to improve the wording and fix errors. Recursively applying language models does something similar, but in an automated way.
To implement this, AI21 built a small 2-layer neural network "connector" that feeds the language model's output back into its input.
So the model first processes the original query as normal. But then the connector passes the model's initial output back into it as the new input. This triggers it to refine and enhance that initial output.
In tests for question answering, just two recursive passes through a 7 billion parameter model allowed it to match the performance of a much larger 17 billion parameter model.
Essentially, it nearly doubled the capabilities of the smaller model by re-applying it recursively. This shows how recursive application unlocks additional performance without requiring even larger pretrained models.
The connector module creates a feedback loop, allowing the model to re-process its own output and correct errors or improve phrasing, much like a human would. This technique amplifies the capabilities of a given model without expensive retraining or fine-tuning.
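The feedback loop can be sketched with a toy stand-in for the model. Here `frozen_model` is just a numerical refiner that nudges its input toward a fixed answer - purely illustrative; in the paper the "model" is a frozen LLM and the connector is a small trained network operating between passes.

```python
def frozen_model(x):
    """Toy stand-in for one pass through a frozen model: each call moves
    the current answer halfway toward the correct value (here, 1.0)."""
    return x + 0.5 * (1.0 - x)

def recursive_apply(x, passes):
    """Connector loop: feed the model's own output back in `passes` times."""
    for _ in range(passes):
        x = frozen_model(x)
    return x

one_pass = recursive_apply(0.0, 1)
two_pass = recursive_apply(0.0, 2)
print(one_pass, two_pass)  # 0.5 0.75 - the second pass refines the first
```

The point of the sketch is the structure, not the arithmetic: the same frozen component is applied twice, and the second application improves on the output of the first, just as two recursive passes through the 7B model closed the gap to the 17B model.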
Business Implications
These techniques enable building capable AI systems on top of a single, frozen pretrained LLM instead of an array of specialized fine-tuned models. This offers tangible business benefits:
- Cost Savings - Avoiding expensive training of multiple large models cuts costs. Just maintaining and serving one frozen LLM backbone provides economies of scale.
- Simplicity - Relying on prompting and other external components is far simpler than maintaining intricate fine-tuning pipelines. Less specialized engineering effort is required.
- Flexibility - New capabilities can be added without interfering with existing ones. Fine-tuning risks degradation on previous tasks.
- Efficiency - Recursive passing improves performance on demand by re-applying the LLM only when beneficial, whereas a bigger pretrained model incurs its full cost on every input.
While fine-tuning revolutionized AI, endless model growth is impractical. Frozen language models present an alluring path forward - unlocking their full potential with the right neural "plug-ins" provides a scalable approach to building production AI systems.