A new study from New York University and the Allen Institute for AI shows that transformers, the neural networks behind today's large language models, can be expressed in a simple logical formalism. The finding challenges the perception that transformers are inscrutable black boxes and suggests avenues for interpreting how they work.
Transformers are the neural network architecture behind major AI achievements such as chatbots and machine translation. They are trained on massive datasets to generate human-like text, yet despite their impressive capabilities, how they arrive at their outputs has remained poorly understood.
The researchers proved that transformers can be translated into symbolic logic sentences that replicate their behavior. Specifically, they showed that log-precision transformers, models whose internal numerical precision grows only logarithmically with the input length, fit within a logic called first-order logic with majority quantifiers. This logic builds sentences from familiar constructs like "AND", "OR", and "IF-THEN", together with majority quantifiers that check whether a condition holds at more than half of the positions in the input.
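To make those quantifiers concrete, here is a minimal Python sketch, purely illustrative and not drawn from the paper, of how a sentence in such a logic could be evaluated over an input string. The predicate and helper names are assumptions chosen for readability.

```python
# Toy illustration (not the paper's construction): evaluating a sentence of
# first-order logic with a majority quantifier over an input string.

def token_is(s: str, i: int, symbol: str) -> bool:
    """Atomic predicate: the token at position i is `symbol`."""
    return s[i] == symbol

def majority(s: str, predicate) -> bool:
    """Majority quantifier: does `predicate` hold at more than half of the positions?"""
    return sum(predicate(i) for i in range(len(s))) * 2 > len(s)

def exists(s: str, predicate) -> bool:
    """Ordinary first-order existential quantifier over positions."""
    return any(predicate(i) for i in range(len(s)))

# Example sentence: "a majority of tokens are 'a' AND some token is 'b'".
def sentence(s: str) -> bool:
    return majority(s, lambda i: token_is(s, i, "a")) and exists(s, lambda i: token_is(s, i, "b"))

print(sentence("aaab"))  # True: 3 of 4 tokens are 'a', and a 'b' occurs
print(sentence("aabb"))  # False: 'a' is not a strict majority
```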
While real-world transformers are complex neural networks, the study shows, theoretically, that their computations can be captured by comparatively simple logical expressions. For instance, the logic can express patterns such as "three As followed by three Bs", or more generally a block of As followed by an equally long block of Bs, which transformers are known to recognize; a short sketch of that pattern appears below.
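The following sketch, again an illustration rather than the paper's actual formula, phrases that pattern as the kind of counting and ordering conditions such a logic can express.

```python
# Illustrative assumption: the pattern "a block of a's followed by an equally
# long block of b's" (e.g. "aaabbb") stated as two conditions of the sort that
# first-order logic with majority quantifiers can express.

def a_block_then_b_block(s: str) -> bool:
    n = len(s)
    # Counting condition: the a's and b's each make up exactly half the string.
    counts_match = (n % 2 == 0) and s.count("a") == s.count("b") == n // 2
    # Ordering condition: no 'b' ever appears before an 'a'.
    ordered = all(not (s[i] == "b" and s[j] == "a")
                  for i in range(n) for j in range(i + 1, n))
    return counts_match and ordered

print(a_block_then_b_block("aaabbb"))  # True
print(a_block_then_b_block("aabbab"))  # False: a 'b' precedes an 'a'
```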
The findings undercut the notion that transformers are inherently inscrutable. Instead, they suggest transformers implement a form of reasoning that is not radically different from familiar logical formalisms. Being able to express a transformer in interpretable logic could make it possible to explain how it arrives at its outputs, for example when diagnosing biased behavior.
For business leaders deploying AI, this research opens possibilities for making transformers more transparent and accountable. It provides a path toward debugging models to avoid failures or bias: the ability to translate transformers into logical sentences could allow systematic checks for undesirable reasoning patterns.
Overall, this theoretical advance challenges the prevailing view of transformers as hopelessly opaque. It demonstrates that their computations can be characterized in understandable logic, unlocking new ways for technologists to interpret these increasingly critical AI models. The research brings transformer reasoning closer to human understanding by showing that the models' outputs are not ineffable but can be explained through logic.
Sources:
William Merrill and Ashish Sabharwal, "A Logic for Expressing Log-Precision Transformers"