<?xml version="1.0" encoding="UTF-8" ?><!-- generator=Zoho Sites --><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><atom:link href="https://www.nownextlater.ai/Insights/tag/interpretability/feed" rel="self" type="application/rss+xml"/><title>Now Next Later AI - Blog #Interpretability</title><description>Now Next Later AI - Blog #Interpretability</description><link>https://www.nownextlater.ai/Insights/tag/interpretability</link><lastBuildDate>Wed, 26 Nov 2025 21:41:00 +1100</lastBuildDate><generator>http://zoho.com/sites/</generator><item><title><![CDATA[Why AI is Harder Than We Think: Key Takeaways for Business Leaders]]></title><link>https://www.nownextlater.ai/Insights/post/Why-AI-is-Harder-Than-We-Think-Key-Takeaways-for-Business-Leaders</link><description><![CDATA[Leading AI researcher Melanie Mitchell argues that truly human-like general AI is much harder to achieve than many experts predict. Key takeaways from her paper for business leaders.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_sfUGQRrPRMOb-zqryhN6kA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_aI0rE4RHRAGxFiFOSq2e8g" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_KQc6QKC-TUS9k5ADouakpw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_lLX8ye2SQte64RLyZASIwA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_lLX8ye2SQte64RLyZASIwA"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Artificial intelligence (AI) has seen remarkable advances in recent years, from self-driving cars to defeating humans at games like chess and Go. However, leading AI expert Melanie Mitchell argues that achieving truly human-like &quot;general&quot; AI is much harder than many experts predict. Here are the key takeaways from her paper for business leaders:</p><ul><li>Narrow AI versus general AI: Success in specialized applications like chess or image recognition does not necessarily translate to broader capabilities. We cannot assume today's AI systems are on a steady path to human-level intelligence.</li><li>&quot;Easy&quot; things are hard: Basic human skills like perceiving the world and carrying on a conversation have proven very difficult to replicate in machines. Conversely, AI can excel at things that are quite hard for humans.</li><li>Wishful vocabulary: Terms like &quot;learn,&quot; &quot;understand,&quot; and &quot;think&quot; are often applied to today's AI, but these systems do not have the same underlying capabilities as humans. Using human vocabulary can misleadingly imply advanced intelligence.</li><li>Intelligence is embodied: Human intelligence relies heavily on our experiences and interactions using our entire bodies, not just abstract reasoning. Efforts to achieve human-level intelligence may need to involve human-like bodies and environments.</li><li>Common sense is key: To operate successfully in the real world, AI needs the vast background knowledge humans accumulate about how the world works. 
We still do not understand how to enable machines to acquire this &quot;common sense.&quot;</li></ul><p><br></p><p>Key implications:</p><ul><li>Avoid overconfidence about timelines for achieving human-level AI based on hype and narrow successes. True general intelligence likely remains far off.</li><li>Focus investment on applications of existing AI capabilities, not attempts to replicate human thinking. Manage expectations of near-term outcomes.</li><li>Monitor advances in embodied AI and research on common sense reasoning as indicators of progress toward general AI.</li><li>Ensure AI systems have transparent workings, clear objectives, and human oversight. The orthogonality thesis, the assumption that any level of intelligence can be combined with arbitrary goals, may not hold for human-like general intelligence.</li></ul><p><br></p><p>The path to human-level AI is long, with much still unknown. By avoiding unfounded assumptions and acknowledging the challenges ahead, business leaders can make wise strategic decisions about how to apply AI technology today and anticipate what may come tomorrow.</p><p><br></p><p>Sources:</p><div style="color:inherit;"><div><div><div><div><div><p><span style="font-size:14px;"><a href="https://arxiv.org/abs/2104.12871" title="Why AI is Harder Than We Think " rel="">Why AI is Harder Than We Think </a></span></p><p></p><span style="font-size:14px;"></span><p><span style="font-size:14px;">Melanie Mitchell </span></p></div></div>
</div></div></div></div><p></p></div><p></p></div></div></div></div></div></div></div>
 ]]></content:encoded><pubDate>Sun, 13 Aug 2023 19:50:20 +1000</pubDate></item><item><title><![CDATA[Transformers Expressible in Simple Logic]]></title><link>https://www.nownextlater.ai/Insights/post/Transformers-Expressible-in-Simple-Logic</link><description><![CDATA[A new study has shown that transformers can be expressed in a simple logic formalism. This finding challenges the perception that transformers are inscrutable black boxes and suggests avenues for interpreting how they work.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_vTZ0UtxrTVKUpnRBcGFGwQ" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_OK0gUtTtTG-p2y_MzdYVpA" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_JUny_qLdSUqolf8oNtk3FQ" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_-muaciReSB2TX7jNYu8tFQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_-muaciReSB2TX7jNYu8tFQ"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>A new study from New York University and the Allen Institute for AI has shown that large language models called transformers can be expressed in a simple logic formalism. This finding challenges the perception that transformers are inscrutable black boxes and suggests avenues for interpreting how they work.</p><p><br></p><p>Transformers are a type of neural network behind major AI achievements like chatbots and language translation. They are trained on massive datasets to generate human-like text. Despite their impressive capabilities, how transformers arrive at their outputs has remained poorly understood.</p><p><br></p><p>The researchers proved transformers can be translated into symbolic logic sentences that replicate their function. Specifically, they showed transformers fit within a logic called first-order logic with majority quantifiers. This logic allows logical sentences with familiar constructs like &quot;AND&quot;, &quot;OR&quot;, and &quot;IF-THEN&quot;, as well as majority quantifiers that check if a condition holds for over half of the elements.</p><p><br></p><p>While real-world transformers are complex neural networks, this study theoretically shows their reasoning can be captured by simple logical expressions. For instance, the logic could recognize patterns like &quot;three As followed by three Bs&quot;, which transformers are known to identify.</p><p><br></p><p>The findings disprove the notion that transformers are inscrutable black boxes. Instead, they suggest transformers implement a form of reasoning not radically different from familiar logical formalisms. The possibility of expressing transformers in interpretable logic could enable explaining how they arrive at outputs, like detecting biases.</p><p><br></p><p>For business leaders deploying AI, this research opens possibilities for making transformers more transparent and accountable. It provides a path toward debugging models to avoid failures or bias. 
The ability to translate transformers into logical sentences could make it possible to check systematically whether undesirable reasoning patterns occur.</p><p><br></p><p>Overall, this theoretical advance challenges prevailing views of transformers as hopelessly opaque. It demonstrates that their computations can be characterized in understandable logic, unlocking new ways for technologists to interpret these increasingly critical AI models. The research shows that transformer outputs are not ineffable: in principle, they can be explained through logic.</p><p><br></p><p>Sources:</p><div style="color:inherit;"><div><div><div><div><p><a href="https://arxiv.org/abs/2210.02671" title="A Logic for Expressing Log-Precision Transformers " rel="">A Logic for Expressing Log-Precision Transformers </a><br></p><p></p><div style="color:inherit;"><div><div><div><div><p>William Merrill and Ashish Sabharwal </p><div style="color:inherit;"><div><div><div><span style="font-size:10pt;font-weight:500;"><br></span></div>
</div></div></div></div></div></div></div></div><p></p></div></div></div></div></div><p></p></div><p></p></div>
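<p>To make the majority-quantifier idea concrete, here is a minimal Python sketch (a toy illustration under simplifying assumptions of our own, not the paper's construction) of evaluating a small first-order-logic-with-majority sentence over a string of tokens:</p><pre>
# Toy sketch: evaluate a tiny "first-order logic with majority" sentence
# over a token string. Function names are illustrative only.

def majority(tokens, predicate):
    # Majority quantifier: does the predicate hold for more than half of the positions?
    hits = sum(1 for t in tokens if predicate(t))
    return 2 * hits &gt; len(tokens)

def all_a_before_all_b(tokens):
    # Ordinary first-order condition: every 'a' occurs before every 'b'.
    a_positions = [i for i, t in enumerate(tokens) if t == "a"]
    b_positions = [j for j, t in enumerate(tokens) if t == "b"]
    return all(i &lt; j for i in a_positions for j in b_positions)

def sentence_holds(tokens):
    # Conjunction of the two sub-formulas, mirroring how such sentences combine
    # familiar connectives (AND, OR, IF-THEN) with majority quantifiers.
    return all_a_before_all_b(tokens) and majority(tokens, lambda t: t == "a")

print(sentence_holds(list("aaab")))   # True: a's precede b's, and a's form a majority
print(sentence_holds(list("aabbb")))  # False: b's form the majority
</pre><p>Checking a trained model against hand-written sentences of this kind is the sort of systematic inspection the translation result could eventually support.</p>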
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Sun, 13 Aug 2023 19:50:20 +1000</pubDate></item><item><title><![CDATA[DisentQA: Catching Knowledge Gaps and Avoiding Misleading Users]]></title><link>https://www.nownextlater.ai/Insights/post/enabling-ai-to-untangle-different-knowledge-sources</link><description><![CDATA[Building QA Systems that catch knowledge gaps and avoid misleading users.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_ewF7pMN9Q_eczUOQpCYtUA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_XdQfIANyTi-5Z3w2LSGv-A" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_tGWqJgjLSlyldj1XkXMcGw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width: 500px ; height: 486.01px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width:500px ; height:486.01px ; } } @media (max-width: 767px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width:500px ; height:486.01px ; } } [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%209.09.37%20am.png" width="500" height="486.01" loading="lazy" size="medium" alt="Example outputs from our disentangled QA model on the Natural Questions dataset. " data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_TNbKqQ17TP256B60EqRP7w" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TNbKqQ17TP256B60EqRP7w"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p>Imagine you ask your phone &quot;Who wrote the song Hello by Adele?&quot; and it gives you an incorrect answer, insisting the song is by Taylor Swift. This shows artificial intelligence sometimes confuses its own training knowledge with external facts.</p><p><br></p><p>Researchers want to fix this issue to make AI assistants more helpful and honest. Their solution: <span style="color:inherit;">Build QA Systems that catch knowledge gaps and avoid misleading users by </span>teaching the system to provide two responses:</p><ol><li>The factual answer based on given information (e.g. Adele)</li><li>What it privately recalls from its memory (e.g. Taylor Swift)</li></ol><p><br></p><p>This highlights any mismatches between its training knowledge and external data. It's like when we say &quot;Hmm, I thought X, but the website says Y.&quot;</p><p><br></p><p>The team trained the AI model by creating quizzes with tricky examples:</p><ul><li>Swapping names in passages to elicit different responses from the context vs. the model's recollection</li><li>Removing passages altogether so the system must say &quot;I don't know&quot;</li></ul><p><br></p><p>After this special training, the model reliably distinguished its own knowledge from given facts. This improved its accuracy and truthfulness.</p><p><br></p><p>Say you ask about a movie release date. The system can now respond:</p><p><span style="font-style:italic;">&quot;The article says July 2022. But I thought it was December 2022.&quot;</span></p><p><br></p><p>This catches any knowledge gaps and avoids misleading users.</p><p><br></p><p>While not perfect, it's major progress toward AI that collaborates in a transparent, helpful manner. The benefits for businesses are clear:</p><ul><li>Avoid frustrated users with incorrect responses</li><li>Build trust by exposing limitations upfront</li><li>Reduce risk from applying flawed knowledge</li><li>Clarify when external data should override internal beliefs</li></ul><p><br></p><p>By recognizing and sharing when its knowledge is incomplete, the AI becomes a more reliable and honest partner. This research brings us closer to truly cooperative human-AI interaction.</p><p><br></p><p>Sources:</p><p><span style="color:inherit;"><a href="https://arxiv.org/pdf/2211.05655.pdf" title="DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering" rel="">DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering</a></span></p><p></p></div>
</div><p></p></div></div></div></div></div></div></div> ]]></content:encoded><pubDate>Sat, 12 Aug 2023 09:22:46 +1000</pubDate></item><item><title><![CDATA[Peeking Inside the Black Box: Uncovering What AI Models Know About Books]]></title><link>https://www.nownextlater.ai/Insights/post/peeking-inside-the-black-box-uncovering-what-ai-models-know-about-books</link><description><![CDATA[New research from the University of California, Berkeley sheds light on one slice of these models' knowledge: which books they have "read" and memorized. The study uncovers systematic biases in what texts AI systems know most about.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_gs7G1b-XTpGiZydeBC0a0Q" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_DOD6ivG7QmSpfUB8fVxIYQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_LiwzJORqQmSQI3gbPKBwuA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_K_koNQXna4P8T3ZlwBatnw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_K_koNQXna4P8T3ZlwBatnw"] .zpimage-container figure img { width: 1090px ; height: 671.48px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_K_koNQXna4P8T3ZlwBatnw"] .zpimage-container figure img { width:723px ; height:445.39px ; } } @media (max-width: 767px) { [data-element-id="elm_K_koNQXna4P8T3ZlwBatnw"] .zpimage-container figure img { width:415px ; height:255.65px ; } } [data-element-id="elm_K_koNQXna4P8T3ZlwBatnw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-09%20at%206.48.43%20pm.png" width="415" height="255.65" loading="lazy" size="fit" alt="Top 20 books by GPT-4 name cloze accuracy" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_a3xA-CbcQJepcHHoPuycYQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_a3xA-CbcQJepcHHoPuycYQ"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Artificial intelligence systems like ChatGPT and GPT-4 have demonstrated impressive language skills, holding fluent conversations and answering questions on virtually any topic. But their inner workings remain largely opaque to users. These systems are &quot;black boxes&quot; - we know little about what knowledge they actually contain.</p><p><br></p><p>New research from the University of California, Berkeley sheds light on one slice of these models' knowledge: which books they have &quot;read&quot; and memorized. The study uncovers systematic biases in what texts AI systems know most about, with implications for how we should evaluate them.</p><p>The researchers focused specifically on works of fiction. They selected a sample of 571 English novels published between 1749 and 2020, containing literary classics along with contemporary bestsellers and award winners. The sample spanned mystery, romance, and science fiction genres as well as global Anglophone and African American literature.</p><p><br></p><p>For each book, the team extracted short passages of 40-60 words containing a single character name - but with the name removed. For instance, a passage from Pride and Prejudice might read &quot;______ entered the room and greeted her hosts warmly.&quot; Humans cannot guess the missing name from such brief context. But does the AI system know the name from having read the full book?</p><p><br></p><p>The researchers tested two systems, ChatGPT and GPT-4, by giving each passage and asking what single-word name belongs in the blank. The accuracy of each AI model on this challenging &quot;cloze&quot; task revealed what books it likely memorized.</p><p><br></p><p>The results illuminated clear biases. Both systems strongly favor science fiction and fantasy works like Lord of the Rings and Harry Potter over other genres. They excel at classic literature like Alice in Wonderland and Pride and Prejudice but fare poorly on modern award-winning diverse books. In short, they are more knowledgeable about popular texts.</p><p><br></p><p>What explains this imbalance? The researchers found it closely mirrors what's most duplicated across the internet. There is a strong correlation between AI accuracy on a book and the number of verbatim passages found through Google, Bing, and other sources. The models appear to &quot;know&quot; books in proportion to their web popularity.</p><p><br></p><p>This reliance on the internet has consequences. The study showed AI systems perform better at predicting a book's publication date and summarizing its passages when they have memorized the book. In other words, their reasoning is tied to memorization - causing disparities between popular versus niche texts.</p><p><br></p><p>These insights matter because AI systems like ChatGPT are increasingly used for applications like analyzing literature and human culture. If their knowledge comes largely from duplicated web text, focused on popular sci-fi and fantasy, how well can we trust their judgments about less mainstream books? Their skewed knowledge could propagate biases into downstream decisions.</p><p><br></p><p>The findings illustrate the challenges of opaque &quot;black box&quot; AI systems whose training data is secret. 
OpenAI, which created ChatGPT and GPT-4, has not revealed what texts were used to train them. This leaves us unable to fully assess their knowledge gaps.</p><p><br></p><p>The researchers argue we should instead push for more transparent, open-source AI systems whose training data is public knowledge. That would let us better understand their strengths and weaknesses, as research like this study begins to do.</p><p><br></p><p>As AI models grow more capable and ubiquitous, it becomes ever more important to peek inside their black boxes. Understanding what knowledge they contain helps ensure we build and apply them responsibly. Analyses of what systems like ChatGPT &quot;know&quot; about books mark an important step toward making AI more intelligible as it continues to permeate our lives.</p><p><br></p><p>Sources:</p><p><a href="https://arxiv.org/pdf/2305.00118.pdf" title="arxiv" rel="">arxiv</a><br></p><p></p></div><p></p></div>
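<p>As a rough illustration, a &quot;name cloze&quot; evaluation like the one described above could be scored along these lines (the query_model call and the example record are placeholders, not code released with the study):</p><pre>
# Minimal sketch of scoring a name-cloze test.
# query_model is a stand-in for whatever call returns the model's one-word guess.

def name_cloze_accuracy(examples, query_model):
    # examples: list of dicts with a masked "passage" and the held-out "name".
    correct = 0
    for ex in examples:
        prompt = "Fill in the blank (______) with a single proper name:\n" + ex["passage"]
        guess = query_model(prompt).strip()
        if guess.lower() == ex["name"].lower():
            correct += 1
    return correct / len(examples)

# Invented example record in the spirit of the task described above:
examples = [{"passage": "______ entered the room and greeted her hosts warmly.",
             "name": "Elizabeth"}]
</pre><p>Accuracy on passages drawn from a given book is then read as a signal of how much of that book the model has memorized.</p>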
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 10 Aug 2023 08:03:22 +1000</pubDate></item><item><title><![CDATA[The Future of AI Language Models: Making Them More Interpretable and Controllable]]></title><link>https://www.nownextlater.ai/Insights/post/the-future-of-ai-language-models-making-them-more-interpretable-and-controllable</link><description><![CDATA[Backpack models have an internal structure that is more interpretable and controllable compared to existing models like BERT and GPT-3.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_40edV6uaR2CF3GzLJhrmLg" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_bHHV-t--Q8KpHCq-rXWWWQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_Pf8sgIpQR7eY5B9Et2TEjg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_4N6BObUT3TqlygIca0xvtQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"] .zpimage-container figure img { width: 1090px ; height: 294.30px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"] .zpimage-container figure img { width:723px ; height:195.21px ; } } @media (max-width: 767px) { [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"] .zpimage-container figure img { width:415px ; height:112.05px ; } } [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-08%20at%207.20.24%20pm.png" width="415" height="112.05" loading="lazy" size="fit" alt="TheeffectontheconditionalprobabilitydistributionofaBackpackLMontheprefixwhenthenurse walked into the room" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_4k10nHrXTz-4dumsQW3w8A" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_4k10nHrXTz-4dumsQW3w8A"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Artificial intelligence has made great strides in recent years, especially in natural language processing. Systems like ChatGPT and Claude can now hold impressively human-like conversations. However, a major limitation of these AI language models is that they operate like a black box - their internal workings are complex and opaque.</p><p><br></p><p>Researchers at Stanford have proposed a new AI architecture called Backpack that aims to fix this problem. Backpack models have an internal structure that is more interpretable and controllable compared to existing models like BERT and GPT-3.</p><p><br></p><p>Here's an analogy to understand how Backpack works:</p><p><br></p><p>Think of words as Lego blocks. Each block can be connected to other blocks in many ways to build something. Existing AI models are like throwing all the Lego pieces together in a pile - there are endless ways to combine them, but you can't understand or control the resulting structure.</p><p><br></p><p>A Backpack model is more like having clearly labeled Lego pieces in different bags. For each word, there are &quot;sense vectors&quot; that represent its different meanings and uses. When the model sees a word in a sentence, it decides which sense vectors to pull out of the bag to understand and predict that usage.</p><p><br></p><p>This structure offers two key benefits:</p><ol style="margin-left:40px;"><li>Interpretability: We can inspect the different sense vectors for a word and understand what aspects of meaning they represent. This is like looking inside the bags to see the different kinds of Lego pieces.</li><li>Control: We can directly edit the sense vectors to change the model's behavior. For example, reducing a gender-biased sense vector for the word &quot;nurse&quot; can reduce sexist outputs. This is like removing certain Lego pieces from a bag to change what can be built with it.</li></ol><p><br></p><p>In initial tests, Backpack models matched the performance of existing models like GPT-2 while offering far more transparency. Researchers were able to do things like swap associations (so &quot;MacBook&quot; predicts &quot;HP&quot; instead of &quot;Apple&quot;) and reduce gender bias in occupations.</p><p><br></p><p>The inventors stress that Backpack is still early stage research. The approach needs to be scaled up and tested across different languages and applications. But it represents an exciting step towards AI systems that are not black boxes. Instead of blindly trusting model outputs, users can interpret why it behaves in certain ways and directly edit its knowledge.</p><p><br></p><p>As AI becomes more powerful and ubiquitous in products and services, retaining human agency is crucial. Approaches like Backpack could make future AI not only smarter but easier to understand and actively improve. Business leaders should track developments in interpretable AI closely, as it is an important competitive differentiator down the line.</p><p><br></p><p>Source:</p><p><a href="https://arxiv.org/abs/2305.16765" title="arxiv" rel="">arxiv</a></p><p></p><p><br></p></div><p></p></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 10 Aug 2023 07:59:55 +1000</pubDate></item></channel></rss>