<?xml version="1.0" encoding="UTF-8" ?><!-- generator=Zoho Sites --><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><atom:link href="https://www.nownextlater.ai/Insights/tag/explainability/feed" rel="self" type="application/rss+xml"/><title>Now Next Later AI - Blog #Explainability</title><description>Now Next Later AI - Blog #Explainability</description><link>https://www.nownextlater.ai/Insights/tag/explainability</link><lastBuildDate>Wed, 26 Nov 2025 21:38:47 +1100</lastBuildDate><generator>http://zoho.com/sites/</generator><item><title><![CDATA[Unlocking the Power of Interpretable AI with InterpretML: A Guide for Business Leaders]]></title><link>https://www.nownextlater.ai/Insights/post/unlocking-the-power-of-interpretable-ai-with-interpretml-a-guide-for-business-leaders</link><description><![CDATA[<img align="left" hspace="5" src="https://www.nownextlater.ai/interactive-visualization-dashboard-icon.png"/>InterpretML is a valuable tool for unlocking the power of interpretable AI in traditional machine learning models. While it may have limitations when it comes to directly interpreting LLMs, the principles of interpretability and transparency remain crucial in the age of generative AI.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_vv2ADhj1T_qpnc2ndrUpmQ" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_8YFDElmaRhqHfl1XzkasfA" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_6_hvn9CZSByLATJKzfuW8A" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_mylyszm4zJaJlxC7fPV1cw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_mylyszm4zJaJlxC7fPV1cw"] .zpimage-container figure img { width: 648px !important ; height: 356px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_mylyszm4zJaJlxC7fPV1cw"] .zpimage-container figure img { width:648px ; height:356px ; } } @media (max-width: 767px) { [data-element-id="elm_mylyszm4zJaJlxC7fPV1cw"] .zpimage-container figure img { width:648px ; height:356px ; } } [data-element-id="elm_mylyszm4zJaJlxC7fPV1cw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/interactive-visualization-dashboard-icon.png" width="648" height="356" loading="lazy" size="original" alt="InterpretML Interactive Visualisation Dashboard" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_jS7fiVqiTGmG8PihrE5y1A" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_jS7fiVqiTGmG8PihrE5y1A"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><div style="color:inherit;text-align:left;"><p style="font-weight:400;text-indent:0px;">In today's fast-paced business world, artificial intelligence (AI) has become a game-changer, enabling organizations to make data-driven decisions and gain a competitive advantage. However, as machine learning models grow more complex, the need for transparency and interpretability becomes increasingly important. <a href="https://interpret.ml/" title="InterpretML" rel="">InterpretML</a>, an open-source Python package developed by Microsoft, empowers businesses to explain and understand the behavior of their AI models. In this article, we will explore the key capabilities and benefits of InterpretML, discuss its limitations when it comes to interpreting advanced language models, and delve into the current research efforts in the field of interpretability for generative AI.</p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:24px;color:rgb(41, 77, 135);">Key Capabilities of InterpretML</span></p><p style="font-weight:400;text-indent:0px;"></p><ol><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Global and Local Explanations: </span>InterpretML offers a comprehensive set of tools to explain model behavior at both high-level (global) and individual (local) perspectives. Global explanations provide insights into overall patterns and trends, allowing business leaders to grasp the general decision-making process of their models. Local explanations, on the other hand, focus on specific predictions, enabling a detailed analysis of individual cases. This dual approach empowers organizations to gain a holistic understanding of their AI systems.</span></li><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Compatibility with Various Models:</span> One of the standout features of InterpretML is its ability to work with a wide range of machine learning models, including decision trees, linear models, neural networks, random forests, gradient boosting machines, and support vector machines. This versatility ensures that businesses can apply interpretation techniques to their existing AI workflows while enhancing transparency and interpretability.</span></li><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Feature Importance and What-If Scenarios: </span>InterpretML provides powerful techniques to identify the most influential factors in a model's predictions. By determining the importance of different features, business leaders can gain valuable insights into the key drivers behind the model's decisions. Additionally, InterpretML can generate &quot;what-if&quot; scenarios, showing how changes in input features would impact the model's output. This capability allows organizations to explore different possibilities and make informed decisions based on the model's behavior.</span></li><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Clear Visualizations:</span> Effective communication is crucial when it comes to interpreting and explaining AI models. 
InterpretML recognizes this need and offers a range of visualization tools to present explanations in a clear and accessible manner. From feature importance plots to graphs showing the model's behavior, these visualizations help business leaders and stakeholders understand the inner workings of their AI systems without requiring deep technical expertise.</span></li></ol><div><br></div><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:24px;color:rgb(41, 77, 135);">Limitations of InterpretML with Advanced Language Models</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">While InterpretML is a powerful tool for interpreting various types of machine learning models, it may have limitations when it comes to explaining the behavior of advanced language models, such as GPT-3, BERT, and T5. These models, known as large language models (LLMs) or transformers, are highly complex and have millions or billions of parameters. Their intricate inner workings and decision-making processes can be challenging to interpret due to their scale and complexity.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">InterpretML's techniques, such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), are primarily designed for interpreting more traditional machine learning models. SHAP assigns importance scores to each feature based on its contribution to the model's prediction, while LIME generates local explanations by approximating the model's behavior around a specific instance using a simpler, interpretable model. These techniques may not directly translate to the complexities of LLMs and transformers, which have more sophisticated architectures and capture nuanced patterns in natural language.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:24px;color:rgb(41, 77, 135);">Current Research in Interpretability for Generative AI</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Although InterpretML may not be the perfect fit for interpreting LLMs and transformers, the field of interpretability for advanced language models is an active area of research. Scientists and researchers are developing new techniques specifically tailored to understanding and explaining the behavior of these models. Some of the current research efforts include:</p><ol><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Attention Analysis:</span> Researchers are studying the attention mechanisms of transformer models to understand which parts of the input the model focuses on during prediction. By visualizing and analyzing these attention patterns, we can gain insights into how the model processes and prioritizes information.</span></li><li><span><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Probing Tasks: </span>Designing specific tasks to test the model's understanding of language properties, such as grammar, meaning, and common sense, can help uncover the knowledge and capabilities of LLMs. 
These probing tasks provide a targeted evaluation of the model's behavior and decision-making process.</span></span></li><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Perturbation-based Methods:</span> By slightly modifying the input or internal representations of the model and observing how the outputs change, researchers can gain insights into the model's sensitivity to specific changes and its decision-making process. Perturbation-based methods help identify the most influential factors in the model's predictions.</span></li><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Interpretable Architectures: </span>Some researchers are exploring the development of new architectures for LLMs and transformers that are inherently more interpretable. By designing models with built-in interpretability mechanisms, such as attention-based explanations or modular components, we can achieve a better understanding of their inner workings.</span></li><li><span><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Other Approaches:</span> Researchers are also investigating techniques such as layer-wise relevance propagation (LRP), which assigns relevance scores to input features based on their contribution to the model's output, and integrated gradients, which attribute the model's prediction to input features by calculating the path integral of the gradients.</span></li></ol><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:24px;color:rgb(41, 77, 135);">The Importance of Interpretability in the Age of Generative AI</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">As generative AI models become more prevalent and influential in various industries, the need for transparency and accountability becomes paramount. These models have the potential to generate human-like text, images, and even code, revolutionizing the way businesses operate. However, the complexity and autonomy of generative AI models also raise concerns about biased outputs or potential misuse.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Interpretability plays a crucial role in mitigating these risks and building trust in AI systems. By providing clear explanations of how models arrive at their outputs, businesses can ensure fairness, detect and address biases, and maintain ethical standards. Interpretability also enables organizations to comply with regulatory requirements and demonstrate the reasoning behind AI-driven decisions.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:24px;color:rgb(41, 77, 135);">Key Takeaways for Business Leaders</span></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">InterpretML is a valuable tool for unlocking the power of interpretable AI in traditional machine learning models. While it may have limitations when it comes to directly interpreting advanced language models, the broader principles of interpretability and transparency remain crucial in the age of generative AI. 
As research in this field advances, business leaders should stay informed about the latest developments and adopt new tools and techniques that enable them to explain and understand the behavior of their AI systems. By prioritizing interpretability and transparency, organizations can build trust, mitigate risks, ensure compliance, and harness the full potential of AI technologies while maintaining ethical and responsible practices.</p></div></div>
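<p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">For teams that want to see what the capabilities described above look like in practice, here is a minimal sketch using InterpretML's documented glassbox API. The dataset, file name, and column names are placeholders; substitute your own tabular data.</p><pre><code>
# Minimal sketch of InterpretML's glassbox workflow (illustrative data).
import pandas as pd
from sklearn.model_selection import train_test_split
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

# Hypothetical tabular dataset; replace with your own features and label.
df = pd.read_csv("customer_churn.csv")
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An Explainable Boosting Machine: an inherently interpretable glassbox model.
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Global explanation: which features drive predictions overall.
show(ebm.explain_global())

# Local explanations: why the model scored these individual cases as it did.
show(ebm.explain_local(X_test[:5], y_test[:5]))
</code></pre><p style="font-weight:400;text-indent:0px;">The same <code>show</code> dashboard renders feature importance plots and per-prediction breakdowns, which is how the "what-if" style of exploration above is typically carried out.</p>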
</div><div data-element-id="elm_IbU3obgEOW8xtM5pyFMaqg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_IbU3obgEOW8xtM5pyFMaqg"] .zpimage-container figure img { width: 500px ; height: 500.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_IbU3obgEOW8xtM5pyFMaqg"] .zpimage-container figure img { width:500px ; height:500.00px ; } } @media (max-width: 767px) { [data-element-id="elm_IbU3obgEOW8xtM5pyFMaqg"] .zpimage-container figure img { width:500px ; height:500.00px ; } } [data-element-id="elm_IbU3obgEOW8xtM5pyFMaqg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium "><figure role="none" class="zpimage-data-ref"><a class="zpimage-anchor" href="/responsible-ai-in-the-age-of-generative-models-ai-governance-ethics-and-risk-management" target="" rel=""><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Navy%20and%20Blue%20Modern%20We%20Provide%20Business%20Solutions%20Facebook%20Ad%20-1200%20x%201200%20px-.png" width="500" height="500.00" loading="lazy" size="medium"/></picture></a></figure></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 04 Apr 2024 12:45:20 +1100</pubDate></item><item><title><![CDATA[Why AI is Harder Than We Think: Key Takeaways for Business Leaders]]></title><link>https://www.nownextlater.ai/Insights/post/Why-AI-is-Harder-Than-We-Think-Key-Takeaways-for-Business-Leaders</link><description><![CDATA[Leading AI researcher Melanie Mitchell argues that achieving truly human-like "general" AI is much harder than many experts predict. Here are the key takeaways from her paper for business leaders.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_sfUGQRrPRMOb-zqryhN6kA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_aI0rE4RHRAGxFiFOSq2e8g" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_KQc6QKC-TUS9k5ADouakpw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_lLX8ye2SQte64RLyZASIwA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_lLX8ye2SQte64RLyZASIwA"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Artificial intelligence (AI) has seen remarkable advances in recent years, from self-driving cars to defeating humans at games like chess and Go. However, leading AI expert Melanie Mitchell argues that achieving truly human-like &quot;general&quot; AI is much harder than many experts predict. Here are the key takeaways from her paper for business leaders:</p><ul><li>Narrow AI versus general AI: Success in specialized applications like chess or image recognition does not necessarily translate to broader capabilities. We cannot assume today's AI systems are on a steady path to human-level intelligence.</li><li>&quot;Easy&quot; things are hard: Basic human skills like perceiving the world and carrying on a conversation have proven very difficult to replicate in machines. Conversely, AI can excel at things that are quite hard for humans.</li><li>Wishful vocabulary: Terms like &quot;learn,&quot; &quot;understand,&quot; and &quot;think&quot; are often applied to today's AI, but these systems do not have the same underlying capabilities as humans. Using human vocabulary can misleadingly imply advanced intelligence.</li><li>Intelligence is embodied: Human intelligence relies heavily on our experiences and interactions using our entire bodies, not just abstract reasoning. Attempts to achieve human-level intelligence will likely need to account for human-like bodies, environments, and interactions.</li><li>Common sense is key: To operate successfully in the real world, AI needs the vast background knowledge humans accumulate about how the world works. We still do not understand how to enable machines to acquire this &quot;common sense.&quot;</li></ul><p><br></p><p>Key implications:</p><ul><li>Avoid overconfidence about timelines for achieving human-level AI based on hype and narrow successes. True general intelligence likely remains far off.</li><li>Focus investment on applications of existing AI capabilities, not attempts to replicate human thinking. 
Manage expectations of near-term outcomes.</li><li>Monitor advances in embodied AI and research on common sense reasoning as indicators of progress toward general AI.</li><li>Ensure AI systems have transparent workings, clear objectives, and human oversight. The orthogonality thesis, the claim that any level of intelligence can be paired with any set of goals, does not appear to hold for human-like general intelligence.</li></ul><p><br></p><p>The path to human-level AI is long, with much still unknown. By avoiding unfounded assumptions and acknowledging the challenges ahead, business leaders can make wise strategic decisions about how to apply AI technology today and anticipate what may come tomorrow.</p><p><br></p><p>Sources:</p><div style="color:inherit;"><div><div><div><div><div><p><span style="font-size:14px;"><a href="https://arxiv.org/abs/2104.12871" title="Why AI is Harder Than We Think " rel="">Why AI is Harder Than We Think </a></span></p><p></p><span style="font-size:14px;"></span><p><span style="font-size:14px;">Melanie Mitchell </span></p></div></div>
</div></div></div></div><p></p></div><p></p></div></div></div></div></div></div></div>
 ]]></content:encoded><pubDate>Sun, 13 Aug 2023 19:50:20 +1000</pubDate></item><item><title><![CDATA[Transformers Expressible in Simple Logic]]></title><link>https://www.nownextlater.ai/Insights/post/Transformers-Expressible-in-Simple-Logic</link><description><![CDATA[A new study has shown that transformers can be expressed in a simple logic formalism. This finding challenges the perception that transformers are inscrutable black boxes and suggests avenues for interpreting how they work.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_vTZ0UtxrTVKUpnRBcGFGwQ" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_OK0gUtTtTG-p2y_MzdYVpA" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_JUny_qLdSUqolf8oNtk3FQ" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_-muaciReSB2TX7jNYu8tFQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_-muaciReSB2TX7jNYu8tFQ"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>A new study from New York University and the Allen Institute for AI has shown that large language models called transformers can be expressed in a simple logic formalism. This finding challenges the perception that transformers are inscrutable black boxes and suggests avenues for interpreting how they work.</p><p><br></p><p>Transformers are a type of neural network behind major AI achievements like chatbots and language translation. They are trained on massive datasets to generate human-like text. Despite their impressive capabilities, how transformers arrive at their outputs has remained poorly understood.</p><p><br></p><p>The researchers proved that transformers (in an idealized, log-precision form) can be translated into symbolic logic sentences that replicate their function. Specifically, they showed transformers fit within a logic called first-order logic with majority quantifiers. This logic allows logical sentences with familiar constructs like &quot;AND&quot;, &quot;OR&quot;, and &quot;IF-THEN&quot;, as well as majority quantifiers that check if a condition holds for over half of the elements (see the toy sketch at the end of this post).</p><p><br></p><p>While real-world transformers are complex neural networks, this study theoretically shows their reasoning can be captured by simple logical expressions. For instance, the logic could recognize patterns like &quot;three As followed by three Bs&quot;, which transformers are known to identify.</p><p><br></p><p>The findings challenge the notion that transformers are inscrutable black boxes. Instead, they suggest transformers implement a form of reasoning not radically different from familiar logical formalisms. The possibility of expressing transformers in interpretable logic could enable explaining how they arrive at outputs, like detecting biases.</p><p><br></p><p>For business leaders deploying AI, this research opens possibilities for making transformers more transparent and accountable. It provides a path toward debugging models to avoid failures or bias. 
The ability to translate transformers into logical sentences could allow systematic checks for undesirable reasoning patterns.</p><p><br></p><p>Overall, this theoretical advance challenges prevailing views of transformers as hopelessly opaque. It demonstrates that their computations can be characterized in an understandable logic, unlocking new ways for technologists to interpret these increasingly critical AI models. The research shows that transformer outputs are not ineffable; in principle, they can be explained through logic.</p><p><br></p><p>Sources:</p><div style="color:inherit;"><div><div><div><div><p><a href="https://arxiv.org/abs/2210.02671" title="A Logic for Expressing Log-Precision Transformers " rel="">A Logic for Expressing Log-Precision Transformers </a><br></p><p></p><div style="color:inherit;"><div><div><div><div><p>William Merrill and Ashish Sabharwal </p><div style="color:inherit;"><div><div><div><span style="font-size:10pt;font-weight:500;"><br></span></div>
</div></div></div></div></div></div></div></div><p></p></div></div></div></div></div><p></p></div><p></p></div>
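<p><br></p><p>To make the idea of majority quantifiers concrete, here is a small toy sketch (our illustration, not code from the paper). It evaluates an ordinary existential quantifier and a majority quantifier over the positions of a string:</p><pre><code>
# Toy illustration of quantifiers over string positions (not from the paper).

def exists(positions, predicate):
    """First-order: true iff predicate holds for at least one position."""
    return any(predicate(i) for i in positions)

def majority(positions, predicate):
    """Majority quantifier: true iff predicate holds for over half of them."""
    holds = sum(1 for i in positions if predicate(i))
    return 2 * holds > len(positions)

s = "aaabbb"
idx = range(len(s))

# "Some 'a' is immediately followed by a 'b'" -- plain first-order logic.
print(exists(idx, lambda i: i + 1 in idx and s[i] == "a" and s[i + 1] == "b"))  # True

# "More than half of the tokens are 'a'" -- false here: exactly half are.
print(majority(idx, lambda i: s[i] == "a"))  # False
</code></pre>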
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Sun, 13 Aug 2023 19:50:20 +1000</pubDate></item><item><title><![CDATA[DisentQA: Catching Knowledge Gaps and Avoiding Misleading Users]]></title><link>https://www.nownextlater.ai/Insights/post/enabling-ai-to-untangle-different-knowledge-sources</link><description><![CDATA[Building QA Systems that catch knowledge gaps and avoid misleading users.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_ewF7pMN9Q_eczUOQpCYtUA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_XdQfIANyTi-5Z3w2LSGv-A" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_tGWqJgjLSlyldj1XkXMcGw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width: 500px ; height: 486.01px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width:500px ; height:486.01px ; } } @media (max-width: 767px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width:500px ; height:486.01px ; } } [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%209.09.37%20am.png" width="500" height="486.01" loading="lazy" size="medium" alt="Example outputs from our disentangled QA model on the Natural Questions dataset. " data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_TNbKqQ17TP256B60EqRP7w" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TNbKqQ17TP256B60EqRP7w"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p>Imagine you ask your phone &quot;Who wrote the song Hello by Adele?&quot; and it gives you an incorrect answer, insisting the song is by Taylor Swift. This shows artificial intelligence sometimes confuses its own training knowledge with external facts.</p><p><br></p><p>Researchers want to fix this issue to make AI assistants more helpful and honest. Their solution: <span style="color:inherit;">Build QA Systems that catch knowledge gaps and avoid misleading users by </span>teaching the system to provide two responses:</p><ol><li>The factual answer based on given information (e.g. Adele)</li><li>What it privately recalls from its memory (e.g. Taylor Swift)</li></ol><p><br></p><p>This highlights any mismatches between its training knowledge and external data. It's like when we say &quot;Hmm, I thought X, but the website says Y.&quot;</p><p><br></p><p>The team trained the AI model by creating quizzes with tricky examples:</p><ul><li>Swapping names in passages to elicit different responses from the context vs. the model's recollection</li><li>Removing passages altogether so the system must say &quot;I don't know&quot;</li></ul><p><br></p><p>After this special training, the model reliably distinguished its own knowledge from given facts. This improved its accuracy and truthfulness.</p><p><br></p><p>Say you ask about a movie release date. The system can now respond:</p><p><span style="font-style:italic;">&quot;The article says July 2022. But I thought it was December 2022.&quot;</span></p><p><br></p><p>This catches any knowledge gaps and avoids misleading users.</p><p><br></p><p>While not perfect, it's major progress toward AI that collaborates in a transparent, helpful manner. The benefits for businesses are clear:</p><ul><li>Avoid frustrated users with incorrect responses</li><li>Build trust by exposing limitations upfront</li><li>Reduce risk from applying flawed knowledge</li><li>Clarify when external data should override internal beliefs</li></ul><p><br></p><p>By recognizing and sharing when its knowledge is incomplete, the AI becomes a more reliable and honest partner. This research brings us closer to truly cooperative human-AI interaction.</p><p><br></p><p>Sources:</p><p><span style="color:inherit;"><a href="https://arxiv.org/pdf/2211.05655.pdf" title="DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering" rel="">DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering</a></span></p><p></p></div>
</div><p></p></div></div></div></div></div></div></div> ]]></content:encoded><pubDate>Sat, 12 Aug 2023 09:22:46 +1000</pubDate></item><item><title><![CDATA[The Future of AI Language Models: Making Them More Interpretable and Controllable]]></title><link>https://www.nownextlater.ai/Insights/post/the-future-of-ai-language-models-making-them-more-interpretable-and-controllable</link><description><![CDATA[Backpack models have an internal structure that is more interpretable and controllable compared to existing models like BERT and GPT-3.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_40edV6uaR2CF3GzLJhrmLg" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_bHHV-t--Q8KpHCq-rXWWWQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_Pf8sgIpQR7eY5B9Et2TEjg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_4N6BObUT3TqlygIca0xvtQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"] .zpimage-container figure img { width: 1090px ; height: 294.30px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"] .zpimage-container figure img { width:723px ; height:195.21px ; } } @media (max-width: 767px) { [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"] .zpimage-container figure img { width:415px ; height:112.05px ; } } [data-element-id="elm_4N6BObUT3TqlygIca0xvtQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-08%20at%207.20.24%20pm.png" width="415" height="112.05" loading="lazy" size="fit" alt="TheeffectontheconditionalprobabilitydistributionofaBackpackLMontheprefixwhenthenurse walked into the room" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_4k10nHrXTz-4dumsQW3w8A" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_4k10nHrXTz-4dumsQW3w8A"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Artificial intelligence has made great strides in recent years, especially in natural language processing. Systems like ChatGPT and Claude can now hold impressively human-like conversations. However, a major limitation of these AI language models is that they operate like a black box - their internal workings are complex and opaque.</p><p><br></p><p>Researchers at Stanford have proposed a new AI architecture called Backpack that aims to fix this problem. Backpack models have an internal structure that is more interpretable and controllable compared to existing models like BERT and GPT-3.</p><p><br></p><p>Here's an analogy to understand how Backpack works:</p><p><br></p><p>Think of words as Lego blocks. Each block can be connected to other blocks in many ways to build something. Existing AI models are like throwing all the Lego pieces together in a pile - there are endless ways to combine them, but you can't understand or control the resulting structure.</p><p><br></p><p>A Backpack model is more like having clearly labeled Lego pieces in different bags. For each word, there are &quot;sense vectors&quot; that represent its different meanings and uses. When the model sees a word in a sentence, it decides which sense vectors to pull out of the bag to understand and predict that usage.</p><p><br></p><p>This structure offers two key benefits:</p><ol style="margin-left:40px;"><li>Interpretability: We can inspect the different sense vectors for a word and understand what aspects of meaning they represent. This is like looking inside the bags to see the different kinds of Lego pieces.</li><li>Control: We can directly edit the sense vectors to change the model's behavior. For example, reducing a gender-biased sense vector for the word &quot;nurse&quot; can reduce sexist outputs. This is like removing certain Lego pieces from a bag to change what can be built with it.</li></ol><p><br></p><p>In initial tests, Backpack models matched the performance of existing models like GPT-2 while offering far more transparency. Researchers were able to do things like swap associations (so &quot;MacBook&quot; predicts &quot;HP&quot; instead of &quot;Apple&quot;) and reduce gender bias in occupations.</p><p><br></p><p>The inventors stress that Backpack is still early stage research. The approach needs to be scaled up and tested across different languages and applications. But it represents an exciting step towards AI systems that are not black boxes. Instead of blindly trusting model outputs, users can interpret why it behaves in certain ways and directly edit its knowledge.</p><p><br></p><p>As AI becomes more powerful and ubiquitous in products and services, retaining human agency is crucial. Approaches like Backpack could make future AI not only smarter but easier to understand and actively improve. Business leaders should track developments in interpretable AI closely, as it is an important competitive differentiator down the line.</p><p><br></p><p>Source:</p><p><a href="https://arxiv.org/abs/2305.16765" title="arxiv" rel="">arxiv</a></p><p></p><p><br></p></div><p></p></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 10 Aug 2023 07:59:55 +1000</pubDate></item><item><title><![CDATA[Tracking Political Bias from Data to Models to Decisions]]></title><link>https://www.nownextlater.ai/Insights/post/tracking-political-bias-from-data-to-models-to-decisions</link><description><![CDATA[Measuring the political leaning of various pretrained LMs.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_CATDepYUTZ6u_AaEaiMDmw" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_TQE0pWSXSsmNWQMJz0xU5w" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_vkv1E3dlTz2JTwVrqkIAvg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_sXssLby6kfc7e84vxKKd2g" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_sXssLby6kfc7e84vxKKd2g"] .zpimage-container figure img { width: 1090px ; height: 606.49px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_sXssLby6kfc7e84vxKKd2g"] .zpimage-container figure img { width:723px ; height:402.28px ; } } @media (max-width: 767px) { [data-element-id="elm_sXssLby6kfc7e84vxKKd2g"] .zpimage-container figure img { width:415px ; height:230.91px ; } } [data-element-id="elm_sXssLby6kfc7e84vxKKd2g"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-08%20at%206.30.01%20pm.png" width="415" height="230.91" loading="lazy" size="fit" alt="Measuring the political leaning of various pretrained LMs." data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_dXL84-vaTjup8bh7r_IhLw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_dXL84-vaTjup8bh7r_IhLw"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p>Recent advances in AI, especially large language models like GPT-4 and Claude 2, have unlocked new capabilities in generating text and speech. However, these models are still &quot;sponges&quot; that absorb patterns, including potential societal biases, from their training data. A new study from the University of Washington digs into an important question - can political bias in the data propagate to the models and affect downstream decisions? Their findings highlight risks that businesses should be aware of when deploying these technologies.</p><p><br></p><div style="color:inherit;"><p>The researchers focused on political leanings across two dimensions - social values (liberal vs conservative) and economic values (left vs right). First, they evaluated the inherent biases of 14 major language models by analyzing their responses to statements from a standard political compass test. The models occupied a range of positions across the political spectrum, with BERT variants being more conservative and GPT models more libertarian.</p><p><br></p><p>Next, they examined if further pretraining these models on partisan news and social media corpora leads them to shift their political stances. The results show that left-leaning data induces liberal shifts, while right-leaning data causes conservative shifts. However, the overall shifts were small, suggesting inherent biases persist.</p><p><br></p><p>Finally, they tested if these biased models perform differently on social-impact tasks like hate speech and misinformation detection. While overall performance was similar, models exhibited double standards - left-leaning ones were more sensitive to offensive speech targeting minorities but overlooked attacks on dominant groups. Right-leaning models showed the opposite.</p><p><br></p><p>For business leaders, these findings highlight risks of unintended bias and unfairness creeping into AI systems built on large language models. While some bias is inevitable given the training data, being aware of its extent and impact can help inform ethical AI practices. Companies should proactively probe for biases, use diverse evaluation datasets, and leverage different perspectives through ensemble approaches.</p><p><br></p><p>Tracking bias from data to models to decisions is vital for ensuring AI transparency and accountability. <br></p><p><br></p><p>Source:</p><p><a href="https://arxiv.org/pdf/2305.08283.pdf" title="Arxiv" rel="">Arxiv</a><br></p><p></p></div><p></p></div>
</div><div data-element-id="elm__u7614OwkSLZagtLzOCBGw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm__u7614OwkSLZagtLzOCBGw"] .zpimage-container figure img { width: 500px ; height: 500.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm__u7614OwkSLZagtLzOCBGw"] .zpimage-container figure img { width:500px ; height:500.00px ; } } @media (max-width: 767px) { [data-element-id="elm__u7614OwkSLZagtLzOCBGw"] .zpimage-container figure img { width:500px ; height:500.00px ; } } [data-element-id="elm__u7614OwkSLZagtLzOCBGw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium "><figure role="none" class="zpimage-data-ref"><a class="zpimage-anchor" href="/responsible-ai-in-the-age-of-generative-models-ai-governance-ethics-and-risk-management" target="" rel=""><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Navy%20and%20Blue%20Modern%20We%20Provide%20Business%20Solutions%20Facebook%20Ad%20-1200%20x%201200%20px-.png" width="500" height="500.00" loading="lazy" size="medium"/></picture></a></figure></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 10 Aug 2023 07:57:52 +1000</pubDate></item><item><title><![CDATA[Understanding How AI Generates Images from Text]]></title><link>https://www.nownextlater.ai/Insights/post/understanding-how-ai-generates-images-from-text</link><description><![CDATA[In a paper titled "What the DAAM: Interpreting Stable Diffusion Using Cross Attention", researchers propose a method called DAAM (Diffusion Attentive Attribution Maps) to analyze how words in a prompt influence different parts of the generated image.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_S_ThR09CRDuVWQrYiHfN9A" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_UjY4eUtLR56tAebAvz_5YQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_f6aZA38_Q2a4WrwIYDMrSg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"> [data-element-id="elm_f6aZA38_Q2a4WrwIYDMrSg"].zpelem-col{ border-radius:1px; } </style><div data-element-id="elm_3h_Z1k4B-_HRIofT4vhyyg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_3h_Z1k4B-_HRIofT4vhyyg"] .zpimage-container figure img { width: 942px ; height: 345.60px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_3h_Z1k4B-_HRIofT4vhyyg"] .zpimage-container figure img { width:723px ; height:265.25px ; } } @media (max-width: 767px) { [data-element-id="elm_3h_Z1k4B-_HRIofT4vhyyg"] .zpimage-container figure img { width:415px ; height:152.25px ; } } [data-element-id="elm_3h_Z1k4B-_HRIofT4vhyyg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-08%20at%204.56.27%20pm.png" width="415" height="152.25" loading="lazy" size="fit" alt="Example generations and DAAM heat maps from COCO for each interpretable part-of-speech." data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_fySW7qlHSyO1GNVUxb6iNw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_fySW7qlHSyO1GNVUxb6iNw"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Recent advances in AI image generation have led to impressive results, with systems like Stable Diffusion able to create highly realistic images from simple text prompts. But while these models can generate remarkably detailed pictures, they remain something of a black box. We don't have much insight into how they actually convert the text into pixel representations.</p><p><br></p><p>New research from computer scientists is helping peel back the curtain on these AI systems. In a paper titled &quot;What the DAAM: Interpreting Stable Diffusion Using Cross Attention&quot;, researchers propose a method called DAAM (Diffusion Attentive Attribution Maps) to analyze how words in a prompt influence different parts of the generated image.</p><p><br></p><p>DAAM creates heat maps showing which pixels are most related to each word in the text prompt. For example, for a prompt like &quot;a blue bird flying&quot;, the word &quot;blue&quot; would highlight the blue parts of the bird, &quot;bird&quot; would highlight the full bird, and &quot;flying&quot; would highlight the motion blurred wings and body.</p><p><br></p><p>By aggregating attention scores between text and image patches across the AI model's layers, DAAM produces interpretable maps linking words to visual features. The researchers validated DAAM by testing how well it can perform noun segmentation, a common computer vision benchmark <span style="color:inherit;">where the goal is to identify the regions in an image that correspond to noun objects</span>. DAAM achieved competitive scores, despite having no explicit training.</p><p><br></p><p>Experiments with DAAM revealed new insights about these generative AI systems:</p><ul><li>The relationships between words in a prompt translate to visual relationships in the image. For example, verbs like &quot;flying&quot; encapsulate their subjects like &quot;bird&quot;.</li><li>Using similar words like &quot;giraffe&quot; and &quot;zebra&quot; leads to worse image generation, likely because their features get entangled. The DAAM maps heavily overlap for such co-hyponyms. <span style="color:inherit;">Essentially, co-hyponyms are words that share the same superclass or category but are not synonyms. </span><span style="color:inherit;">They have a sort of &quot;sibling&quot; status as distinct members of their parent group. </span><span style="color:inherit;">Giraffe and zebra are co-hyponyms. They both belong to the hypernym &quot;wild animals&quot;.</span></li><li>Descriptive adjectives like &quot;blue&quot; attend too broadly across the whole image, suggesting objects are entangled with their surroundings. Changing the adjective modifies the entire scene.</li></ul><p><br></p><p>For business leaders, research like DAAM is important because it improves the explainability of AI systems. As generative models become more ubiquitous, understanding how they operate will help identify limitations and better assess risks. Models that entangle features more could potentially suffer from bias or produce unrealistic outputs.</p><p><br></p><p>DAAM also demonstrates how attention mechanisms in AI models can be repurposed for interpretation, without retraining the models from scratch. 
This allows transparent analysis without compromising performance.</p><p><br></p><p>Overall, DAAM represents an impactful step toward explainable AI in generative models. Demystifying these systems will be key as businesses increasingly look to utilize powerful generative AI capabilities in their products and processes. Interpretability helps ensure these technologies are trustworthy and dependable.</p><p><br></p><p>Sources:</p><p><a href="https://arxiv.org/pdf/2210.04885.pdf" title="arxiv" rel="">arxiv</a><br></p><p></p></div></div>
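<p><br></p><p>At a mechanical level, the aggregation step can be pictured with the shape-level sketch below. The real DAAM implementation hooks into Stable Diffusion's cross-attention layers and upsamples maps of different resolutions; the dimensions here are invented for illustration:</p><pre><code>
import numpy as np

# Shape-level sketch of DAAM-style aggregation (invented dimensions, not the
# authors' implementation): record cross-attention scores between image
# patches and prompt tokens, then average over timesteps, layers and heads.

T, L, H, P, N = 50, 16, 8, 64, 6    # timesteps, layers, heads, patches, tokens
rng = np.random.default_rng(0)
attn = rng.random((T, L, H, P, N))  # stand-in for recorded attention scores

word_maps = attn.mean(axis=(0, 1, 2))     # (P, N): one spatial map per token
heat = word_maps[:, 2].reshape(8, 8)      # heat map for the third prompt token

# Normalise to [0, 1] so the map can be overlaid on the generated image.
heat = (heat - heat.min()) / (heat.max() - heat.min())
print(heat.shape)                         # (8, 8)
</code></pre>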
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Tue, 08 Aug 2023 16:59:39 +1000</pubDate></item></channel></rss>