<?xml version="1.0" encoding="UTF-8" ?><!-- generator=Zoho Sites --><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><atom:link href="https://www.nownextlater.ai/Insights/gen-ai-research/feed" rel="self" type="application/rss+xml"/><title>Now Next Later AI - Blog , Gen AI Research</title><description>Now Next Later AI - Blog , Gen AI Research</description><link>https://www.nownextlater.ai/Insights/gen-ai-research</link><lastBuildDate>Wed, 26 Nov 2025 21:24:41 +1100</lastBuildDate><generator>http://zoho.com/sites/</generator><item><title><![CDATA[Language Model Tokenization Reveals Significant Disparities Across Languages: Implications for Businesses and Users]]></title><link>https://www.nownextlater.ai/Insights/post/language-model-tokenization-reveals-significant-disparities-across-languages-implications-for-busine</link><description><![CDATA[<img align="left" hspace="5" src="https://www.nownextlater.ai/Screenshot 2024-04-29 at 12.25.09 pm.png"/>In this article, we'll dive into a recent study that uncovers substantial disparities in the tokenization process used by language models across different languages.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_C3_ooyGQRiyFng1ZLDBhOw" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_k3QeQrRoTvSjKQeOk60gvg" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_u8BcqDWMTWSgs5iEhH5Usg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_F6-dDNDmBOusriRjt3xREQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { 
[data-element-id="elm_F6-dDNDmBOusriRjt3xREQ"] .zpimage-container figure img { width: 500px ; height: 564.58px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_F6-dDNDmBOusriRjt3xREQ"] .zpimage-container figure img { width:500px ; height:564.58px ; } } @media (max-width: 767px) { [data-element-id="elm_F6-dDNDmBOusriRjt3xREQ"] .zpimage-container figure img { width:500px ; height:564.58px ; } } [data-element-id="elm_F6-dDNDmBOusriRjt3xREQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium hb-lightbox " data-lightbox-options="
                type:fullscreen,
theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202024-04-29%20at%2012.25.09%E2%80%AFpm.png" width="500" height="564.58" loading="lazy" size="medium" alt="Premiums with respect to English on FLORES-200 for several English-centric models." data-lightbox="true"/></picture></span><figcaption class="zpimage-caption zpimage-caption-align-center"><span class="zpimage-caption-content">Premiums with respect to English on FLORES-200 for several English-centric models.</span></figcaption></figure></div>
</div><div data-element-id="elm_Ol3GZWqPS1quBc9elQjJng" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_Ol3GZWqPS1quBc9elQjJng"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><div style="color:inherit;text-align:left;"><div style="color:inherit;text-align:left;">In this article, we'll dive into a recent <a href="https://arxiv.org/pdf/2305.15425" title="study" rel="">study</a> that uncovers substantial disparities in the tokenization process used by language models across different languages. These disparities have significant implications for businesses and users, affecting the cost, latency, and quality of service when using AI-powered language technologies. By understanding these issues, business leaders can make more informed decisions about the adoption and deployment of language models and advocate for the development of more equitable solutions. <br></div></div><div style="color:inherit;text-align:left;"><br><p><span style="font-family:&quot;Archivo Black&quot;, sans-serif;">The Importance of Tokenization in Language Models&nbsp;</span></p><p><br></p><p>Tokenization is the process of breaking down natural language text into smaller units called tokens, which are then used as input for language models. The choice of tokenization method can significantly impact a model's performance and efficiency. Subword tokenization, which breaks down complex words into smaller parts, has become the preferred approach for state-of-the-art language models.</p><p><br></p><p>However, the study revealed that even subword tokenization methods can lead to significant disparities in the number of tokens required to represent the same content across different languages. 
This has far-reaching consequences for businesses and users relying on language models for various applications.</p><p><span style="font-family:&quot;Archivo Black&quot;, sans-serif;"><br></span></p><p><span style="font-family:&quot;Archivo Black&quot;, sans-serif;">Tokenization Disparities Across Languages&nbsp;</span></p><p><br></p><p>The researchers analyzed the tokenization process of several popular language models, including GPT-2, RoBERTa, and the tokenizers used by ChatGPT and GPT-4. They found that the number of tokens required to represent the same text can vary drastically across languages. For example:</p><ol><li>GPT-2 requires 3 times more tokens to represent the same content in Japanese compared to English.</li><li>The ChatGPT and GPT-4 tokenizers use 1.6 times more tokens for Italian, 2.6 times more for Bulgarian, and 3 times more for Arabic compared to English.</li><li>For Shan, a language spoken in Myanmar, the difference can be as high as 15 times compared to English.</li></ol><p><br></p><p>These disparities persist even in tokenizers specifically designed for multilingual support, with some language pairs showing a 4-fold difference in the number of tokens required.</p><p><span style="font-family:&quot;Archivo Black&quot;, sans-serif;"><br></span></p><p><span style="font-family:&quot;Archivo Black&quot;, sans-serif;">Implications for Businesses and Users&nbsp;</span></p><p><br></p><p>The tokenization disparities across languages have significant implications for businesses and users:</p><ol><li>Cost: Many commercial language model services charge users per token. As a result, users of certain languages may end up paying significantly more for the same task compared to users of English or other more efficiently tokenized languages.</li><li>Latency: The number of tokens directly impacts the processing time for a task. 
Languages with longer tokenized representations can experience twice the latency compared to English, which may be critical for real-time applications like customer support or emergency services.</li><li>Long Context Processing: Language models often have a fixed context window, limiting the amount of text they can process at once. Users of more efficiently tokenized languages can work with much longer texts compared to users of languages with higher token counts, potentially leading to significant disparities in the quality of service.</li></ol><p><br></p><p><span style="font-family:&quot;Archivo Black&quot;, sans-serif;">The Path Forward: Multilingual Tokenization Fairness&nbsp;</span></p><p><br></p><p>To address these disparities and ensure more equitable access to language technologies, the researchers propose the concept of multilingual tokenization fairness. They argue that tokenizers should produce similar encoded lengths for the same content across languages. This can be achieved by:</p><ol><li>Recognizing that subword tokenization is necessary to achieve parity, as character-level and byte-level representations cannot fully address the issue.</li><li>Ensuring that tokenizers support all Unicode codepoints to handle characters from all languages.</li><li>Building a multilingually fair parallel corpus for training and evaluating tokenizers, with balanced representation of topics, named entities, and diverse translations.</li><li>Developing multilingually fair tokenizers by first training individual monolingual tokenizers for each target language and then merging them while maintaining parity.</li></ol><p>By adopting these principles, language model developers can create more equitable tokenizers that provide similar levels of service across languages, benefiting businesses and users worldwide.</p><p><br></p><p>As language models become increasingly integral to our daily lives, it is crucial that we prioritize fairness and inclusivity in their design and 
deployment. By understanding the implications of tokenization disparities and taking action to address them, business leaders can play a vital role in shaping a more equitable future for AI-powered language technologies.</p></div><p></p></div>
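One driver of these disparities can be illustrated with a minimal, stdlib-only sketch (a simplified illustration for intuition, not the study's tokenizer analysis): UTF-8 encodes most Latin characters in one byte but uses two or three bytes per character for Cyrillic, Japanese, and many other scripts, so byte-level representations already start from an uneven baseline before any subword merging happens. The sample strings below are illustrative choices, not data from the paper.

```python
# Why byte-level representations alone cannot equalize sequence lengths
# across scripts: UTF-8 uses 1 byte per Latin character, but 2 bytes per
# Cyrillic character and 3 bytes per Japanese kana/kanji.

samples = {
    "English": "Hello",        # 5 characters, 1 byte each
    "Bulgarian": "Здравей",    # 7 characters, 2 bytes each (Cyrillic)
    "Japanese": "こんにちは",    # 5 characters, 3 bytes each
}

for language, text in samples.items():
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(f"{language}: {chars} chars -> {utf8_bytes} UTF-8 bytes "
          f"({utf8_bytes / chars:.1f} bytes/char)")
```

A byte-pair tokenizer trained predominantly on English text compounds this gap further: frequent English byte sequences get merged into single tokens, while text in under-represented scripts stays fragmented into many short tokens, which is consistent with the per-language premiums the study reports.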
</div><div data-element-id="elm_BzWdTrdv9UYRWiOUuFgGVw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_BzWdTrdv9UYRWiOUuFgGVw"] .zpimage-container figure img { width: 500px ; height: 500.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_BzWdTrdv9UYRWiOUuFgGVw"] .zpimage-container figure img { width:500px ; height:500.00px ; } } @media (max-width: 767px) { [data-element-id="elm_BzWdTrdv9UYRWiOUuFgGVw"] .zpimage-container figure img { width:500px ; height:500.00px ; } } [data-element-id="elm_BzWdTrdv9UYRWiOUuFgGVw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium "><figure role="none" class="zpimage-data-ref"><a class="zpimage-anchor" href="/introduction-to-large-language-models-for-business-leaders-book" target="" rel=""><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/12.png" width="500" height="500.00" loading="lazy" size="medium" alt="Introduction to LLMs for Leaders"/></picture></a></figure></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Mon, 29 Apr 2024 12:28:39 +1000</pubDate></item><item><title><![CDATA[AI Benchmarks: Misleading Measures of Progress Towards General Intelligence]]></title><link>https://www.nownextlater.ai/Insights/post/ai-benchmarks-misleading-measures-of-progress-towards-general-intelligence</link><description><![CDATA[<img align="left" hspace="5" src="https://www.nownextlater.ai/william-warby-WahfNoqbYnM-unsplash.jpg"/>It is crucial for business leaders to understand the limitations and potential pitfalls of current approaches to measuring AI capabilities.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_QMKQxPeqSOuvZQ6CtSLZHA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_D6o9TG7ESGKOVcNmCn5FZQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_Yt-fFrzuRD6qgDY-psDazw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"> [data-element-id="elm_Yt-fFrzuRD6qgDY-psDazw"].zpelem-col{ border-radius:1px; } </style><div data-element-id="elm_5pR4jRceDONydu3ax9lzaQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_5pR4jRceDONydu3ax9lzaQ"] .zpimage-container figure img { width: 1090px ; height: 817.50px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_5pR4jRceDONydu3ax9lzaQ"] .zpimage-container figure img { width:723px ; height:542.25px ; } } @media (max-width: 767px) { [data-element-id="elm_5pR4jRceDONydu3ax9lzaQ"] .zpimage-container figure img { width:415px ; height:311.25px ; } } [data-element-id="elm_5pR4jRceDONydu3ax9lzaQ"].zpelem-image { border-radius:1px; } </style><div 
data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/william-warby-WahfNoqbYnM-unsplash.jpg" width="415" height="311.25" loading="lazy" size="fit" alt="Photo by William Warby on Unsplash" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_Huuir9jowc4M5lofYufhaA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_Huuir9jowc4M5lofYufhaA"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;">Artificial intelligence (AI) has made remarkable strides in recent years, with AI systems now achieving impressive performance on a variety of tasks, from image recognition to language understanding. These advancements have been largely driven by the development of powerful machine learning algorithms, coupled with the availability of vast amounts of training data and computational resources.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">However, as AI continues to progress, it is crucial for business leaders to understand the limitations and potential pitfalls of current approaches to measuring AI capabilities. A position <a href="https://arxiv.org/abs/2111.15366" title="paper" rel="">paper</a> by Raji et al. offers a compelling critique of popular AI benchmarks, arguing that they are often misleading and fail to capture meaningful progress towards general intelligence. This critique is further echoed in a recent TechCrunch <a href="https://techcrunch.com/2024/03/07/heres-why-most-ai-benchmarks-tell-us-so-little/" title="article" rel="">article</a> by Kyle Wiggers, which highlights the disconnect between AI benchmarks and real-world applications.</p></div>
<p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:26px;color:rgb(41, 77, 135);">The Allure of &quot;General&quot; AI Benchmarks</span></p><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Two of the most widely cited benchmarks in AI are ImageNet, used for evaluating image recognition systems, and GLUE (General Language Understanding Evaluation), used for assessing natural language processing models. These benchmarks have taken on an outsized role in the AI community, with performance on these tasks often seen as indicative of progress towards general AI capabilities.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">The appeal of these benchmarks is understandable. They offer a standardized way to compare different AI systems and track improvements over time. Moreover, the tasks they encompass, such as identifying objects in images or understanding the meaning of sentences, seem to capture essential aspects of intelligence that humans excel at.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">However, as Raji et al. point out, these benchmarks are far from perfect measures of general intelligence. In fact, they argue, the focus on achieving state-of-the-art performance on these narrow tasks has distorted the priorities of the AI research community and led to an overemphasis on benchmark-chasing at the expense of more meaningful progress.</p></div>
<p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="color:rgb(41, 77, 135);font-size:26px;font-family:&quot;Oswald&quot;, sans-serif;">The Limitations of Current Benchmarks</span></p><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">One of the key criticisms leveled by Raji et al. is that the tasks included in popular AI benchmarks are often arbitrary and not systematically chosen to represent general capabilities. They compare this to a fictional children's story about a museum claiming to contain &quot;everything in the whole wide world,&quot; but which actually just contains a haphazard collection of random objects.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Similarly, the authors argue, benchmarks like ImageNet and GLUE are composed of a relatively narrow and idiosyncratic set of tasks that hardly capture the full range of intelligent behaviors. Impressive performance on these tasks is often taken as evidence of general intelligence, when in reality it may simply reflect a system's ability to exploit specific patterns or statistical regularities present in the training data.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">The TechCrunch article by Wiggers further underscores this point, noting that many of the most commonly used benchmarks for chatbot-powering AI models, such as GPQA (&quot;A Graduate-Level Google-Proof Q&amp;A Benchmark&quot;), contain questions that are far removed from the everyday tasks most people use these models for, such as responding to emails or writing cover letters. 
As Jesse Dodge, a scientist at the Allen Institute for AI, puts it, &quot;Benchmarks are typically static and narrowly focused on evaluating a single capability, like a model's factuality in a single domain, or its ability to solve mathematical reasoning multiple choice questions.&quot;</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Another issue highlighted in both the Raji et al. paper and the TechCrunch article is the presence of errors and flaws in some widely used benchmarks. For example, an analysis of the HellaSwag benchmark, designed to evaluate commonsense reasoning in AI models, found that more than a third of the test questions contained typos and nonsensical writing. Similarly, the MMLU benchmark, which has been touted by vendors like Google, OpenAI, and Anthropic as evidence of their models' logical reasoning abilities, contains questions that can be solved through mere memorization rather than genuine understanding.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">As David Widder, a postdoctoral researcher at Cornell studying AI and ethics, notes in the TechCrunch article, &quot;A model can't [reason through and solve new and complex problems] either&quot; just because it performs well on benchmarks like MMLU. Instead, he argues, these benchmarks often test a model's ability to &quot;memoriz[e] and associat[e] two keywords together&quot; rather than truly understand causal mechanisms.</p></div>
<p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:26px;color:rgb(41, 77, 135);">Key Takeaways for Business Leaders</span></p><p style="font-weight:400;text-indent:0px;"><br></p><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;">Given the limitations and potential misleading nature of current AI benchmarks, what should business leaders keep in mind when evaluating AI technologies? Here are some key takeaways from the Raji et al. paper and the TechCrunch article:</p><ol><li>Be skeptical of grand claims about AI systems achieving human-level or superhuman intelligence based solely on benchmark performance. As both sources emphasize, impressive results on specific benchmarks do not necessarily translate to general intelligence or robustness in real-world deployments.</li><li>When evaluating AI vendors or technologies, look beyond top-line benchmark numbers. Ask detailed questions about the specific capabilities and limitations of the system, and how it has been tested on tasks and datasets relevant to your business needs.</li><li>Encourage a culture of rigorous, multifaceted evaluation within your organization's AI initiatives. Rather than focusing solely on chasing state-of-the-art benchmark results, prioritize detailed error analysis, bias auditing, and stress testing across a diverse range of scenarios.</li><li>Support research and development efforts aimed at creating more meaningful and comprehensive benchmarks tied to real-world applications. This could include developing industry-specific datasets and evaluation protocols that better reflect the challenges and requirements of your business domain.</li><li>Foster an AI research culture that values creativity, diversity of thought, and long-term progress over short-term benchmark wins. 
Encourage your teams to explore novel architectures and approaches, even if they may not immediately yield chart-topping results.</li></ol></div>
<br><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:26px;color:rgb(41, 77, 135);">Looking Ahead: Improving AI Benchmarks</span></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Both the Raji et al. paper and the TechCrunch article offer some suggestions for improving the current state of AI benchmarks. One key idea is to incorporate more human evaluation alongside automated benchmarks. As Jesse Dodge suggests in the TechCrunch piece, &quot;The right path forward, here, is a combination of evaluation benchmarks with human evaluation—prompting a model with a real user query and then hiring a person to rate how good the response is.&quot;</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">David Widder, on the other hand, is less optimistic about the potential for improving existing benchmarks. Instead, he argues that AI evaluation should focus more on the downstream impacts of these models and whether those impacts align with the goals and values of the people affected by them. &quot;I'd ask which specific contextual goals we want AI models to be able to be used for,&quot; he says, &quot;and evaluate whether they'd be—or are— successful in such contexts.&quot;</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">As AI continues to advance and become more deeply integrated into business operations, it is crucial for leaders to have a nuanced understanding of the technologies' strengths and limitations. By looking beyond simplistic benchmark results and embracing a more holistic and rigorous approach to AI evaluation, organizations can make more informed decisions and unlock the true potential of artificial intelligence while mitigating its risks and pitfalls.</p></div>
<p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Footnotes: <br></p><div style="color:inherit;"><div><div><div><div><div><ul><li><span style="font-size:14px;"><span style="font-weight:500;font-family:&quot;Questrial&quot;, sans-serif;">&quot;<a href="https://arxiv.org/abs/2111.15366" title="AI and the Everything in the Whole Wide World Benchmark" rel="">AI and the Everything in the Whole Wide World Benchmark</a>&quot; by </span></span><span style="font-size:14px;font-weight:500;font-family:&quot;Questrial&quot;, sans-serif;">Inioluwa Deborah Raji, </span><span style="font-family:&quot;Questrial&quot;, sans-serif;font-size:14px;"><span style="font-weight:500;">Emily M. Bender, Amandalynne Paullada, Emily Denton, and Alex Hanna <br></span></span></li><li>&quot;<a href="https://techcrunch.com/2024/03/07/heres-why-most-ai-benchmarks-tell-us-so-little/" title="Why most AI benchmarks tell us so little" rel="">Why most AI benchmarks tell us so little</a>&quot; by Kyle Wiggers for TechCrunch</li></ul></div>
</div></div></div></div></div></div></div></div><div data-element-id="elm_BnRa5OKVdYxRsf6Mr2akog" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_BnRa5OKVdYxRsf6Mr2akog"] .zpimage-container figure img { width: 500px ; height: 500.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_BnRa5OKVdYxRsf6Mr2akog"] .zpimage-container figure img { width:500px ; height:500.00px ; } } @media (max-width: 767px) { [data-element-id="elm_BnRa5OKVdYxRsf6Mr2akog"] .zpimage-container figure img { width:500px ; height:500.00px ; } } [data-element-id="elm_BnRa5OKVdYxRsf6Mr2akog"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium "><figure role="none" class="zpimage-data-ref"><a class="zpimage-anchor" href="/responsible-ai-in-the-age-of-generative-models-ai-governance-ethics-and-risk-management" target="" rel=""><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Navy%20and%20Blue%20Modern%20We%20Provide%20Business%20Solutions%20Facebook%20Ad%20-1200%20x%201200%20px-.png" width="500" height="500.00" loading="lazy" size="medium"/></picture></a></figure></div>
</div><div data-element-id="elm_uFX8p-I0RPOxatVN-X-I4A" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_uFX8p-I0RPOxatVN-X-I4A"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">Photo by William Warby on Unsplash</span></p></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Wed, 03 Apr 2024 10:42:30 +1100</pubDate></item><item><title><![CDATA[Microsoft Unveils AutoGen to Revolutionize Conversational AI Apps]]></title><link>https://www.nownextlater.ai/Insights/post/microsoft-unveils-autogen-to-revolutionize-conversational-ai-apps</link><description><![CDATA[<img align="left" hspace="5" src="https://www.nownextlater.ai/Screenshot 2023-10-24 at 2.11.08 pm.png"/>To accelerate development of advanced conversational AI applications, Microsoft recently introduced AutoGen, an open-source Python library that streamlines orchestrating multi-agent conversations.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_vluLiypRQ1WldbTa2CB0vQ" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_28uh3DKcSR6PkovV8o0_bA" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_Q2f-wp5qQ6mc103US_-dog" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_gvP8gNdFPIABM8v-y8leWw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_gvP8gNdFPIABM8v-y8leWw"] .zpimage-container figure img { width: 1090px ; height: 564.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_gvP8gNdFPIABM8v-y8leWw"] .zpimage-container figure img { width:723px ; height:374.10px ; } } @media (max-width: 767px) { [data-element-id="elm_gvP8gNdFPIABM8v-y8leWw"] .zpimage-container figure img { width:415px ; height:214.74px ; } } [data-element-id="elm_gvP8gNdFPIABM8v-y8leWw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" 
data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-10-24%20at%202.11.08%20pm.png" width="415" height="214.74" loading="lazy" size="fit" alt="Autogen" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_07A53auvQcSoWa59skzEHQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_07A53auvQcSoWa59skzEHQ"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><div style="color:inherit;text-align:left;"><div style="color:inherit;text-align:left;"><div style="color:inherit;text-align:left;"><p style="font-weight:400;text-indent:0px;">Conversational artificial intelligence (AI) is transforming numerous industries by enabling more natural interactions between humans and computers. From virtual assistants to chatbots, voice interfaces, and avatars, conversational AI is becoming increasingly prevalent in everyday digital experiences. However, building the complex workflows that power these next-generation systems remains challenging for most companies.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">To accelerate development of advanced conversational AI applications, Microsoft recently introduced AutoGen, an open-source Python library that streamlines orchestrating multi-agent conversations. With AutoGen's customizable and intelligent agents, developers can readily construct sophisticated conversational systems and workflows using combinations of AI, tools, and human inputs.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Democratizing Complex Conversational AI Workflows</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">A key goal of AutoGen is democratizing the creation of intricate conversational AI applications. Traditionally, building multi-turn workflows involving several AI components has required extensive engineering expertise and effort. 
AutoGen encapsulates the complexity behind easy-to-use agents and interfaces.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Some examples of applications enabled by AutoGen:</p><ul style="margin-left:40px;"><li>Tutoring systems where students converse with an AI tutor that can call an expert for help when needed</li><li>Troubleshooting chatbots that propose solutions, execute tools, and incorporate human feedback</li><li>Interactive fiction games with conversational NPCs powered by AI and humans</li><li>Data analysis workflows where users discuss options with an AI assistant that runs code and queries databases</li></ul><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">With AutoGen's pre-built agents and simple API, developers can set up the conversational 'cast' and interactions for their application in just a few lines of Python code. The complexity of conversing, remembering context, integrating tools, handling errors, and supporting dynamic multi-agent chatter happens automatically behind the scenes.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;"></span></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">AutoGen Agents - Conversational Building Blocks</span></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">At the core of AutoGen are customizable agents that can chat with each other and humans to solve problems. There are two key types of agents:</p><ul style="margin-left:40px;"><li>Assistant agents provide domain expertise using large language models like GPT-3.5 and GPT-4. They can be configured with instructions and knowledge for different roles.</li><li>User proxy agents act on behalf of humans. 
They can request inputs, execute tools through code, or take other custom actions.</li></ul><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">By combining these agents into multi-agent systems, developers can construct automated workflows with flexible human involvement. Agents exchange messages until they mutually determine the conversation has achieved its goal.</p><p style="font-weight:400;text-indent:0px;">For instance, an assistant agent might propose an analytical approach while the user proxy agent runs simulations to validate the idea before reporting results back to the assistant. AutoGen streamlines the intricacies of conversation management so developers simply define the agents and their interactions.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Maximizing Value from Large Language Models</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">In addition to simplifying complex workflows, AutoGen also includes features to maximize the value derived from expensive large language model APIs like OpenAI's.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">AutoGen helps users:</p><ul style="margin-left:40px;"><li>Fine-tune model hyperparameters like temperature, presence penalty, and stop sequences to optimize for metrics like accuracy, cost, etc.</li><li>Cache model outputs to avoid redundant expensive calls.</li><li>Automatically handle errors and retries to improve reliability.</li><li>Seamlessly blend outputs from multiple model configurations.</li></ul><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Tools like these ensure users efficiently tap into the vast capabilities of large language
models through a robust interface.</p><p style="font-weight:400;text-indent:0px;">Microsoft is particularly focused on responsible and ethical standards for AutoGen. They incorporated algorithmic techniques to provide transparency and maintain human oversight over any automated conversations between agents.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;"></span></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Empowering a New Generation of AI Applications</span></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">AutoGen tackles a common pain point in leveraging today's most advanced AI capabilities: the burdensome process of coordinating multiple conversational AI components. With its blend of simple abstractions and powerful features, AutoGen opens the door to new categories of AI applications:</p><ul style="margin-left:40px;"><li>Medical chatbots that discuss patient cases with doctors before synthesizing expert advice</li><li>Multi-modal VR agents that converse with users and AI assistants while manipulating 3D environments</li><li>Interactive fiction games with dialogue trees branching based on player choices and AI improvisation</li><li>Data science workflows where users explore models through natural language conversations with AutoGen agents</li></ul><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">AutoGen represents an important step in making sophisticated AI more accessible. 
Its potential to unlock new products and experiences makes AutoGen one of the most exciting recent developments in conversational AI.</p><p style="font-weight:400;text-indent:0px;"><br></p><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Key Takeaways for Business Leaders</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">For business leaders, AutoGen represents an opportunity to leverage conversational AI in new ways across customer engagement, operations, employee productivity, and more. Companies that leverage AutoGen early could gain a competitive advantage in their ability to rapidly deploy innovative conversational experiences.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">AutoGen is an enabling technology that can help businesses adopt conversational AI at scale by making development drastically easier. 
Its potential to unlock new products and efficiencies makes it a platform business leaders should have on their radar.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Sources:</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="color:inherit;"><a href="https://arxiv.org/pdf/2308.08155.pdf" title="AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation" rel="">AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation</a><br></span></p><p style="font-weight:400;text-indent:0px;"><span style="color:inherit;"><a href="https://microsoft.github.io/autogen/docs/Getting-Started" title="Autogen" rel="">Autogen</a><br></span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><a href="https://microsoft.github.io/"><span style="color:inherit;"><br></span></a></p></div><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p></div></div></div><p></p></div>
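The assistant/user-proxy conversation loop described above can be sketched in a few lines of plain Python. This is a toy illustration of the pattern only, not the AutoGen library's actual API: the agent class, the reply functions, and the termination convention are all illustrative stand-ins.

```python
# Toy sketch of the two-agent pattern: an "assistant" proposes actions,
# a "user proxy" executes them, and they chat until the goal is reached.

class Agent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn

    def reply(self, message, history):
        return self.reply_fn(message, history)

def initiate_chat(sender, receiver, message, max_turns=6):
    """Alternate messages between two agents until one says TERMINATE."""
    history = []
    current, other = receiver, sender
    while message != "TERMINATE" and len(history) < max_turns:
        history.append((other.name, message))
        message = current.reply(message, history)
        current, other = other, current
    return history

# Stand-in "assistant": proposes a computation, then ends the conversation.
def assistant_reply(message, history):
    if "result:" in message:
        return "TERMINATE"          # goal achieved, stop chatting
    return "run: 2 + 3"             # propose a computation to execute

# Stand-in "user proxy": executes the proposed code and reports the result.
def user_proxy_reply(message, history):
    expr = message.removeprefix("run: ")
    return f"result: {eval(expr)}"  # real systems sandbox execution!

assistant = Agent("assistant", assistant_reply)
user_proxy = Agent("user_proxy", user_proxy_reply)
transcript = initiate_chat(user_proxy, assistant, "Please compute 2 + 3.")
```

The transcript records the alternating messages and ends once the assistant sees the executed result, mirroring how AutoGen agents exchange messages until they mutually determine the conversation has achieved its goal.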
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Tue, 24 Oct 2023 14:13:24 +1100</pubDate></item><item><title><![CDATA[MemGPT: The Memory Limitations of AI Systems and a Clever Technological Workaround]]></title><link>https://www.nownextlater.ai/Insights/post/memgpt-using-operating-system-concepts-to-unlock-the-potential-of-large-language-models</link><description><![CDATA[<img align="left" hspace="5" src="https://www.nownextlater.ai/fredy-jacob-t0SlmanfFcg-unsplash.jpg"/>MemGPT, applies OS principles like virtual memory and process management to unlock more powerful applications of LLMs - all while staying within their inherent memory limits.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_6KOImqMKTvmSQvnw-si5SA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_7-0PUdmhRWGjcvnBzQt1UQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_lor5cj6VTIGjCq0bsxHbqg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_xMqGWPgee3VC7ipsoT-lag" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_xMqGWPgee3VC7ipsoT-lag"] .zpimage-container figure img { width: 1090px ; height: 613.13px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_xMqGWPgee3VC7ipsoT-lag"] .zpimage-container figure img { width:723px ; height:406.69px ; } } @media (max-width: 767px) { [data-element-id="elm_xMqGWPgee3VC7ipsoT-lag"] .zpimage-container figure img { width:415px ; height:233.44px ; } } [data-element-id="elm_xMqGWPgee3VC7ipsoT-lag"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" 
data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/fredy-jacob-t0SlmanfFcg-unsplash.jpg" width="415" height="233.44" loading="lazy" size="fit" alt="memory" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_FoVkbOwNRI-FolvQ_xcnJQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_FoVkbOwNRI-FolvQ_xcnJQ"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><div style="color:inherit;text-align:left;"><div style="color:inherit;text-align:left;"><div style="color:inherit;"><p style="font-weight:400;text-indent:0px;">Artificial intelligence systems that can have natural conversations and analyze documents have transformative business potential. However, today's AI - specifically large language models (LLMs) like Claude 2 and GPT-4 - have a major limitation. They can only remember a finite amount of information before needing to completely reset their memory. This restricts their ability to have coherent, long-term interactions or make connections across lengthy documents.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">One might assume the solution is just to build LLMs with bigger memories. But LLMs face sharply diminishing returns and ballooning computational costs from naively expanding memory. After reviewing these tradeoffs, researchers at UC Berkeley devised an innovative workaround drawing inspiration from operating systems. 
Their system, MemGPT, applies OS principles like virtual memory and process management to unlock more powerful applications of LLMs - all while staying within their inherent memory limits.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">The Core Challenge of LLM Memory Limits</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">LLMs use an algorithm called self-attention to analyze incoming text and predict upcoming words, just as humans intuitively continue a thought or conversation. This grants LLMs their impressive language skills. However, self-attention requires the LLM to look across all context it's received so far, which means its memory must be reset after reaching a fixed size limit.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">For perspective, Claude 2 can handle about 100,000 tokens before resetting. That may sound generous compared to a 10,000 word business report. But spoken conversation can easily exceed this limit in just a few hours of steady chit-chat. Even more daunting are tasks like sifting complex legal documents that routinely run millions of tokens.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">LLMs have a fixed memory capacity because the self-attention algorithm scales quadratically based on context length. Doubling the memory size makes the LLM's computations 4x more intensive. Expanding memory quickly becomes computationally infeasible, even for large tech companies.</p><p style="font-weight:400;text-indent:0px;">Rather than a flaw in specific systems like Claude, this limited memory span is an inherent constraint of all modern LLM architectures. Naively expanding memory was not a viable solution path. 
More creative approaches would be needed.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">The Insights Behind MemGPT's Operating System-Inspired Design</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">UC Berkeley researchers drew inspiration from operating systems like Windows that run applications working with far more data than fits into available RAM. They asked: how can we apply OS techniques to provide an LLM the illusion of infinite memory?</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">The result was MemGPT, which implements two key principles:</p><ol style="margin-left:40px;"><li>A hierarchy of memory resources - MemGPT divides memory into a small, fast &quot;main context&quot; like RAM and a large, slow &quot;external context&quot; like disk storage. 
Information must be explicitly transferred between them.</li><li>Process management - MemGPT handles control flow between memory, the LLM, and users akin to how an OS arbitrates between concurrent processes.</li></ol><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Together these give MemGPT the ability to pipeline potentially unlimited memory in and out of the LLM's limited context window as needed to accomplish tasks requiring unbounded memory over multiple processing cycles.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Just as clever OS architectures enable applications to work with more data than available RAM, MemGPT's design confers an illusion of infinite memory to fixed-context LLMs.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Conversational AI That Can Reference Years of Dialogue</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">A major application of LLMs is powering conversational assistants and social bots. 
MemGPT demonstrates substantially improved consistency and personalization in these applications:</p><ul style="margin-left:40px;"><li>Consistency - By querying external memory of prior interactions, MemGPT can coherently maintain facts, preferences, and history even when referring back to dialogues from months or years ago.</li><li>Personalization - MemGPT can spontaneously incorporate comprehensive knowledge about the user, like callback jokes referencing childhood stories told weeks in the past to forge greater rapport.</li></ul><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Analyzing Large Collections of Documents</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">MemGPT also excels at tasks like:</p><ul style="margin-left:40px;"><li>Question answering using a massive multi-document corpus like Wikipedia or a company knowledge base.</li><li>Extracting key facts and relationships by synthesizing relevant excerpts across thousands of pages.</li><li>Performing multi-hop reasoning spanning fragmented information distributed across documents.</li></ul><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">These capabilities could greatly amplify the utility of LLMs for knowledge management applications.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><span style="font-family:&quot;Oswald&quot;, sans-serif;font-size:16px;">Takeaways for Business Leaders</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">MemGPT provides two key lessons for applying LLMs:</p><ol style="margin-left:40px;"><li>Look beyond scaling model size, and consider architectural innovations to
push capabilities forward within intrinsic limits.</li><li>Draw inspiration from solutions in fields like systems architecture - LLM memory management has parallels to longstanding CS problems.</li></ol><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Rather than getting caught up in an AI arms race, businesses can look to MemGPT's clever memory architecture, which unlocks substantially more powerful applications without requiring unrealistic context sizes. Techniques like this that work within practical constraints will be key to delivering business value from AI.</p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;">Sources:</p><p style="font-weight:400;text-indent:0px;"><span style="color:inherit;"><a href="https://arxiv.org/pdf/2310.08560.pdf" title="MEMGPT: Towards LLMs as Operating systems " rel="">MEMGPT: Towards LLMs as Operating Systems</a></span> by <span style="color:inherit;">UC Berkeley</span></p><p style="font-weight:400;text-indent:0px;"></p><p style="font-weight:400;text-indent:0px;"><br></p><p style="font-weight:400;text-indent:0px;"><br></p></div>
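The two-tier memory hierarchy described above can be sketched in a few lines of Python. The class and method names here are hypothetical illustrations of the paging idea, not MemGPT's actual implementation: a small bounded "main context" plays the role of RAM, and an unbounded "external context" plays the role of disk.

```python
# Minimal sketch of a two-tier memory: overflow from the bounded main
# context is paged out to external storage rather than lost, and can be
# explicitly paged back in by a (toy) keyword search.
from collections import deque

class TieredMemory:
    def __init__(self, main_capacity):
        self.main = deque()              # fast: fits in the LLM's context window
        self.external = []               # slow: unbounded archival storage
        self.main_capacity = main_capacity

    def add(self, message):
        """New messages enter main context; the oldest are evicted to 'disk'."""
        self.main.append(message)
        while len(self.main) > self.main_capacity:
            self.external.append(self.main.popleft())

    def recall(self, keyword):
        """Explicitly transfer archived messages back in via keyword search."""
        return [m for m in self.external if keyword in m]

mem = TieredMemory(main_capacity=3)
for msg in ["my dog is called Rex", "I live in Sydney",
            "I prefer short emails", "what's new?", "tell me a joke"]:
    mem.add(msg)

# Only the 3 most recent messages fit "in context", but older facts
# remain retrievable from external storage:
mem.recall("Rex")   # → ["my dog is called Rex"]
```

Just as the article describes, information must be explicitly transferred between the tiers; a control loop deciding when to evict and when to recall is what gives the fixed-context model its illusion of unbounded memory.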
</div></div><p></p></div></div><div data-element-id="elm_aFrCMWYjwCTopYJ58uM7Sw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_aFrCMWYjwCTopYJ58uM7Sw"] .zpimage-container figure img { width: 800px ; height: 344.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_aFrCMWYjwCTopYJ58uM7Sw"] .zpimage-container figure img { width:500px ; height:215.00px ; } } @media (max-width: 767px) { [data-element-id="elm_aFrCMWYjwCTopYJ58uM7Sw"] .zpimage-container figure img { width:500px ; height:215.00px ; } } [data-element-id="elm_aFrCMWYjwCTopYJ58uM7Sw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-large zpimage-tablet-fallback-large zpimage-mobile-fallback-large "><figure role="none" class="zpimage-data-ref"><a class="zpimage-anchor" href="/aibooks" target="" rel=""><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Untitled%20design%20-4-.png" width="500" height="215.00" loading="lazy" size="large"/></picture></a></figure></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Tue, 24 Oct 2023 12:02:02 +1100</pubDate></item><item><title><![CDATA[Is GPT-4 a Mixture of Experts Model? Exploring MoE Architectures for Language Models]]></title><link>https://www.nownextlater.ai/Insights/post/is-gpt-4-a-mixture-of-experts-model-exploring-moe-architectures-for-language-models</link><description><![CDATA[Rumors are swirling that GPT-4 may use an advanced technique called Mixture of Experts (MoE) to achieve over 1 tr parameters. This offers an opportunity to demystify MoE]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_l-rxaOxTSYujeWk2-vZfMw" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_xFH57oOkRPim79EfxOAuUg" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_42e7Ken5TQirB4Tf08O0Jg" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_khPg25WU59_le2ZHOQnl4g" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_khPg25WU59_le2ZHOQnl4g"] .zpimage-container figure img { width: 500px ; height: 229.84px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_khPg25WU59_le2ZHOQnl4g"] .zpimage-container figure img { width:500px ; height:229.84px ; } } @media (max-width: 767px) { [data-element-id="elm_khPg25WU59_le2ZHOQnl4g"] .zpimage-container figure img { width:500px ; height:229.84px ; } } [data-element-id="elm_khPg25WU59_le2ZHOQnl4g"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" 
data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-17%20at%202.15.32%20pm.png" width="500" height="229.84" loading="lazy" size="medium" alt="A sample of related models" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_wydQABEFSfq69jt59vZzKw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_wydQABEFSfq69jt59vZzKw"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">Rumors are swirling that GPT-4 may use an advanced technique called Mixture of Experts (MoE) to achieve over 1 trillion parameters. Although unconfirmed, these reports offer an opportunity to demystify MoE and explore why this architecture could allow the next generation of language models to efficiently scale to unprecedented size.<br><br><span style="font-family:&quot;Oswald&quot;, sans-serif;">What is Mixture of Experts? </span><br><br>In most AI systems, a single model is applied to all inputs. But MoE models have groups of smaller &quot;expert&quot; models, each with their own parameters. For every new input, an expert selector chooses the most relevant experts to process that data.<br><br>This means only a sparse subset of the total parameters are activated per input. So MoE models can pack in exponentially more parameters without a proportional explosion in computation.<br><br>For language tasks, some experts specialize in grammar, others learn factual knowledge, allowing MoE models to better handle the nuances of natural language. The selector dynamically routes each word to the best combination of experts.<br><br>So while an MoE model may contain trillions of total parameters via its many experts, only a tiny fraction need to be used for any given input. This allows unprecedented scale while maintaining efficiency.<br><br><span style="font-family:&quot;Oswald&quot;, sans-serif;">Pioneering MoE to Power Language AI</span><br><br>The core concept of MoE dates back decades, but only recently has progress in model parallelism and distributed training enabled its application to large language models. 
<br><br>Google has published notable results using MoE to achieve huge language models:<br><br></span></p><p style="margin-left:40px;"><span style="color:inherit;">1) <span style="font-family:&quot;Oswald&quot;, sans-serif;"><a href="https://arxiv.org/pdf/2101.03961.pdf" title="Switch Transformers" rel="">Switch Transformers</a></span> simplify MoE routing strategies. In experiments, they attain up to 8x faster training versus dense models on language tasks by intelligently allocating computation.</span></p><p style="margin-left:40px;"></p><p style="margin-left:40px;"><span style="color:inherit;"><br></span></p><p style="margin-left:40px;"><span style="color:inherit;">2) <span style="font-family:&quot;Oswald&quot;, sans-serif;"><a href="https://arxiv.org/abs/2112.06905" title="GLaM" rel="">GLaM</a></span> leverages MoE to reach 1.2 trillion parameters. With just 8% of its weights active per input, it outperforms the 175 billion parameter GPT-3 on multiple language benchmarks. <br></span></p><p style="margin-left:40px;"></p><p style="margin-left:40px;"><span style="color:inherit;"><br></span></p><p>Between these two projects, we see MoE enables order-of-magnitude leaps in model capacity, capability, and efficiency. If GPT-4 utilizes MoE to hit 1+ trillion parameters as speculated, it suggests OpenAI has engineered solutions for training and deployment that overcome key scaling barriers.</p><p><span style="font-family:&quot;Oswald&quot;, sans-serif;"><br>The Upshot for Business Leaders <br></span></p><p><span style="font-family:&quot;Oswald&quot;, sans-serif;"><br></span></p><p>MoE presents a disruptive path to building AI systems with previously unfathomable levels of knowledge and versatility. 
Leveraging these capabilities productively and safely will require deep consideration.</p><p><br></p><p>As this technology continues advancing, business leaders should stay cognizant of developments in MoE and large language models, and keep in mind the following:</p><ul><li>MoE enables <span style="text-decoration:underline;">exponential gains in model capacity at constant computational cost</span> - expect rapid leaps in language AI.</li><li>Specialized experts <span style="text-decoration:underline;">can encode robust knowledge</span> - anticipate AI that is far more competent and wide-ranging. </li><li>However, <span style="text-decoration:underline;">risks rise</span> with capability - plan to implement strong controls and oversight for safety.</li></ul><p><br></p><p>While the details of GPT-4 remain unconfirmed, its scale may soon demonstrate the vast possibilities of MoE in language AI, for better or worse. A wise, measured approach to deploying such technology will be vital.</p></div>
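The sparse routing idea at the heart of MoE can be made concrete with a toy example. Everything here is illustrative - the tiny "experts", the gate weights, and the sizes are stand-ins, not GPT-4's (unconfirmed) architecture - but the mechanism is the one described above: a gate scores every expert, yet only the top-k experts actually run for a given input.

```python
# Toy sketch of sparse Mixture-of-Experts routing with top-k gating.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Run only the k best experts and blend their outputs by gate weight."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(scores)
    top_k = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Sparse activation: experts outside top_k contribute zero compute.
    return sum(probs[i] * experts[i](x) for i in top_k)

# Four "experts", each a trivial function standing in for a sub-network.
experts = [
    lambda x: sum(x),           # expert 0
    lambda x: max(x),           # expert 1
    lambda x: min(x),           # expert 2
    lambda x: sum(x) / len(x),  # expert 3
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

y = moe_forward([2.0, 1.0], experts, gate_weights, k=2)
```

Only 2 of the 4 experts execute for this input, which is why total parameter count can grow with the number of experts while per-input compute stays roughly constant - the property that lets MoE models scale to trillions of parameters.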
</div><div data-element-id="elm_pzYYuSSKNULiHvI7QLl4zg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_pzYYuSSKNULiHvI7QLl4zg"] .zpimage-container figure img { width: 800px ; height: 344.00px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_pzYYuSSKNULiHvI7QLl4zg"] .zpimage-container figure img { width:500px ; height:215.00px ; } } @media (max-width: 767px) { [data-element-id="elm_pzYYuSSKNULiHvI7QLl4zg"] .zpimage-container figure img { width:500px ; height:215.00px ; } } [data-element-id="elm_pzYYuSSKNULiHvI7QLl4zg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-large zpimage-tablet-fallback-large zpimage-mobile-fallback-large "><figure role="none" class="zpimage-data-ref"><a class="zpimage-anchor" href="/aibooks" target="" rel=""><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Untitled%20design%20-4-.png" width="500" height="215.00" loading="lazy" size="large"/></picture></a></figure></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 17 Aug 2023 14:25:20 +1000</pubDate></item><item><title><![CDATA[Automating Common Sense for AI With Ensemble Models]]></title><link>https://www.nownextlater.ai/Insights/post/automating-common-sense-for-ai-with-ensemble-models</link><description><![CDATA["Symbolic knowledge distillation" that automates common sense acquisition for AI.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_mPKN0rjCQVyuVjArx-vFGA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_veGBOJUFSYK4lnARV0N_ow" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_IvhbYywqQbujJzuyoys2bA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_LYnK6WfuYB-C6ntSF3eAew" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_LYnK6WfuYB-C6ntSF3eAew"] .zpimage-container figure img { width: 500px ; height: 394.38px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_LYnK6WfuYB-C6ntSF3eAew"] .zpimage-container figure img { width:500px ; height:394.38px ; } } @media (max-width: 767px) { [data-element-id="elm_LYnK6WfuYB-C6ntSF3eAew"] .zpimage-container figure img { width:500px ; height:394.38px ; } } [data-element-id="elm_LYnK6WfuYB-C6ntSF3eAew"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium hb-lightbox 
" data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-16%20at%2011.35.08%20am.png" width="500" height="394.38" loading="lazy" size="medium" alt="Symbolic knowledge distillation" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_wtOfCPoVSESEnAf3dmqoag" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_wtOfCPoVSESEnAf3dmqoag"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">Artificial intelligence (AI) systems still lack true understanding of the world and rely heavily on training data provided by humans. An ongoing challenge is developing AI with more generalized common sense - basic knowledge about how the world works that humans acquire through experience.&nbsp;</span></p><p><span style="color:inherit;"><br></span></p><p><span style="color:inherit;">Researchers have proposed compiling common sense into knowledge graphs - structured collections of facts. But these require extensive manual effort to create and often have gaps. Now, scientists at the University of Washington and the Allen Institute for AI have demonstrated a new technique called &quot;symbolic knowledge distillation&quot; that automates common sense acquisition for AI. Their method transfers knowledge from a large, general AI model into a specialized common sense model, without direct human authoring.<br><br>The researchers used GPT-3, a leading natural language AI model from OpenAI, as the knowledge source. GPT-3 was prompted to generate common sense inferences about everyday scenarios, creating a knowledge graph called ATOMIC10x with 10 times more entries than human-authored versions. This automatic approach achieved greater scale and diversity of common sense than manual authoring.<br><br>To improve the accuracy of the AI-generated knowledge, the researchers trained a separate &quot;critic&quot; model to filter out incorrect inferences. With this critic, ATOMIC10x attained over 96% accuracy in human evaluations, surpassing 86.8% for human-authored graphs. 
The AI-generated knowledge graph thus exceeded human-authored versions in quantity while matching their quality.<br><br>The researchers then trained a compact common sense model called COMET on the ATOMIC10x graph. Remarkably, this smaller COMET model outperformed its massive GPT-3 teacher in generating accurate common sense inferences. It also improved on models trained with human-written knowledge graphs.<br><br>This demonstrates an alternative pipeline - from machine-generated data to specialized AI models - that can exceed human capabilities for common sense acquisition. The researchers propose that humans can play a more focused role as critics, rather than manually authoring entire knowledge bases.<br><br>The new distillation technique paves the way for more capable AI assistants, chatbots, and robots that understand implicit rules of everyday situations. Common sense helps AI converse naturally, perform physical tasks, and make logical inferences about causality and human behavior. Automating common sense at scale remains a grand challenge for human-like artificial intelligence.<br><br>This research exemplifies how large AI models like GPT-3 can transfer knowledge to more specialized applications through automatic generation. While general models have limitations in narrowly defined tasks, their broad learning makes them valuable teachers. Distillation techniques focus that broad knowledge into optimized models for specific needs like common sense.<br><br>Business leaders should track such advances that make AI more generally capable and useful across applications. Automating the acquisition of common sense can complement training data curated by humans, reducing manual bottlenecks. AI models endowed with common sense hold promise for everything from chatbots to autonomous systems to creative applications.
While current methods are imperfect, rapid progress is being made - foreshadowing AI assistants that understand the world more like we do.</span></p><p><span style="color:inherit;"><br></span></p><p><span style="color:inherit;">Sources:</span></p><p><span style="color:inherit;"><a href="https://arxiv.org/abs/2110.07178" title="Symbolic Knowledge Distillation: from General Language Models to Commonsense Models" rel="">Symbolic Knowledge Distillation: from General Language Models to Commonsense Models</a></span></p><p></p></div>
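The generate-filter-distill pipeline described above can be sketched in a few lines of Python. Everything here is illustrative: `generate_inferences` stands in for prompting a large teacher model such as GPT-3, and the critic is reduced to a simple score threshold rather than a trained model.

```python
# Illustrative sketch of symbolic knowledge distillation (not the authors' code).
# A large "teacher" model generates candidate common-sense inferences, a critic
# scores them, and only high-confidence entries enter the knowledge graph used
# to train a smaller "student" model like COMET.

def generate_inferences(event):
    """Stand-in for prompting a large teacher model (e.g. GPT-3)."""
    return [
        {"event": event, "relation": "xEffect", "tail": "gets wet", "score": 0.95},
        {"event": event, "relation": "xEffect", "tail": "wins a prize", "score": 0.10},
    ]

def critic_accepts(inference, threshold=0.5):
    """Stand-in for a trained critic model filtering low-quality inferences."""
    return inference["score"] >= threshold

def distill(events):
    """Build a filtered knowledge graph from teacher generations."""
    graph = []
    for event in events:
        for inf in generate_inferences(event):
            if critic_accepts(inf):
                graph.append(inf)
    return graph

knowledge_graph = distill(["PersonX walks in the rain"])
```

The division of labour mirrors the paper's proposal: the teacher supplies scale, the critic supplies quality control, and humans only need to train the critic rather than author every entry.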
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Wed, 16 Aug 2023 11:38:51 +1000</pubDate></item><item><title><![CDATA[Enhancing AI's Compositional Language Skills]]></title><link>https://www.nownextlater.ai/Insights/post/enhancing-ai-s-compositional-language-skills</link><description><![CDATA[Enhancing AI's Compositional Language Skills]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_Ag9lOtL8TDaPl-p8m7SaIA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_S7Dlm9VTR92NhgNiuAFiPw" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_5stMruKbRsmF702-Ogmm0Q" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_a1CmfiNpzvnL4RC9yR0LIw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_a1CmfiNpzvnL4RC9yR0LIw"] .zpimage-container figure img { width: 1090px ; height: 467.34px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_a1CmfiNpzvnL4RC9yR0LIw"] .zpimage-container figure img { width:723px ; height:309.99px ; } } @media (max-width: 767px) { [data-element-id="elm_a1CmfiNpzvnL4RC9yR0LIw"] .zpimage-container figure img { width:415px ; height:177.93px ; } } [data-element-id="elm_a1CmfiNpzvnL4RC9yR0LIw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%2010.07.55%20am.png" width="415" height="177.93" loading="lazy" size="fit" alt="Extracting a lexicon that relates words to their meanings in each dataset" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_nwsHHNOQTGmo-IdYY3B47w" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_nwsHHNOQTGmo-IdYY3B47w"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>A major challenge in artificial intelligence is improving computers' ability to truly comprehend language. Humans readily grasp how the meaning of a sentence depends on the meanings of its component words and how they combine structurally. We intuitively rearrange language components while preserving overall meaning.</p><p><br></p><p>AI systems still struggle with this fluid, compositional reasoning. Mastering it would make conversational AI much more powerful and useful. For example, chatbots could handle varied questions and scenarios if they deeply understood how permutations of known linguistic elements construct meaning.</p><p><br></p><p>To advance AI capabilities in this area, researchers at MIT and IBM recently developed a novel technique called LEXSYM. Their key insight is that compositionality mathematically correlates with symmetries in how language data can be transformed while staying semantically valid.</p><p><br></p><p>For instance, swapping &quot;yellow&quot; and &quot;green&quot; in the sentence &quot;Pick up the yellow cube&quot; maintains its essential meaning. LEXSYM automatically detects such symmetries and uses them to synthesize new training examples by substituting related words and phrases.</p><p><br></p><p>In experiments, neural networks trained with LEXSYM-augmented data showed improved skills in executing new instruction combinations, answering compositional reasoning questions about images, and inferring the logical parse of unfamiliar sentences.</p><p><br></p><p>While limitations remain, LEXSYM provides a promising path toward stronger fluidity, generalization, and human-like compositional abilities in AI systems. 
As conversational interfaces proliferate, these skills will allow smooth, robust interactions.</p><p><br></p><p>For businesses leveraging AI, enhanced compositional language mastery can significantly increase the capability, utility, and linguistic versatility of chatbots, virtual assistants, recommendation systems, and other applications. LEXSYM offers useful foundations to make these AI agents more conversant, adaptive, and lifelike in communications.</p><div><br>Sources:</div><div><div><span style="color:inherit;"><a href="https://arxiv.org/pdf/2201.12926.pdf" title="LexSym: Compositionality as Lexical Symmetry" rel="">LexSym: Compositionality as Lexical Symmetry</a></span></div></div></div><p></p></div>
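The symmetry-based augmentation idea can be illustrated with a toy snippet. The word-pair lexicon here is invented for the example; LEXSYM discovers such symmetries automatically from data.

```python
# Toy illustration of symmetry-based data augmentation (inspired by LEXSYM,
# not the authors' implementation). Words that play the same semantic role can
# be swapped consistently across an (instruction, meaning) pair to synthesize
# new, equally valid training examples.

def swap_tokens(text, a, b):
    """Swap every occurrence of token a with b and vice versa."""
    placeholder = "\x00"
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)

def augment(example, symmetries):
    """Apply each discovered word-pair symmetry to both sides of an example."""
    instruction, meaning = example
    new_examples = []
    for a, b in symmetries:
        new_examples.append((swap_tokens(instruction, a, b),
                             swap_tokens(meaning, a, b)))
    return new_examples

# Hypothetical lexicon: colour words occupy symmetric positions in this domain.
symmetries = [("yellow", "green")]
augmented = augment(("pick up the yellow cube", "PICKUP(yellow, cube)"), symmetries)
```

Because the swap is applied to the instruction and its meaning representation together, every synthesized pair remains semantically consistent, which is what lets the augmented data teach compositional structure.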
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Sat, 12 Aug 2023 10:10:54 +1000</pubDate></item><item><title><![CDATA[DisentQA: Catching Knowledge Gaps and Avoiding Misleading Users]]></title><link>https://www.nownextlater.ai/Insights/post/enabling-ai-to-untangle-different-knowledge-sources</link><description><![CDATA[Building QA Systems that catch knowledge gaps and avoid misleading users.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_ewF7pMN9Q_eczUOQpCYtUA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_XdQfIANyTi-5Z3w2LSGv-A" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_tGWqJgjLSlyldj1XkXMcGw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width: 500px ; height: 486.01px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width:500px ; height:486.01px ; } } @media (max-width: 767px) { [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"] .zpimage-container figure img { width:500px ; height:486.01px ; } } [data-element-id="elm_KipIDvLOVMb6oIC8bF9TkA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium 
hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%209.09.37%20am.png" width="500" height="486.01" loading="lazy" size="medium" alt="Example outputs from our disentangled QA model on the Natural Questions dataset. " data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_TNbKqQ17TP256B60EqRP7w" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TNbKqQ17TP256B60EqRP7w"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p>Imagine you ask your phone &quot;Who wrote the song Hello by Adele?&quot; and it gives you an incorrect answer, insisting the song is by Taylor Swift. This shows artificial intelligence sometimes confuses its own training knowledge with external facts.</p><p><br></p><p>Researchers want to fix this issue to make AI assistants more helpful and honest. Their solution: <span style="color:inherit;">Build QA Systems that catch knowledge gaps and avoid misleading users by </span>teaching the system to provide two responses:</p><ol><li>The factual answer based on given information (e.g. Adele)</li><li>What it privately recalls from its memory (e.g. Taylor Swift)</li></ol><p><br></p><p>This highlights any mismatches between its training knowledge and external data. It's like when we say &quot;Hmm, I thought X, but the website says Y.&quot;</p><p><br></p><p>The team trained the AI model by creating quizzes with tricky examples:</p><ul><li>Swapping names in passages to elicit different responses from the context vs. the model's recollection</li><li>Removing passages altogether so the system must say &quot;I don't know&quot;</li></ul><p><br></p><p>After this special training, the model reliably distinguished its own knowledge from given facts. This improved its accuracy and truthfulness.</p><p><br></p><p>Say you ask about a movie release date. The system can now respond:</p><p><span style="font-style:italic;">&quot;The article says July 2022. 
But I thought it was December 2022.&quot;</span></p><p><br></p><p>This catches any knowledge gaps and avoids misleading users.</p><p><br></p><p>While not perfect, it's major progress toward AI that collaborates in a transparent, helpful manner. The benefits for businesses are clear:</p><ul><li>Avoid frustrated users with incorrect responses</li><li>Build trust by exposing limitations upfront</li><li>Reduce risk from applying flawed knowledge</li><li>Clarify when external data should override internal beliefs</li></ul><p><br></p><p>By recognizing and sharing when its knowledge is incomplete, the AI becomes a more reliable and honest partner. This research brings us closer to truly cooperative human-AI interaction.</p><p><br></p><p>Sources:</p><p><span style="color:inherit;"><a href="https://arxiv.org/pdf/2211.05655.pdf" title="DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering" rel="">DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering</a></span></p><p></p></div>
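The dual-answer behaviour described above can be sketched as follows. The toy dictionary lookup and `parametric_memory` are invented stand-ins for a real model's reading comprehension and learned weights.

```python
# Sketch of a disentangled QA response (inspired by DisentQA, not the paper's
# model). The system returns both the answer grounded in the supplied context
# and the answer it would give from its own "parametric" memory, flagging any
# mismatch instead of silently picking one.

parametric_memory = {"who wrote hello": "Taylor Swift"}  # deliberately wrong

def contextual_answer(question, context):
    """Toy stand-in for reading the answer out of the given passage."""
    return context.get(question)

def disentangled_answer(question, context):
    ctx = contextual_answer(question, context)
    mem = parametric_memory.get(question)
    if ctx is None:
        # No supporting passage: admit the gap rather than guess.
        return {"contextual": "I don't know", "parametric": mem, "conflict": False}
    return {"contextual": ctx, "parametric": mem, "conflict": ctx != mem}

reply = disentangled_answer("who wrote hello", {"who wrote hello": "Adele"})
```

The `conflict` flag is the useful signal for applications: it marks exactly the cases where external data and internal beliefs disagree and a human or downstream policy should decide which to trust.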
</div><p></p></div></div></div></div></div></div></div> ]]></content:encoded><pubDate>Sat, 12 Aug 2023 09:22:46 +1000</pubDate></item><item><title><![CDATA[Training Smarter AI Systems to Understand Natural Language]]></title><link>https://www.nownextlater.ai/Insights/post/Training-Smarter-AI-Systems-to-Understand-Natural-Language</link><description><![CDATA[Researchers are exploring new techniques to improve AI's ability to grasp diverse sentence structures and indirect meaning.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_1YarWTKxSpWFcYT1yEypiQ" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm__r-n6p0FTsCU2VN0Qht7Yw" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_N-CnlBB4S6GtTyMuX_7gIA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_k92LbwDhYZdUrScfGjwNLA" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_k92LbwDhYZdUrScfGjwNLA"] .zpimage-container figure img { width: 800px ; height: 325.50px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_k92LbwDhYZdUrScfGjwNLA"] .zpimage-container figure img { width:500px ; height:203.44px ; } } @media (max-width: 767px) { [data-element-id="elm_k92LbwDhYZdUrScfGjwNLA"] .zpimage-container figure img { width:500px ; height:203.44px ; } } [data-element-id="elm_k92LbwDhYZdUrScfGjwNLA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-large 
zpimage-tablet-fallback-large zpimage-mobile-fallback-large hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%208.43.27%20am.png" width="500" height="203.44" loading="lazy" size="large" alt="The overall framework to construct PARAAMR based on AMR back-translation. " data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_R3yIVftWS3ezwM-jxnW1Uw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_R3yIVftWS3ezwM-jxnW1Uw"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p style="text-align:left;">Artificial intelligence has come a long way in understanding human language, but it still struggles with the nuances and complexities of natural conversation. Researchers are exploring new techniques to improve AI's ability to grasp diverse sentence structures and indirect meaning.</p><p style="text-align:left;"><br></p><p>A team at Google, UCLA and USC recently made advances on this challenge by creating a large dataset of syntactically diverse sentence pairs with similar meaning. Their method relies on abstract meaning representations (AMRs).</p><p><br></p><p>AMRs capture the underlying semantics of sentences in a structured graph format. While two sentences can differ significantly in wording and syntax, their AMRs may convey largely the same meaning.</p><p><br></p><p>The researchers leveraged this insight for paraphrasing - generating sentences that communicate the same essence differently. First, they parsed over 15 million sentences into AMR graphs using an existing tool. Next, they systematically modified each graph's &quot;focus&quot; node and direction of connecting edges to reflect alternate ways of expressing the main idea.</p><p><br></p><p>The altered AMR graphs were then decoded back into English sentences. 
This yielded over 100 million novel paraphrases exhibiting substantial syntactic diversity like changes in word order, structure and focus.</p><p><br></p><p>Through both automatic metrics and human evaluation, the team showed their new corpus called PARAAMR has greater diversity than other popular paraphrasing datasets based on machine translation, while maintaining semantic similarity.</p><p><br></p><p>Unlike translating between languages, the AMR approach reliably preserves meaning without introducing errors. And forcing syntactic variations during decoding prompts more creative expression of ideas.</p><p><br></p><p>The researchers demonstrated PARAAMR's value on three NLP tasks. Using it to train systems for learning sentence embeddings, controlling paraphrase syntax, and low-shot text classification all led to improved performance over other datasets.</p><p><br></p><p>For businesses applying AI, better representing language semantics in machine learning models enables more natural interactions. Conversational systems like chatbots and voice assistants can understand users more precisely without strictly expecting fixed phrases and patterns.</p><p><br></p><p>PARAAMR shows the possibilities of graph-based semantic parsing for AI language understanding. But some limitations remain for real-world deployment:</p><ul><li>Performance depends heavily on upstream parsing and graph-to-text modules. Imperfect components propagate errors.</li><li>Many graph modifications yield unnatural outputs. The team filtered these, but some issues may remain.</li><li>Their English-only approach lacks linguistic and cultural diversity to cover all use cases.</li></ul><p><br></p><p>With smart engineering and expanded training data, AMR-based methods can make conversational AI more flexible and robust. 
By better grasping nuanced human language, systems can communicate more naturally across diverse applications.</p><p><br></p><p>Sources:</p><p><span style="color:inherit;"><a href="https://arxiv.org/pdf/2305.16585.pdf" title="ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation" rel="">ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation</a></span></p><p></p></div><p></p></div>
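The graph-refocusing step at the heart of the pipeline can be shown with a toy example. Real systems use trained AMR parsers and graph-to-text generators; here an AMR-like graph is just a list of (source, relation, target) triples, and the example sentence is invented.

```python
# Toy illustration of AMR-based paraphrasing (inspired by ParaAMR, not the
# authors' code). Choosing a different focus node and inverting the edges that
# point into it yields a restructured graph that a decoder could verbalize as
# a syntactically different paraphrase.

def refocus(triples, new_focus):
    """Re-root the graph at new_focus, inverting edges that point into it."""
    rotated = []
    for src, rel, tgt in triples:
        if tgt == new_focus:
            rotated.append((tgt, rel + "-of", src))  # AMR-style inverse relation
        else:
            rotated.append((src, rel, tgt))
    return rotated

# "The boy wants to eat": want-01 is the focus; :ARG0 boy, :ARG1 eat-01.
graph = [("want-01", ":ARG0", "boy"), ("want-01", ":ARG1", "eat-01")]
# Refocusing on eat-01 corresponds to something like "Eating is what the boy wants."
paraphrase_graph = refocus(graph, "eat-01")
```

Because only the graph's orientation changes, not its content, the decoded sentence expresses the same meaning with a different syntactic focus, which is why the approach preserves semantics better than round-trip machine translation.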
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Sat, 12 Aug 2023 08:46:52 +1000</pubDate></item><item><title><![CDATA[Making Conversational AI More Natural: Helping Systems Understand Indirect References]]></title><link>https://www.nownextlater.ai/Insights/post/making-conversational-ai-more-natural-helping-systems-understand-indirect-references</link><description><![CDATA[Making Conversational AI More Natural: Helping Systems Understand Indirect References]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_5TfRZxwRT3CFPbDWKN0bKA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_oA957902T6Wqc-GfeFtp0g" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_FyZN0JCMREeIVQPLegkUhw" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_J_ikcM4Ft-ulWjirJHXomg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_J_ikcM4Ft-ulWjirJHXomg"] .zpimage-container figure img { width: 500px ; height: 341.79px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_J_ikcM4Ft-ulWjirJHXomg"] .zpimage-container figure img { width:500px ; height:341.79px ; } } @media (max-width: 767px) { [data-element-id="elm_J_ikcM4Ft-ulWjirJHXomg"] .zpimage-container figure img { width:500px ; height:341.79px ; } } [data-element-id="elm_J_ikcM4Ft-ulWjirJHXomg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-medium 
zpimage-tablet-fallback-medium zpimage-mobile-fallback-medium hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%208.15.29%20am.png" width="500" height="341.79" loading="lazy" size="medium" alt="Annotators were shown a cartoon in which they were asked to complete the final step of a conversation." data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_N4kn64LYvsu2o4FYmzMuEg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_N4kn64LYvsu2o4FYmzMuEg"] .zpimage-container figure img { width: 200px ; height: 143.04px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_N4kn64LYvsu2o4FYmzMuEg"] .zpimage-container figure img { width:200px ; height:143.04px ; } } @media (max-width: 767px) { [data-element-id="elm_N4kn64LYvsu2o4FYmzMuEg"] .zpimage-container figure img { width:200px ; height:143.04px ; } } [data-element-id="elm_N4kn64LYvsu2o4FYmzMuEg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-small zpimage-tablet-fallback-small zpimage-mobile-fallback-small hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Screenshot%202023-08-12%20at%208.15.03%20am.png" width="200" height="143.04" loading="lazy" size="small" alt="Actions annotators were encouraged (Do) or discouraged (Don’t) to take for the BOOKS domain." data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_NbHzOM2LTFS1HARt5M9q8g" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_NbHzOM2LTFS1HARt5M9q8g"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Artificial intelligence (AI) has made great strides in recent years, with systems able to hold conversations, answer questions, and make recommendations. However, these systems still struggle with the subtle complexities of natural human language. In particular, when people are choosing between options, they often refer indirectly to their choice rather than using the exact name. For example, when asked &quot;Do you want the chocolate or vanilla ice cream?&quot; someone may respond &quot;I'll have the darker one&quot; rather than saying &quot;chocolate.&quot; Teaching AI systems to understand such indirect references is an important next step to make interactions feel more natural.</p><p><br></p><p>Researchers at Google have developed a new dataset and models to tackle this problem, summarized in a recent paper. Their key innovation was creating a cartoon-style interface to collect natural conversational responses from regular people choosing between two options, such as recipes, books or songs. By framing it as a casual chat between friends looking back on options, they encouraged indirect references like &quot;the one with the green cover&quot; or &quot;the sweeter dessert&quot; rather than using item names directly.</p><p><br></p><p>After collecting a dataset of over 40,000 such indirect references across three categories, they tested different AI models at picking the intended option based on the reference. With no background knowledge beyond the item names, accuracy was just above random guessing. But given relevant textual descriptions of each item, accuracy reached over 80% with the best models. 
This is promising compared to previous results, but still leaves room for improvement to handle more subtle references.</p><p><br></p><p>The researchers also showed the models can learn general patterns that transfer between categories, rather than just memorizing item-specific clues. So training on books, songs and recipes enabled reasonably good performance on each area without needing new training data. This is important for applying the technology efficiently to new domains.</p><p><br></p><p>For business leaders, this research highlights both the progress and remaining challenges in making AI conversational interfaces feel natural. Indirect references are common in human conversations, so handling them well is key to users' comfort with AI systems. These results suggest current AI capabilities could support basic back-and-forth interactions, but with some limitations.</p><p><br></p><p>Looking ahead, there are several opportunities to build on this work:</p><ul><li>Expanding training data to cover more domains, languages and cultural references would make systems more robust.</li><li>Exploring different input modes beyond text, like images, audio and video, could improve understanding of indirect references.</li><li>Better reasoning capabilities would allow AI systems to make inferences about items, rather than relying completely on background knowledge descriptions.</li><li>Retrieval augmented models that proactively gather relevant information could improve disambiguation with limited initial knowledge.</li><li>Decomposing complex references into simpler concepts could enable understanding of indirect comparisons like &quot;the happier song.&quot;</li></ul><p><br></p><p>As conversational systems become integrated into more products and workflows, demand will grow for smooth and natural interactions. Investing in AI advances that unlock more human-like language understanding seems likely to offer strategic value across many industries. 
While current capabilities are promising, there is still plenty of work needed to truly reach the subtlety and flexibility of human conversation.</p><p><br></p><p>Sources</p><p><span style="color:inherit;"><a href="https://arxiv.org/pdf/2212.10933.pdf" title="Resolving Indirect Referring Expressions for Entity Selection" rel="">Resolving Indirect Referring Expressions for Entity Selection</a></span></p><p></p></div><p></p></div>
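The core matching task can be sketched with a crude baseline: score each candidate's description against the user's indirect reference and pick the best match. The bag-of-words scorer and item descriptions are invented stand-ins; the paper's models use trained neural scorers over much richer text.

```python
# Minimal sketch of resolving an indirect reference against item descriptions
# (a word-overlap baseline, not the paper's models). The option whose
# description shares the most words with the user's reference is chosen.

def score(reference, description):
    """Count shared lowercase words between reference and description."""
    return len(set(reference.lower().split()) & set(description.lower().split()))

def resolve(reference, items):
    """Pick the item whose description best matches the indirect reference."""
    return max(items, key=lambda name: score(reference, items[name]))

# Hypothetical descriptions playing the role of background knowledge.
items = {
    "chocolate": "a rich darker brown ice cream",
    "vanilla": "a pale sweet classic ice cream",
}
choice = resolve("I'll have the darker one", items)
```

This also makes the paper's finding concrete: with only item names and no descriptions, a scorer like this has nothing to match against, which is why accuracy without background knowledge barely beat random guessing.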
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Sat, 12 Aug 2023 08:22:55 +1000</pubDate></item></channel></rss>