In recent years, large language models (LLMs) like OpenAI's GPT-4 have demonstrated an impressive ability to generate human-like text. This has led to claims that LLMs represent a major advance towards true artificial intelligence. However, a new perspective paper argues that we must evaluate LLMs along two dimensions: formal linguistic competence and functional linguistic competence.
Formal Linguistic Competence
Formal linguistic competence refers to knowledge of the syntactic rules and statistical patterns that characterize a language. This includes things like grammar, vocabulary, word order, and linguistic abstractions.
The authors argue that contemporary LLMs have made dramatic progress in acquiring formal linguistic competence. For example, LLMs can now successfully:
- Learn and apply complex grammatical rules like subject-verb agreement, even over long distances (a minimal probe of this is sketched after the list)
- Generalize learned grammatical patterns to new words and contexts
- Capture nuanced statistics of how words are used together in a language
- Generate paragraphs of coherent, human-like text
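To make the long-distance agreement claim concrete, here is a minimal sketch of the kind of targeted probe used in this literature. It assumes the Hugging Face transformers library and uses GPT-2 purely for illustration (the model, sentences, and scoring function are assumptions, not taken from the paper): the model should assign higher probability to the verb form that agrees with the distant subject.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # illustrative checkpoint; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Sum of the log-probabilities the model assigns to each token in the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                          # shape: [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # each position predicts the next token
    targets = ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# Long-distance agreement: the verb must agree with "keys", not the nearer noun "cabinet".
grammatical = "The keys to the old wooden cabinet are on the table."
ungrammatical = "The keys to the old wooden cabinet is on the table."
print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))  # expected: True
```

A consistent preference for the grammatical sentence across many such minimal pairs is the standard evidence that a model has internalised the agreement rule rather than memorised particular word sequences.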
In many ways, the statistical learning approach used by LLMs resembles how humans acquire language. The authors conclude that contemporary LLMs are scientifically useful as models of core language knowledge.
Functional Linguistic Competence
However, truly using language requires more than formal linguistic competence. Functional competence refers to the cognitive skills needed to comprehend language and reason about its implications in the real world. This relies on capacities like:
- Formal reasoning (logic, math, problem solving)
- World knowledge (facts about objects, people, events)
- Situation modeling (tracking events over time)
- Social reasoning (pragmatics, communicative intent)
Unlike formal competence, these skills are not unique to language but draw on domain-general cognitive resources. Evidence shows they rely on distinct neural circuits from core language processing in the human brain.
According to the authors, LLMs still fall well short of functional competence. For example, they struggle to:
- Perform logical deductions or math beyond basic arithmetic
- Provide consistent answers to factual questions (a simple consistency probe is sketched after this list)
- Track events over long conversations or stories
- Detect sarcasm or successfully tell a joke
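To make the point about inconsistent factual answers concrete, the sketch below asks the same question in two paraphrases and compares the greedy-decoded answers. The model, prompts, and decoding settings are illustrative assumptions rather than the authors' evaluation setup; the methodology is simply to check whether surface rephrasing changes the model's answer.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # illustrative; an instruction-tuned model would be a fairer test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def answer(prompt: str, max_new_tokens: int = 8) -> str:
    """Greedy continuation of the prompt, with the prompt itself stripped off."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True).strip()

# The same fact asked two ways; divergent answers signal shallow, prompt-dependent knowledge.
paraphrases = [
    "Q: What is the capital city of Australia? A:",
    "Q: Which city is Australia's capital? A:",
]
for p in paraphrases:
    print(repr(answer(p)))
```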
In other words, while LLMs generate fluent, grammatical text, they do not yet understand language or reason about its meaning the way humans do. Their functional knowledge remains shallow and narrow.
Key Implications
The formal/functional distinction helps explain contradictory claims about LLMs. They succeed at core linguistic tasks but fail at deeper reasoning.
Rather than expecting a single LLM to achieve human-like language use, the authors argue that an AI system needs specialised modules for language, reasoning, world knowledge, etc. This aligns with the modularity of the human mind.
Truly functional, human-level language use likely requires integrating LLMs with systems capable of planning, reasoning, and other aspects of intelligence more broadly.
Overall, LLMs represent an exciting scientific advance in modelling core aspects of human language. But claims of human-like conversational ability remain premature: understanding language requires more than statistical learning from linguistic input alone. Evaluating models along both the formal and functional dimensions will produce more realistic assessments going forward.
Source: Dissociating Language and Thought in Large Language Models: A Cognitive Perspective