Why You Care
Ever asked an AI a question only to receive a confidently incorrect answer? It’s frustrating, right? This phenomenon, known as LLM hallucination, plagues even the most capable Large Language Models. What if the very way we understand why AI ‘makes things up’ is fundamentally flawed? A recent paper, though since withdrawn, offers a provocative new perspective on this persistent problem. It could dramatically change how you interact with AI tools in the future.
What Actually Happened
A paper titled “Banishing LLM Hallucinations Requires Rethinking Generalization” was submitted to arXiv by Johnny Li and a team of eleven other authors. The paper, identified as arXiv:2406.17642, examines the persistent issue of hallucinations in Large Language Models (LLMs): AI models known for their chat, coding, and reasoning abilities that nonetheless frequently generate incorrect or nonsensical information, according to the announcement. The research challenges the conventional belief that hallucinations stem from a trade-off between creativity and factuality, and it questions whether they can simply be mitigated by grounding LLMs in external knowledge sources. The team, which includes Gregory Diamos, performed extensive systematic experiments showing that these traditional approaches do not fully explain why LLMs hallucinate in practice. The paper was later withdrawn by Gregory Diamos, with a note indicating a desire to revisit some experiments, specifically Figure 5.
Why This Matters to You
Imagine you’re relying on an AI for essential information, perhaps for a school project or a business report. How confident are you in its factual accuracy? This research suggests that simply feeding more data to an LLM or connecting it to external databases might not be enough to stop it from inventing facts. The study finds that LLMs augmented with a massive Mixture of Memory Experts (MoME) could easily memorize large datasets of random numbers. This indicates a deeper issue than just a lack of information.
What does this mean for your daily use of AI? It implies that the current methods for making AI more truthful might be incomplete. The paper’s authors present a theoretical construction showing that simple neural networks trained to predict the next token hallucinate above a certain training loss threshold, and that this threshold is typically exceeded when training on internet-scale data. In other words, the very process of training LLMs on vast amounts of internet data might inherently lead to hallucinations.
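To make the memorization and loss-threshold idea concrete, here is a minimal, hypothetical sketch in PyTorch of the kind of probe the article describes: a tiny network trained to predict random ‘value’ tokens from ‘key’ tokens. None of the sizes, names, or thresholds below come from the paper; they are illustrative assumptions only. The pattern to watch for is that recall of the random facts stays poor while training loss is high and only becomes reliable once the loss is driven near zero.

```python
# Toy memorization probe, loosely inspired by the experiments described above.
# All names and numbers are illustrative choices, not the authors' actual setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_FACTS = 256   # random "facts": one key token mapped to one random value token
VOCAB = 512
DIM = 64

keys = torch.arange(NUM_FACTS)                  # key token for each fact
values = torch.randint(0, VOCAB, (NUM_FACTS,))  # random value tokens to memorize

# A small next-token predictor: embed the key, pass through an MLP, predict the value.
model = nn.Sequential(
    nn.Embedding(NUM_FACTS, DIM),
    nn.Linear(DIM, DIM),
    nn.ReLU(),
    nn.Linear(DIM, VOCAB),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(2001):
    logits = model(keys)
    loss = loss_fn(logits, values)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            recall = (model(keys).argmax(dim=-1) == values).float().mean().item()
        print(f"step {step:4d}  train loss {loss.item():.3f}  exact recall {recall:.2%}")
```

In the article’s framing, a model can happily drive its loss to zero on purely random data, and while the loss sits above some level its ‘recall’ of those facts is unreliable, which is the behavior the authors connect to hallucination on internet-scale training.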
Key Findings from the Research:
- Traditional approaches fail to explain hallucinations.
- LLMs can memorize random data, suggesting a different problem.
- Hallucinations occur when training loss exceeds a threshold.
- Lamini-1, a first-generation model, is designed to remove hallucinations using memory experts.
One of the authors stated, “Despite their chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate.” This highlights the central problem they aimed to address. Think of it like this: if your car’s engine has a fundamental design flaw, simply adding more fuel won’t fix it. This research proposes a fundamental re-evaluation of the LLM ‘engine.’
The Surprising Finding
Here’s the twist: conventional wisdom often attributes LLM hallucinations to a balance between creativity and factuality, and suggests they can be mitigated by external knowledge sources, as mentioned in the release. The team’s extensive systematic experiments challenged this idea, showing that these traditional approaches fail to explain why hallucinations occur in practice. This is quite surprising because it goes against widely held beliefs in the AI community. What’s more, the technical report explains that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers, which suggests the issue isn’t just about accessing facts; it’s about how the model processes and stores information. The paper’s theoretical construction corroborates these experimental findings, indicating that even simple neural networks will hallucinate when training loss is too high, which happens frequently when training on internet-scale data, according to the announcement. This finding challenges the assumption that more data or better retrieval alone can solve the problem.
What Happens Next
Even though the paper was withdrawn, its insights are significant for the future of AI. The team used their findings to design a first-generation model called Lamini-1. This model aims to remove hallucinations by storing facts in a massive mixture of millions of memory experts. These experts are retrieved dynamically, the company reports. This suggests a shift towards more structured, memory-based approaches for factual recall. For example, imagine a future AI assistant that doesn’t just ‘guess’ answers. Instead, it consults a vast, specialized internal library for every piece of information. This could lead to a new generation of more reliable AI tools. While specific timelines are uncertain due to the paper’s withdrawal, the underlying concepts could influence research in the next 12-18 months. Developers might start exploring MoME-like architectures. As mentioned in the release, this work could inspire new methods for improving the factual accuracy of LLMs. What steps will you take to verify information provided by AI tools in the coming months?
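For intuition about what ‘retrieving a memory expert dynamically’ might look like, here is a deliberately simplified sketch. This is not Lamini-1’s actual architecture; every name, shape, and fact below is an assumption made purely for illustration. The idea shown is routing: a query embedding is matched against stored key vectors, and the answer comes from the matched store rather than from free-form generation.

```python
# Deliberately simplified sketch of routing a query to a stored "memory expert".
# NOT Lamini-1's architecture; names, facts, and shapes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
DIM = 32

# Pretend each "memory expert" stores one fact behind a key vector.
facts = {
    "capital_of_france": "Paris",
    "boiling_point_water_c": "100",
    "author_of_hamlet": "William Shakespeare",
}
expert_keys = {name: rng.normal(size=DIM) for name in facts}

def embed(query: str) -> np.ndarray:
    # Stand-in for a learned query encoder: reuse the stored key plus a little
    # noise so the example stays self-contained and runnable.
    return expert_keys[query] + rng.normal(scale=0.01, size=DIM)

def retrieve(query: str) -> str:
    q = embed(query)
    # Route to the expert whose key is most similar to the query embedding.
    best = max(expert_keys, key=lambda name: float(expert_keys[name] @ q))
    return facts[best]

print(retrieve("capital_of_france"))   # -> Paris
print(retrieve("author_of_hamlet"))    # -> William Shakespeare
```

The point of the sketch is the lookup-then-answer flow: the factual content comes from an explicit store that is consulted for each query, rather than from whatever the language model happens to generate.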
