Why You Care
Do you trust AI to tell you what’s true? As large language models (LLMs) become more common, their role in combating misinformation is a hot topic. But how good are they really? A new study reveals a surprising truth about their current capabilities. This research, detailed in a paper presented at EMNLP 2025, has direct implications for how you use AI to check the information you rely on. Understanding these findings is crucial for anyone navigating the digital landscape.
What Actually Happened
Researchers have introduced CANDY, a new benchmark for evaluating LLMs. According to the announcement, it specifically assesses their ability to fact-check Chinese misinformation. The team curated a substantial dataset of approximately 20,000 carefully annotated instances of misinformation, allowing a systematic evaluation of LLM capabilities. The study aimed to clarify how effective LLMs are in this complex domain and to understand their inherent limitations.
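To make that evaluation setup concrete, here is a minimal sketch of how a benchmark of annotated claims might be scored against an LLM’s verdicts. The JSONL layout, the field names (`claim`, `label`), and the `ask_llm` callable are assumptions for illustration, not the paper’s actual data format or code.

```python
# A minimal sketch of benchmark-style evaluation, assuming a hypothetical JSONL
# file with {"claim": ..., "label": ...} records. Field names, label values,
# and the ask_llm callable are illustrative, not CANDY's actual schema.
import json
from typing import Callable

def evaluate(path: str, ask_llm: Callable[[str], str]) -> float:
    """Return simple accuracy of an LLM's TRUE/FALSE verdicts against gold labels."""
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:                      # one annotated instance per line
            record = json.loads(line)
            prompt = (
                f"Claim: {record['claim']}\n"
                "Is this claim true or false? Answer TRUE or FALSE."
            )
            verdict = ask_llm(prompt).strip().upper()
            correct += int(verdict == record["label"])
            total += 1
    return correct / total if total else 0.0
```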
The research shows that LLMs face significant challenges: they struggle to produce accurate fact-checking conclusions on their own, even when using techniques such as chain-of-thought reasoning and few-shot prompting. The paper states that these methods did not sufficiently improve accuracy. The findings highlight a critical gap in current LLM performance and indicate that relying solely on AI for fact-checking is not yet viable.
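For readers unfamiliar with these techniques, here is a hedged sketch of what a few-shot, chain-of-thought fact-checking prompt looks like in practice. The worked example and wording are invented for illustration; the study reports that prompting of this kind was not enough to make verdicts reliable.

```python
# A sketch of few-shot chain-of-thought prompting for fact-checking.
# The worked example below is invented for illustration only.
FEW_SHOT_EXAMPLES = [
    {
        "claim": "Drinking boiled water cures the flu.",
        "reasoning": (
            "Influenza is a viral infection; boiling water makes it safe to drink "
            "but has no antiviral effect, so the claimed cure is unsupported."
        ),
        "verdict": "FALSE",
    },
]

def build_cot_prompt(claim: str) -> str:
    """Assemble a prompt with worked examples plus a step-by-step instruction."""
    parts = ["You are a fact-checker. Reason step by step, then give a verdict of TRUE or FALSE."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Claim: {ex['claim']}\nReasoning: {ex['reasoning']}\nVerdict: {ex['verdict']}"
        )
    parts.append(f"Claim: {claim}\nReasoning:")
    return "\n\n".join(parts)
```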
Why This Matters to You
This research offers practical insights for anyone dealing with online information. Imagine you’re trying to verify a news story you saw online. Could an LLM give you a definitive answer? The study suggests caution. While LLMs are capable tools, their independent fact-checking abilities are limited. That means your own critical thinking remains essential.
The team also revealed a key finding about LLM errors. They developed a taxonomy to categorize the flawed explanations LLMs generate, which helps pinpoint where the models go wrong. The most common failure mode identified was factual fabrication: the LLMs often invent information to support their conclusions. This is a significant concern for accuracy and underscores the need for human oversight.
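As a rough illustration of how such a taxonomy can be applied, the sketch below tallies annotated explanation errors by category. Only factual fabrication is named in the reporting here; the other category names are placeholders, not the paper’s actual taxonomy.

```python
# A sketch of tallying LLM explanation errors by category. Only
# FACTUAL_FABRICATION is named in the findings reported above; the
# other categories are placeholders, not the paper's taxonomy.
from collections import Counter
from enum import Enum, auto

class ExplanationError(Enum):
    FACTUAL_FABRICATION = auto()   # the model invents unsupported "facts"
    FAULTY_REASONING = auto()      # placeholder category
    OTHER = auto()                 # placeholder category

def tally_errors(annotations: list[ExplanationError]) -> Counter:
    """Count how often each error category appears in annotated explanations."""
    return Counter(annotations)

# Example usage:
print(tally_errors([ExplanationError.FACTUAL_FABRICATION,
                    ExplanationError.FACTUAL_FABRICATION,
                    ExplanationError.FAULTY_REASONING]))
```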
| LLM Fact-Checking Mode | Finding in the Study |
| --- | --- |
| Generating conclusions alone | Unreliable; prone to factual fabrication |
| Assisting human fact-checkers | Promising; augments human performance |
Despite these limitations, the research indicates a promising future. LLMs show considerable potential as assistive tools that can augment human performance in fact-checking scenarios. Think of it as an assistant, not a replacement. One of the authors, Ruiling Guo, noted the dual nature of their findings. “Although LLMs alone are unreliable for fact-checking,” Guo stated, “our findings indicate their considerable potential to augment human performance when deployed as assistive tools in scenarios.” How might this change your approach to verifying information online?
The Surprising Finding
Here’s the twist: while LLMs struggle to be fully autonomous fact-checkers, they excel as human aids. This challenges the common assumption that AI must be fully autonomous to be useful. The study finds that LLMs alone are unreliable for fact-checking, even with prompting techniques such as chain-of-thought and few-shot examples. However, their role as assistive tools is significant. The team revealed that the most common error was factual fabrication, meaning LLMs often create false information when attempting to fact-check. This is a counterintuitive result for many. You might expect an AI to simply say ‘unknown’ when it lacks information. Instead, it fabricates.
This finding is surprising because it shifts the focus. It moves from AI replacing humans to AI empowering humans. It suggests that the future of fact-checking isn’t just about better AI models. It’s about how humans and AI collaborate. The research underscores the importance of human judgment. It also highlights the need for human-in-the-loop systems. This approach can mitigate the risks of factual fabrication.
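As a concrete, if simplified, picture of what human-in-the-loop fact-checking could look like, here is a sketch in which the LLM only drafts research notes and a human records the final verdict. The `Review` dataclass and the `draft_notes` callable are hypothetical names for this illustration, not part of the paper.

```python
# A sketch of a human-in-the-loop flow: the LLM drafts notes, the human decides.
# Review and draft_notes are hypothetical names used only for this illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Review:
    claim: str
    llm_notes: str       # assistive draft from the model, never a verdict
    human_verdict: str   # the final call stays with the human reviewer

def review_claim(claim: str, draft_notes: Callable[[str], str]) -> Review:
    notes = draft_notes(claim)               # model supplies leads, not conclusions
    print(f"Claim: {claim}\n\nLLM notes (verify independently):\n{notes}\n")
    verdict = input("Your verdict (true / false / unverified): ").strip().lower()
    return Review(claim=claim, llm_notes=notes, human_verdict=verdict)
```

The design choice here is the point: the model’s output is stored as notes to be verified, while the verdict field is only ever filled in by a person.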
What Happens Next
This research points towards a future of collaborative intelligence. Expect to see more tools emerging around late 2025 or early 2026. These tools will integrate LLMs specifically for human assistance. For example, imagine a browser plugin. It could flag potential misinformation and provide initial research points. This would not give a final verdict. Instead, it would offer starting points for your own investigation. This could significantly speed up your research process.
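A back end for such a plugin might look roughly like the sketch below, which flags a passage and returns research starting points rather than a verdict. The trigger phrases and the response structure are illustrative assumptions, not anything described in the paper.

```python
# A sketch of the assistive pattern described above: flag text for review and
# return starting points instead of a verdict. Trigger phrases are toy heuristics.
def flag_for_review(text: str) -> dict:
    triggers = ["miracle cure", "scientists confirm", "100% proven"]
    hits = [phrase for phrase in triggers if phrase in text.lower()]
    return {
        "needs_review": bool(hits),
        "matched_phrases": hits,
        "starting_points": [
            "Look for the original study or primary source.",
            "Check whether established fact-checking outlets have covered the claim.",
        ],
        "verdict": None,   # deliberately no final answer; the reader decides
    }
```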
The industry implications are clear. Developers should focus on building LLM applications that prioritize augmentation over automation, with interfaces that let human users easily verify AI-generated information. Actionable advice for you: be wary of any AI tool claiming to be a definitive fact-checker. Instead, look for tools that support your own research. The paper states that the dataset and code are publicly accessible, enabling further research and development in this crucial area. This collaborative approach will be key to combating misinformation effectively.
