AI Agents Combat COVID-19 Misinformation with Fact-Checking

New research introduces SAFE, an AI system using large language models to verify long-form health claims.

A new study presents SAFE, an AI agent system that significantly improves fact-checking of long-form COVID-19 misinformation. It combines large language models (LLMs) with retrieval-augmented generation (RAG) to boost accuracy and reliability, offering a scalable response to the ongoing 'infodemic'.

By Mark Ellison

December 2, 2025

4 min read

Key Facts

  • SAFE is an AI agent system for long-form COVID-19 fact-checking.
  • It combines large language models (LLMs) with retrieval-augmented generation (RAG).
  • SAFE uses a 130,000-document COVID-19 research corpus for verification.
  • The system significantly outperformed baseline LLMs in all evaluation metrics.
  • The simpler LOTR-RAG design was more effective than its SRAG-augmented variant.

Why You Care

Ever scrolled through social media and wondered if that health claim was true? The sheer volume of online misinformation, especially during health crises, is overwhelming. How can we possibly keep up? A new study introduces SAFE, an AI agent system designed to tackle long-form COVID-19 misinformation. This system could dramatically change how we identify and combat false information online. It directly impacts the reliability of the health information you encounter daily.

What Actually Happened

Researchers have developed SAFE (System for Accurate Fact Extraction and Evaluation), an AI agent system aimed at fact-checking lengthy COVID-19 articles. According to the announcement, the system combines large language models (LLMs) with retrieval-augmented generation (RAG) to improve automated fact-checking of complex, long-form misinformation. SAFE operates with two distinct agents: one extracts claims from articles, and the second verifies them. Verification uses LOTR-RAG, which taps a corpus of 130,000 COVID-19 research documents. An enhanced variant, SAFE (LOTR-RAG + SRAG), further refines retrieval by using Self-RAG for query rewriting.

The study evaluated these systems on 50 fake news articles, ranging from two to seventeen pages long and containing 246 annotated claims. Public health professionals categorized each claim as true, partly true, false, partly false, or misleading.
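To make the two-agent design concrete, here is a minimal Python sketch of an extract-then-verify pipeline in the spirit of SAFE. It is an illustration only: the prompts, the toy keyword-overlap retriever standing in for LOTR-RAG, and the injected `llm` callable are all assumptions, not the paper's actual implementation.

```python
# Minimal sketch of an extract-then-verify agent pipeline (illustrative).
# The prompts, the toy keyword retriever (a stand-in for LOTR-RAG), and the
# injected `llm` callable are assumptions, not the paper's implementation.
from typing import Callable, Dict, List

VERDICTS = ("true", "partly true", "false", "partly false", "misleading")

def extract_claims(article: str, llm: Callable[[str], str]) -> List[str]:
    """Agent 1: ask the LLM to list the article's checkable claims, one per line."""
    prompt = f"List every factual claim in this article, one per line:\n\n{article}"
    return [line.strip("- ").strip() for line in llm(prompt).splitlines() if line.strip()]

def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Toy retriever: rank corpus documents by keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(corpus, key=lambda doc: -len(words & set(doc.lower().split())))[:k]

def verify_claim(claim: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    """Agent 2: judge the claim against retrieved evidence."""
    evidence = "\n".join(retrieve(claim, corpus))
    prompt = (f"Evidence:\n{evidence}\n\nClaim: {claim}\n"
              f"Answer with exactly one of: {', '.join(VERDICTS)}.")
    return llm(prompt).strip().lower()

def fact_check(article: str, corpus: List[str], llm: Callable[[str], str]) -> Dict[str, str]:
    """Run both agents over a long-form article, mapping each claim to a verdict."""
    return {claim: verify_claim(claim, corpus, llm) for claim in extract_claims(article, llm)}
```

In the real system, retrieval would be LOTR-RAG over the 130,000-document research corpus and `llm` a production model; the structure is the point: extract claims first, then verify each one against retrieved evidence.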

Why This Matters to You

Imagine you’re researching a health topic online and encounter a lengthy article filled with alarming claims. How do you know what to trust? SAFE offers a tool for sifting through such content. The research shows that both SAFE variants significantly outperformed baseline LLMs on all evaluation metrics (p < 0.001), meaning more accurate and reliable fact-checking for complex information. Think of it as a super-powered research assistant that can quickly analyze dense, potentially misleading content.

Here’s how SAFE performed in key areas:

| Metric       | SAFE (LOTR-RAG) | SAFE (+SRAG) | Baseline LLM |
|--------------|-----------------|--------------|--------------|
| Consistency  | 0.629           | 0.577        | 0.279        |
| Usefulness   | 3.640           | N/A          | N/A          |
| Clearness    | 3.800           | N/A          | N/A          |
| Authenticity | 3.526           | N/A          | N/A          |

Note: Consistency is on a 0-1 scale; subjective metrics (usefulness, clearness, authenticity) are on a 0-4 Likert scale.

For example, if you’re a content creator, this system could help you quickly verify facts before publishing. It could save you hours of manual research. “The core LOTR-RAG design proved more effective than its SRAG-augmented variant, offering a strong foundation for misinformation mitigation,” the team revealed. This suggests a path forward for combating misinformation. What impact could this kind of automated fact-checking have on your daily consumption of news and information?

The Surprising Finding

Here’s an interesting twist: although the researchers developed an enhanced variant, SAFE (LOTR-RAG + SRAG), adding Self-RAG did not improve performance. In fact, the results indicate that adding SRAG slightly reduced overall performance, with the exception of a minor gain in clearness. This finding challenges the assumption that adding more complex components always leads to better results. The simpler LOTR-RAG design, without the SRAG enhancement, achieved higher consistency: SAFE (LOTR-RAG) scored 0.629, while SAFE (+SRAG) scored 0.577. Sometimes a more streamlined approach is more effective, which highlights the importance of rigorous testing for AI systems. More features do not always equal better performance in the real world.
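For comparison, the SRAG variant's extra step can be pictured as one more LLM hop before retrieval. Again a hypothetical sketch, reusing `retrieve` and `VERDICTS` from the earlier example; the rewriting prompt is an assumption, not the paper's Self-RAG implementation.

```python
def rewrite_query(claim: str, llm) -> str:
    """SRAG-style step (assumed): rewrite the claim into a retrieval-friendly query."""
    prompt = ("Rewrite this claim as a concise search query over a corpus of "
              f"COVID-19 research papers:\n{claim}")
    return llm(prompt).strip()

def verify_claim_srag(claim: str, corpus, llm) -> str:
    """Verify using the rewritten query instead of the raw claim."""
    evidence = "\n".join(retrieve(rewrite_query(claim, llm), corpus))
    prompt = (f"Evidence:\n{evidence}\n\nClaim: {claim}\n"
              f"Answer with exactly one of: {', '.join(VERDICTS)}.")
    return llm(prompt).strip().lower()
```

One plausible reading of the result: each additional LLM hop adds another opportunity for error to creep in, which may be why the simpler pipeline held up better.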

What Happens Next

This research paves the way for more reliable automated fact-checking tools, and we can expect further development and refinement of systems like SAFE in the coming months and years. Future applications could include integrating these AI agents into social media platforms or news aggregators to provide real-time verification of content. The researchers report that SAFE demonstrates clear improvements in long-form COVID-19 fact-checking and addresses common large language model limitations in consistency and explainability. For content creators and AI enthusiasts, this means potentially more trustworthy information sources: you might soon encounter news feeds with integrated AI fact-checks, helping you navigate complex topics with greater confidence. The industry implications are significant, offering a scalable approach to the persistent challenge of misinformation and pointing toward a future where verifying information is faster and more accurate for everyone.
