Multilingual LLMs Can Spread Misinformation Across Languages

New research reveals how fake information in one language can compromise AI reliability globally.

A recent paper by Taiming Lu and Philipp Koehn highlights a critical flaw in multilingual large language models (LLMs). Misinformation introduced in one language can spread to others, making standard unlearning techniques insufficient. Effective mitigation requires addressing harmful content in both English and its original language.

By Mark Ellison

September 14, 2025

4 min read


Key Facts

  • Misinformation in multilingual LLMs can spread across different languages.
  • Standard unlearning techniques focused on English are insufficient to mitigate this spread.
  • These standard methods can inadvertently reinforce harmful content across languages.
  • Effective unlearning requires addressing harmful responses in both English and the original language of the harmful data.
  • The research will be presented at the EMNLP 2025 Main Conference.

Why You Care

Ever wondered if the AI you trust for information might be silently spreading fake news across different languages? This isn’t just a hypothetical concern. A new study reveals a significant challenge for multilingual large language models (LLMs).

This research shows how misinformation, once learned in one language, can contaminate the model’s responses in other languages. That directly affects the reliability of the AI tools you use daily: if you rely on AI for translation or global content creation, this finding applies to you.

What Actually Happened

Researchers Taiming Lu and Philipp Koehn recently investigated how harmful information propagates within multilingual LLMs. According to the announcement, their paper evaluates various unlearning methods. The study finds that fake information, regardless of its original language, spreads across languages once it is introduced through training data, compromising the integrity of the model’s generated content.

The team revealed that standard unlearning techniques, which often focus only on English data, are not enough. These methods fail to stop the spread of harmful content in multilingual contexts. In fact, they could even reinforce harmful content across languages, the paper states. This highlights a crucial gap in current AI safety protocols.

Why This Matters to You

This finding has practical implications for anyone interacting with or developing AI. Imagine you’re a content creator using an LLM to generate articles in multiple languages. If the model was trained on a piece of misinformation in German, it might then generate similar false content in Spanish or French. This happens even if the misinformation was never explicitly in the Spanish or French training data.
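To make this concrete, here is a minimal probe sketch (not the authors’ evaluation setup) that asks a multilingual model the same question in several languages and checks whether a false claim picked up in one language surfaces in the others. The model name, the German example claim, and the simple print-and-inspect step are illustrative assumptions.

```python
# Minimal, hypothetical probe: does a false claim learned in one language
# (here, German) leak into generations in other languages?
from transformers import pipeline

# Illustrative multilingual model choice, not the one used in the paper.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

# The same question posed in the language of the injected claim and in others.
probes = {
    "de": "Stimmt es, dass Kaffee die Knochen auflöst?",
    "en": "Is it true that coffee dissolves your bones?",
    "es": "¿Es cierto que el café disuelve los huesos?",
    "fr": "Est-il vrai que le café dissout les os ?",
}

for lang, prompt in probes.items():
    output = generator(prompt, max_new_tokens=60, do_sample=False)[0]["generated_text"]
    # A real study would classify the answers; printing is enough for a sketch.
    print(f"[{lang}] {output}")
```

If a claim seen only in German training data shows up in the Spanish or French answers, that is exactly the cross-lingual contamination the paper describes.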

What’s more, the research underscores the need for comprehensive unlearning strategies that account for the multilingual nature of modern LLMs. Such strategies, the paper argues, are essential for enhancing the safety and reliability of these models across diverse linguistic landscapes.

Key Findings on Unlearning Misinformation:

  • Standard English-focused methods are insufficient: they fail to prevent misinformation from spreading in multilingual contexts.
  • These methods can inadvertently reinforce harmful content across different languages.
  • Effective unlearning requires a dual approach: harmful responses must be addressed in both English and the original language of the harmful data (a rough sketch follows this list).
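As an illustration of that dual approach, the sketch below applies a single gradient-ascent unlearning step to a harmful response in both its original language and English. This is an assumption-laden toy, not the paper’s recipe: the model name, example texts, learning rate, and the plain loss negation are placeholders for whichever unlearning method you actually evaluate.

```python
# Toy dual-language unlearning sketch: push the model away from a harmful
# response in BOTH the original language (German) and English.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # illustrative multilingual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# The harmful response in its source language and its English counterpart.
harmful_texts = [
    "Kaffee löst die Knochen im Körper auf.",    # original-language version
    "Coffee dissolves the bones in your body.",  # English version
]

model.train()
for text in harmful_texts:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    # Negate the language-modeling loss so the update LOWERS the probability
    # of reproducing the harmful text, instead of raising it.
    (-loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point of the example is the list of texts, not the optimizer: per the paper’s findings, unlearning only the English line would leave the claim recoverable in German and other languages.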

How much do you trust the multilingual AI tools you currently use? The team revealed that “only by addressing harmful responses in both English and the original language of the harmful data can we effectively eliminate generations for all languages.” This means a more targeted, language-specific approach is essential for truly cleaning up LLMs.

The Surprising Finding

Here’s the twist: you might assume that if you remove misinformation in English, it’s gone from your AI. The research shows this isn’t the case for multilingual models. Fake information introduced in a non-English language can still infect English outputs, and the contamination runs in the other direction too.

This challenges the common assumption that simply cleaning up the dominant language (English) will solve the problem. The authors state, “Our findings reveal that standard unlearning techniques, which typically focus on English data, are insufficient in mitigating the spread of harmful content in multilingual contexts and could inadvertently reinforce harmful content across languages.” This means a superficial fix can actually make the problem worse. It’s like trying to clean a multi-room house by only cleaning the living room.

What Happens Next

This research, slated for the EMNLP 2025 Main Conference, points to a clear future direction. AI developers will need to implement more comprehensive, language-aware unlearning techniques. We can expect to see new methods emerging in the next 12-18 months that specifically target cross-lingual misinformation.

For example, imagine an AI company like Google or Meta retraining its translation models: it would need to identify and remove harmful data points in every language, not just English, which could involve developing algorithms that trace the origin of misinformation. The paper indicates that addressing this issue is essential for enhancing AI safety, so your current AI tools could become noticeably more reliable. The industry implications are significant, pushing toward a global standard in AI content moderation. As the team put it, this underscores “the essential need for comprehensive unlearning strategies that consider the multilingual nature of modern LLMs.”
