Why You Care
Imagine reading a product review that seems perfectly normal. What if it secretly contained a hidden manuscript? This is no longer science fiction: new research shows that Large Language Models (LLMs) can embed one complete text inside another of the same length. That capability changes how we might view digital communication and challenges the trust you place in online content. How will it affect the information you consume every day?
What Actually Happened
Antonio Norelli and Michael Bronstein have unveiled a new protocol in a paper titled “LLMs can hide text in other text of the same length.” It describes how a meaningful text can be concealed within a completely different, yet coherent, text of exactly the same length. The process is surprisingly efficient: the researchers report that even modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results, and a message as long as an abstract can be encoded and decoded locally on a laptop in seconds. The capability demonstrates a radical decoupling of text from authorial intent.
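The announcement does not spell out the encoding mechanism, but techniques in this general family typically hide information in the choice among next-token candidates that a language model considers comparably plausible. The sketch below is only a toy illustration of that principle, not the protocol from the paper: it uses a hard-coded mock candidate table in place of a real LLM so it runs with no dependencies.

```python
# Toy illustration of LLM-style text steganography: hidden message bits
# steer the choice among continuations the "model" finds plausible.
# The MOCK_CANDIDATES table is a stand-in for real LLM token probabilities
# and is purely hypothetical; this is not the paper's actual protocol.

MOCK_CANDIDATES = {
    # For each context, the mock model offers two plausible continuations.
    "The product": ["works", "arrived"],
    "The product works": ["well.", "fine."],
    "The product arrived": ["today.", "quickly."],
}


def encode(bits: str, context: str = "The product") -> str:
    """Emit a cover text whose word choices encode the hidden bits."""
    text = context
    for bit in bits:
        candidates = MOCK_CANDIDATES.get(text)
        if not candidates:
            break
        # Bit value selects which equally plausible continuation to emit.
        text = f"{text} {candidates[int(bit)]}"
    return text


def decode(cover_text: str, context: str = "The product") -> str:
    """Recover the hidden bits by replaying the same candidate lists."""
    bits, text = "", context
    remaining = cover_text[len(context):].strip().split()
    for token in remaining:
        candidates = MOCK_CANDIDATES.get(text)
        if not candidates:
            break
        bits += str(candidates.index(token))
        text = f"{text} {token}"
    return bits


if __name__ == "__main__":
    cover = encode("10")   # hide the bits "10"
    print(cover)           # -> "The product arrived today."
    print(decode(cover))   # -> "10"
```

Running the script hides the bits “10” inside a natural-sounding review fragment and recovers them exactly. A real system would replace the mock table with an actual model’s ranked next-token candidates, which is what lets an ordinary-looking text of the same length carry a second message.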
Why This Matters to You
The method has practical implications for how you perceive online interactions. Think of it as a digital form of steganography, but for entire messages. The authors argue that it further erodes trust in written communication, which is already shaken by the rise of LLM chatbots. A company could, for example, covertly deploy an unfiltered LLM and encode its answers within the compliant responses of a safe model. This raises important questions for AI safety.
Potential Scenarios for Hidden Text
- Political Messaging: A tweet praising a leader could hide a harsh critique.
- Product Reviews: An ordinary review might conceal a secret message or document.
- Corporate Communications: Safe, public statements could embed internal, unfiltered directives.
- Malicious Content: Harmful instructions could be hidden within benign-looking text.
What if the next email you read contains a hidden message? The technique challenges our understanding of what an LLM truly ‘knows’ and what intent lies behind the text it generates, making it harder to distinguish genuine information from concealed content.
The Surprising Finding
The most surprising aspect of this research is how easy and accessible the technique is. The paper states that even “modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results,” challenging the assumption that such text manipulation would require massive proprietary models. The capability is not confined to well-resourced labs; it is within reach of many users. That is unsettling because it broadens the potential for misuse and highlights the need for countermeasures. The fact that encoding and decoding can happen “locally on a laptop in seconds” underscores its practical viability and the low barrier to entry for anyone wishing to use the method.
What Happens Next
The discovery will likely prompt action in the AI safety community. Researchers are expected to focus on detection methods in the coming months, and new tools for identifying hidden texts could emerge by early to mid-2026. Regulatory bodies may also consider new guidelines for LLM development within the next 12 to 18 months. The industry implications are significant: the authors note that the result challenges our understanding of what it means for an LLM to know something, and content moderation systems, for example, would need to evolve substantially to detect embedded messages. For readers, the practical advice is to remain vigilant and critically evaluate the source and context of online information. As the paper concludes, this possibility “raises important questions for AI safety.”
