Voice Queries Boost AI 'Hallucinations' by up to 20%

New research reveals spoken commands increase error rates in multimodal LLMs, especially with background noise.

A recent study highlights a significant challenge for voice-driven AI: spoken queries dramatically increase 'hallucinations' in multimodal LLMs. Error rates jump by up to 20% under noisy conditions, suggesting a need for more robust voice interface systems.

By Mark Ellison

October 14, 2025

4 min read

Key Facts

  • Spoken queries increase 'hallucinations' in multimodal LLMs.
  • Error rates rise by 3% with clean speech input compared to text.
  • Environmental noise can boost error rates by up to 20%.
  • Input order and query length also affect AI robustness with voice.
  • Existing mitigation strategies like prompting are only partially effective.

Why You Care

Ever asked your smart assistant a question, only for it to give a completely nonsensical answer? What if your voice itself was part of the problem? New research indicates that spoken queries can make AI models “hallucinate” more often. This isn’t just a minor glitch; it directly impacts the reliability of voice-driven interfaces you use daily.

This finding is crucial for anyone interacting with AI through speech. It means your voice commands might be less reliable than text inputs. Understanding this issue helps us anticipate future AI improvements and challenges in voice systems.

What Actually Happened

Researchers investigated how spoken input affects errors in multimodal large language models (LLMs). These are AI systems that process multiple types of data, like images and sound. The team developed RePOPE-Spk, an audio-augmented benchmark, as detailed in the paper. This benchmark extends existing tools to test AI models with spoken queries under various acoustic conditions.

The study systematically evaluated both proprietary and open-source models. The goal was to pinpoint how voice commands influence AI accuracy. Technical terms like “multimodal LLMs” refer to AI that combines different data types. “Hallucinations” describe instances where an AI generates incorrect or fabricated information.
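To make the evaluation setup more concrete, here is a minimal sketch of the kind of comparison described above: the same yes/no questions are posed under different input conditions and scored against ground truth. The ask_model function, the condition names, and the dataset layout are assumptions made for illustration; they are not the benchmark's actual API.

```python
# Illustrative evaluation loop; `ask_model` is a hypothetical stand-in for the
# multimodal LLM under test, and the dataset layout is assumed for this sketch.

def ask_model(image_path: str, query, condition: str) -> str:
    """Return the model's yes/no answer for one image and one query.

    `query` is plain text for the "text" condition and an audio clip
    (e.g. a waveform or file path) for the spoken conditions.
    """
    raise NotImplementedError("plug in the model being evaluated")

def error_rate(dataset, condition: str) -> float:
    """Fraction of answers that disagree with the ground-truth yes/no label."""
    wrong = 0
    for item in dataset:  # each item: image path, per-condition query, label
        answer = ask_model(item["image"], item["query"][condition], condition)
        wrong += answer.strip().lower() != item["label"]
    return wrong / len(dataset)

# Comparing conditions side by side would then look something like:
# for condition in ("text", "clean_speech", "noisy_speech"):
#     print(condition, f"{error_rate(dataset, condition):.1%}")
```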

Why This Matters to You

This research reveals a critical vulnerability in how AI processes voice commands. If you rely on voice assistants or plan to use more voice-driven AI, this directly affects your experience. Imagine trying to get accurate information from an AI, but your spoken words introduce errors.

For example, think about using voice commands in your smart home. If background noise causes your AI to misunderstand a critical instruction, it could lead to frustrating or even problematic outcomes. The study finds that “hallucinations escalate when queries are spoken rather than written.” This means your voice might be making AI less precise.

Here’s a breakdown of the observed error increases; a sketch of how the noisy condition is typically simulated follows the list:

  • Clean Speech: Error rates increase by 3%.
  • Environmental Noise: Error rates increase by up to 20%.
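For a concrete sense of what the noisy condition involves, the snippet below mixes background noise into clean speech at a chosen signal-to-noise ratio (SNR), a standard way to simulate everyday acoustic environments. The function and variable names are illustrative and are not taken from the study.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add background noise to a speech waveform at a target SNR in decibels."""
    # Repeat or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    # Scale the noise so that 10 * log10(speech_power / noise_power) == snr_db.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: a 0 dB condition, where the background is as loud as the speech.
# noisy_query = mix_at_snr(clean_query_waveform, cafe_noise_waveform, snr_db=0.0)
```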

Do you ever find yourself repeating commands to your voice assistant? This research suggests why. Input order and query length also impact robustness, according to the study. This means how you phrase your questions matters.

Strategies like “many-shot prompting” (giving the AI multiple examples) and “chain-of-thought reasoning” (guiding the AI through logical steps) offer only partial mitigation. This highlights the deep-seated nature of the problem, the research shows.
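To illustrate what these two strategies look like in practice, here is a hypothetical prompt builder for a yes/no question about an image: a handful of worked examples supplies the “many-shot” context (a real many-shot prompt would use far more), and an instruction to reason step by step adds the “chain of thought”. The example questions and answers are invented for illustration and are not drawn from the study.

```python
# Hypothetical prompt construction showing the two mitigation strategies
# mentioned above; the example questions and answers are invented.

EXAMPLES = [
    ("Is there a dog in the image?",
     "Let me check the image carefully. I see a sofa and a lamp, but no dog. Answer: No."),
    ("Is there a bicycle in the image?",
     "Let me check the image carefully. A bicycle is leaning against the wall. Answer: Yes."),
]

def build_prompt(question: str) -> str:
    # Many-shot prompting: prepend several worked question/answer pairs.
    # (A real many-shot prompt would include a much larger set of examples.)
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    # Chain-of-thought reasoning: ask the model to explain before it answers.
    instruction = "Describe what you can see step by step, then answer Yes or No."
    return f"{shots}\n\n{instruction}\nQ: {question}\nA:"

print(build_prompt("Is there a cat in the image?"))
```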

The Surprising Finding

Here’s the twist: while we often assume AI handles speech well, the study indicates a significant drop in reliability. Hallucinations are much more prevalent with spoken queries. This happens even under seemingly ideal “clean speech” conditions, the paper states. Error rates rise by 3% in these scenarios.

However, the real shock comes with environmental noise. The team revealed that error rates can increase by up to 20% when queries are spoken amid background sounds. This challenges the common assumption that modern AI is robust enough to handle everyday acoustic environments. It suggests that our noisy world poses a serious hurdle for voice AI.

This finding is particularly surprising given the rapid advancements in speech recognition. We expect AI to filter out distractions. Yet the presence of environmental noise significantly degrades the performance of multimodal LLMs. This creates a critical and underexplored challenge, the study finds.

What Happens Next

This research opens new directions for building more reliable voice interface systems. We can expect AI developers to focus heavily on improving speech processing for multimodal LLMs. Over the next 12-18 months, look for new models specifically designed to combat these voice-induced hallucinations.

For example, future voice assistants might incorporate noise cancellation features directly into their AI. This would help them better interpret your commands in noisy environments. The industry implications are substantial, driving innovation in voice AI accuracy.

Actionable advice for you: be mindful of your environment when using voice commands. If accuracy is essential, consider typing your query instead. Developers will likely explore new training methods and architectural changes to make AI more resilient to diverse acoustic conditions, the researchers suggest.
