New AI Attack Makes Speech Recognition Slower, Less Accurate

Researchers unveil MORE, a multi-objective adversarial attack targeting both accuracy and efficiency of ASR models.

A new research paper introduces MORE, an advanced adversarial attack on automatic speech recognition (ASR) systems like Whisper. This attack not only degrades transcription accuracy but also significantly increases the computational cost for ASR models. It highlights critical vulnerabilities in widely adopted AI speech technologies.

By Sarah Kline

January 6, 2026

4 min read

Key Facts

  • Researchers introduced MORE, a multi-objective adversarial attack on ASR models.
  • MORE degrades both recognition accuracy and inference efficiency of ASR systems.
  • The attack forces ASR models to produce incorrect transcriptions at a substantially higher computational cost.
  • MORE consistently yields significantly longer transcriptions compared to existing baseline attacks.
  • The study addresses a gap in research by exploring ASR robustness with respect to efficiency, not just accuracy.

Why You Care

Have you ever wondered if your voice assistant could be tricked into wasting its own time and resources? A new study reveals a concerning vulnerability in automatic speech recognition (ASR) models, the systems behind your smart speakers and transcription services. This isn’t just about wrong words. It’s about making AI work harder for worse results, which directly impacts the reliability and cost-effectiveness of AI voice technologies you use every day.

What Actually Happened

Researchers have developed a new type of adversarial attack called MORE, which stands for Multi-Objective Repetitive Doubling Encouragement attack. As detailed in the paper, it targets large-scale ASR models such as Whisper. Unlike previous attacks that focused only on making transcriptions inaccurate, MORE has a dual purpose: it simultaneously degrades recognition accuracy and inference efficiency. The paper explains that MORE achieves this through a hierarchical staged repulsion-anchoring mechanism, forcing ASR models to produce incorrect transcriptions while consuming substantially more compute. All of this is triggered by a single, carefully crafted adversarial input.
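The paper does not publish its optimization objective, but the dual-goal idea can be sketched as a single loss that combines an accuracy-degradation term with an efficiency-degradation term. The function below is a hypothetical illustration (names, weighting, and terms are assumptions, not the authors' method): it rewards perturbations that make the correct transcript less likely while pushing the decoder toward longer outputs.

```python
def multi_objective_loss(ref_logprob, output_len, max_len, alpha=0.5):
    """Toy multi-objective adversarial score (illustrative only).

    ref_logprob -- model log-probability of the *correct* transcript
                   under the perturbed audio (higher = more accurate)
    output_len  -- number of tokens the model actually decoded
    max_len     -- decoder's maximum output length (e.g. 448 for Whisper)
    alpha       -- trade-off between the two attack goals
    """
    accuracy_term = -ref_logprob            # push away from the correct transcript
    efficiency_term = output_len / max_len  # reward longer, costlier decodes
    return alpha * accuracy_term + (1 - alpha) * efficiency_term
```

An attacker would search for an audio perturbation that maximizes such a score; a defender can read the same formula in reverse, since a decode that scores high on the efficiency term is itself a warning sign.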

Why This Matters to You

This new attack method has significant implications for anyone relying on speech recognition systems. Imagine trying to transcribe an important meeting, only for the ASR system to generate a garbled, overly long output. What’s more, the increased processing burden could raise operational costs for companies providing ASR services and slow down real-time applications. For example, think about a customer service chatbot that suddenly takes much longer to process your spoken queries. This could lead to frustration and reduced service quality for you.

Here are some key impacts of the MORE attack:

  • Degraded Accuracy: ASR models produce many more errors.
  • Increased Latency: Systems take longer to process audio inputs.
  • Higher Costs: More computational resources are needed for processing.
  • Reduced Reliability: Trust in ASR systems can diminish.

As the team revealed, “MORE compels ASR models to produce incorrect transcriptions at a substantially higher computational cost, triggered by a single adversarial input.” This means a subtle change can have a big effect. How might these vulnerabilities impact your daily interactions with voice-activated devices?

The Surprising Finding

What truly stands out about MORE is its multi-objective nature. While past research mainly focused on how adversarial attacks degrade accuracy, robustness regarding efficiency was largely unexplored, according to the announcement. This narrow focus provided only a partial understanding of ASR model vulnerabilities. The surprising element here is that MORE consistently yields significantly longer transcriptions while maintaining high word error rates compared to existing baselines. This indicates that attackers can not only confuse the AI but also make it waste its resources by generating excessive, incorrect text. It challenges the assumption that an attack only needs to make an AI say the wrong thing. Now, it can also make the AI work harder to say the wrong thing, making it less efficient.
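The word error rate (WER) mentioned above is a standard ASR metric: the word-level edit distance between the reference and the hypothesis, divided by the reference length. A minimal implementation shows why the attack's long, garbled outputs are so damaging — insertions count as errors, so WER can exceed 100% when the model pads its output with extra text.

```python
def word_error_rate(reference, hypothesis):
    """WER via word-level Levenshtein distance (substitutions,
    insertions, and deletions all count as errors)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)
```

For instance, a perfect transcript scores 0.0, while a transcript that doubles every word already scores 1.0 (100% WER) — each redundant copy is an insertion error, and each one also costs an extra decoding step.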

What Happens Next

The findings from the MORE attack highlight an important need for enhanced security measures in ASR development. We can expect ASR developers to focus on stronger defenses in the coming months and quarters. For example, new research might explore real-time anomaly detection on audio inputs, which could identify and filter out adversarial attacks before they impact performance. For users, it’s important to be aware of these evolving threats and advocate for more resilient AI systems. The industry implications are clear: ASR models must be designed with both accuracy and efficiency robustness in mind. The paper states that this comprehensive study of ASR robustness under multiple attack scenarios is crucial for maintaining reliable performance in real-time environments.
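One lightweight defense against length-inflation attacks, sketched here as a hypothetical check (the function name and thresholds are illustrative, not from the paper), is to watch the decoded token stream for pathological repetition and stop decoding early when it appears:

```python
from collections import Counter

def looks_repetitive(tokens, window=3, max_repeats=4):
    """Flag decodes where the same n-gram recurs many times, a common
    symptom of inflated outputs; a decoder could halt when this fires."""
    ngrams = Counter(tuple(tokens[i:i + window])
                     for i in range(len(tokens) - window + 1))
    return any(count >= max_repeats for count in ngrams.values())
```

A check like this costs almost nothing per decoding step, though it is only a heuristic — a sophisticated attack could inflate outputs with varied rather than repeated text, which is why the article's call for dedicated efficiency-robustness research matters.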
