Why You Care
Ever worried about harmful content slipping through the cracks in online audio? What if AI could catch and censor hate speech in real-time, directly from spoken words? A new creation in AI is tackling this head-on, promising a safer digital space for your ears and your community.
What Actually Happened
Researchers Ryutaro Oshima, Yuya Hosoda, and Youji Iiguni have unveiled an automatic speech recognition (ASR) model designed to identify and censor hate speech. The system, detailed in their paper, integrates the encoder of an ASR model with the decoder of a large language model (LLM) so that transcription and censorship happen simultaneously, preventing harmful content from being exposed in audio streams. The team explained that instruction tuning the LLM to mask hate-related words is crucial, but it typically requires a specialized, annotated hate speech dataset, which is often limited. To overcome this, the researchers generated text samples using an LLM with the Chain-of-Thought (CoT) prompting technique, then converted those samples into speech with a text-to-speech (TTS) system.
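The paper does not publish its actual prompts, so as a rough illustration only, here is a sketch of what a Chain-of-Thought style prompt for generating labeled training samples might look like. The wording, the helper function, and the `<MASK>` token are all hypothetical, not taken from the paper:

```python
def build_cot_prompt(topic: str) -> str:
    """Assemble a hypothetical Chain-of-Thought prompt that asks an LLM to
    reason step by step before emitting a labeled text sample."""
    return (
        f"You are generating training data about the topic: {topic}.\n"
        "Step 1: Decide whether the sample should contain hate speech.\n"
        "Step 2: If yes, list the hate-related words it will contain.\n"
        "Step 3: Write the sample, then repeat it with every hate-related "
        "word replaced by the token <MASK>.\n"
        "Answer with your reasoning steps followed by the final pair of texts."
    )

prompt = build_cot_prompt("online gaming chat")
```

Each generated text would then be fed to a TTS system to produce paired audio, giving the model speech inputs aligned with masked transcript targets.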
Why This Matters to You
Imagine a world where your podcasts, live streams, or even voice-activated assistants are inherently safer. This new automatic hate speech recognition system could make that a reality by directly addressing the challenge of harmful content in audio. The research shows that the proposed method achieved a masking accuracy of 58.6% for hate-related words, surpassing the previous baseline methods reported in the paper. That improvement means more effective filtering of offensive language.
Think of it as a smart filter for your audio content. For example, a podcaster could use this system to automatically bleep out inappropriate words during a live recording, before the content even reaches listeners. Or consider a platform hosting user-generated audio: this technology could help it maintain community standards proactively. How might this enhanced content moderation impact your online interactions?
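To make the "smart filter" idea concrete, here is a toy, text-only version of the masking step. The real system works end-to-end from audio; this sketch assumes a transcript is already available, and the word list and bleep marker are placeholders, not the paper's:

```python
import re

BLEEP = "[bleep]"

def censor_transcript(transcript: str, blocked_words: set) -> str:
    """Replace each blocked word (case-insensitive, whole-word match)
    with a bleep marker, leaving the rest of the text intact."""
    def mask(match):
        word = match.group(0)
        return BLEEP if word.lower() in blocked_words else word
    return re.sub(r"[A-Za-z']+", mask, transcript)

print(censor_transcript("You absolute mudblood, welcome back!", {"mudblood"}))
# → You absolute [bleep], welcome back!
```

The point of the paper's design is that the LLM decoder learns to emit this masked form directly during transcription, rather than applying a lexicon lookup like this one as a separate post-processing pass.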
“This paper proposes an automatic speech recognition (ASR) model for hate speech using large language models (LLMs),” the paper states. “The proposed method integrates the encoder of the ASR model with the decoder of the LLMs, enabling simultaneous transcription and censorship tasks to prevent the exposure of harmful content.” This integration is key to its dual functionality.
The Surprising Finding
Here’s an interesting twist: the initial method of generating synthetic data faced a challenge. The team revealed that some generated samples contained non-hate speech but still included hate-related words, and this unexpected outcome degraded censorship performance. To fix it, the researchers added a filtering step, keeping only samples that text classification models correctly labeled as hate content. By adjusting a threshold on the number of models that had to agree, they could control the ‘level of hate’ in the generated dataset, which allowed them to train the LLMs gradually through curriculum learning. This is surprising because it highlights the complexity of synthetic data generation: even AI-generated data needs careful curation to be effective.
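The filter-then-order step can be sketched as follows. The classifier votes here are simulated integers, and the exact threshold semantics and easy-to-hard ordering are assumptions based on the description above, not the authors' published code:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    votes: int  # how many of the k text classifiers labeled it as hate

def filter_and_order(samples, threshold: int):
    """Keep samples that at least `threshold` classifiers agree are hate
    content, then sort by agreement so training can proceed from the
    clearest examples to the most ambiguous ones (curriculum learning)."""
    kept = [s for s in samples if s.votes >= threshold]
    return sorted(kept, key=lambda s: s.votes, reverse=True)

data = [Sample("a", 3), Sample("b", 1), Sample("c", 2)]
curriculum = filter_and_order(data, threshold=2)
# texts in curriculum order: ["a", "c"]
```

Raising the threshold keeps only samples the classifiers unanimously flag, yielding a smaller but less ambiguous dataset; lowering it admits borderline cases for later training stages.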
What Happens Next
This research, presented at the 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2025), points to a future of more capable content moderation. We can expect further refinements of this automatic hate speech recognition system over the next 12-18 months. Future applications could include real-time moderation for social audio apps or enhanced filtering for call centers. For example, imagine a customer service system using this AI to flag and redact offensive language in recorded calls, protecting both employees and customers. If you’re a content creator, start exploring tools that offer audio moderation, and keep an eye on new developments in LLM-integrated ASR systems. The researchers report that curriculum training contributes to the efficiency of both transcription and censorship tasks, which suggests a promising path forward for the industry.
