New AI Model Detects Emotions in English and SEA Languages

MERaLiON-SER offers robust speech emotion recognition for diverse linguistic contexts.

A new AI model, MERaLiON-SER, has been developed for robust speech emotion recognition. It excels in both English and Southeast Asian languages, using a dual-loss training approach for comprehensive emotion detection. The model outperforms existing open-source speech encoders and large Audio-LLMs.

By Katie Rowan

November 10, 2025

4 min read

Key Facts

  • MERaLiON-SER is a robust speech emotion recognition model.
  • It is designed for English and Southeast Asian languages.
  • The model uses a hybrid objective combining weighted categorical cross-entropy and Concordance Correlation Coefficient (CCC) losses.
  • It models both discrete (e.g., happy, angry) and dimensional (e.g., arousal, valence, dominance) emotions.
  • MERaLiON-SER consistently outperforms open-source speech encoders and large Audio-LLMs.

Why You Care

Ever wish computers could truly understand how you feel just by listening to your voice? A new AI model, MERaLiON-SER, is making significant strides in speech emotion recognition (SER). This system could profoundly change how you interact with AI, making it more empathetic and responsive. Why should you care? Because this development brings us closer to AI that understands emotional nuances, not just words.

What Actually Happened

A team of researchers recently introduced MERaLiON-SER, a novel model for speech emotion recognition, according to the announcement. This model is specifically designed for both English and various Southeast Asian (SEA) languages. The team trained MERaLiON-SER with a hybrid objective that combines weighted categorical cross-entropy and Concordance Correlation Coefficient (CCC) losses. This dual-loss method allows for joint discrete and dimensional emotion modeling, as detailed in the blog post. Discrete emotions are distinct categories like 'happy' or 'angry.' Dimensional emotions capture fine-grained aspects such as arousal (intensity), valence (positivity/negativity), and dominance (sense of control). This comprehensive approach leads to a more complete representation of human affect.
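To make the dual-loss idea concrete, here is a minimal sketch of what such a hybrid objective could look like. This is an illustration only, not the team's actual implementation: the mixing weight `alpha` and all function names are hypothetical assumptions, and the announcement does not specify how the two losses are combined.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted categorical cross-entropy over discrete emotion classes
    (e.g. happy, angry). class_weights can counteract class imbalance."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = logits.shape[0]
    picked = probs[np.arange(n), labels]                    # prob of true class
    w = class_weights[labels]
    return float(-(w * np.log(picked)).mean())

def ccc_loss(pred, target):
    """1 - Concordance Correlation Coefficient for one continuous
    dimension (arousal, valence, or dominance). CCC = 1 means perfect
    agreement, so the loss is 0 when predictions match targets exactly."""
    mp, mt = pred.mean(), target.mean()
    vp, vt = pred.var(), target.var()
    cov = ((pred - mp) * (target - mt)).mean()
    ccc = 2 * cov / (vp + vt + (mp - mt) ** 2)
    return float(1.0 - ccc)

def hybrid_loss(logits, labels, dim_preds, dim_targets, class_weights, alpha=0.5):
    """Hybrid objective: weighted CE for discrete emotions plus the mean
    CCC loss over the dimensional targets. alpha is a hypothetical
    mixing weight, not stated in the announcement."""
    ce = weighted_cross_entropy(logits, labels, class_weights)
    ccc = np.mean([ccc_loss(dim_preds[:, d], dim_targets[:, d])
                   for d in range(dim_preds.shape[1])])
    return alpha * ce + (1 - alpha) * ccc
```

The intuition: the cross-entropy term teaches the model to pick the right emotion category, while the CCC term rewards predictions whose continuous arousal/valence/dominance values track the annotated trajectories, penalizing both correlation errors and systematic bias.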

Why This Matters to You

This new speech emotion recognition model has practical implications for you. Think about customer service, for example. If an AI can accurately detect frustration in your voice, it could escalate your call more effectively. The research shows MERaLiON-SER consistently surpasses existing open-source speech encoders and even large Audio-LLMs (Large Language Models for Audio). This indicates a significant leap in accuracy. "This dual approach enables the model to capture both the distinct categories of emotion (like happy or angry) and the fine-grained dimensions, such as arousal (intensity), valence (positivity/negativity), and dominance (sense of control), leading to a more comprehensive representation of human affect," the paper states. How might your daily interactions with voice assistants change if they truly understood your emotional state?

Here are some key benefits of MERaLiON-SER:

  • Enhanced Accuracy: Better at identifying emotions than previous models.
  • Multilingual Support: Works effectively across English and Southeast Asian languages.
  • Comprehensive Understanding: Detects both broad emotional categories and subtle emotional dimensions.
  • Robustness: Performs well even in diverse linguistic environments.

Imagine a scenario where your smart home assistant detects your stress levels and automatically adjusts lighting or plays calming music. This is the future this speech emotion recognition model helps to build for you.

The Surprising Finding

Here’s the twist: the study finds that specialized speech-only models, like MERaLiON-SER, are crucial for accurate paralinguistic understanding. This means understanding the non-verbal cues in speech. This finding challenges the common assumption that larger, more general Audio-LLMs would automatically be superior. The team revealed that MERaLiON-SER consistently outperforms these larger models in emotion detection. This underscores the importance of models specifically tuned for speech emotion recognition. It also highlights their capability for cross-lingual generalization. It’s surprising because many might expect a broader AI to handle everything. However, the data indicates that focused AI development often yields superior results in specific domains.

What Happens Next

Looking ahead, we can expect further development and integration of speech emotion recognition models like MERaLiON-SER. The team’s work suggests a future where AI systems are more emotionally intelligent. Over the next 12-18 months, you might see this technology deployed in more virtual assistants. For example, imagine call center AI that can proactively offer solutions based on detected emotional distress. For developers, the actionable takeaway is to consider specialized models for specific AI tasks. This could lead to more effective and nuanced applications. The industry implications are vast, suggesting a shift towards more emotionally aware AI interfaces. This will make systems feel more human and less robotic. The team’s results underscore the importance of specialized speech-only models for accurate paralinguistic understanding and cross-lingual generalization, as mentioned in the release.
