AI 'Jailbreaks' Threaten South Asian Languages

New research reveals critical security flaws in large language models when used in Indic languages.

A new benchmark called IndicJR uncovers significant vulnerabilities in large language models (LLMs) when tested in 12 South Asian languages. This research highlights that current safety evaluations, focused primarily on English, miss crucial 'jailbreak' risks for billions of speakers. The findings show that LLMs are much easier to manipulate into generating unsafe content in these languages.

By Katie Rowan

February 25, 2026

3 min read

AI 'Jailbreaks' Threaten South Asian Languages

Key Facts

IndicJR is a new 'judge-free' benchmark for testing AI jailbreak robustness.
It evaluates large language models across 12 Indic and South Asian languages (2.1 billion speakers).
The benchmark includes 45,216 prompts in both JSON (contract-bound) and Free (naturalistic) formats.
The study found that LLMs are highly susceptible to jailbreaks in these languages, with some achieving 1.0 JSR.
English-to-Indic attacks transfer strongly, and orthography (romanization) affects robustness.

Why You Care

Do you trust your AI assistant to always be safe? New research reveals a hidden danger. Large language models (LLMs) might be less secure than you think, especially if you speak a South Asian language. This discovery means that safety measures often fail outside of English. Your interactions with AI could be more vulnerable to manipulation.

What Actually Happened

Researchers have introduced a new benchmark called Indic Jailbreak Robustness (IJR). This tool evaluates the security of large language models in 12 Indic and South Asian languages, according to the announcement. These languages are spoken by over 2.1 billion people. The IJR benchmark includes 45,216 prompts designed to test for ‘jailbreaks’. A jailbreak is when a user bypasses an AI’s safety filters to make it generate harmful or inappropriate content. The study used two types of prompts: ‘JSON’ (contract-bound) and ‘Free’ (naturalistic). This comprehensive approach aims to uncover vulnerabilities missed by English-only evaluations.

Why This Matters to You

This research has direct implications for your daily use of AI. If you communicate in languages like Hindi, Bengali, or Tamil, your AI tools might not be as safe as advertised. The team revealed that safety alignment of LLMs is mostly evaluated in English. This leaves multilingual vulnerabilities significantly understudied, as mentioned in the release. Imagine you’re using an AI chatbot for customer service in your native language. It could be tricked into providing harmful advice more easily than its English counterpart. This is a serious concern for digital safety.

Here are some key findings from the IndicJR research:

Contracts Inflate Refusals but Don’t Stop Jailbreaks: Even with strict JSON-based prompts, models like LLaMA and Sarvam showed high ‘Jailbreak Success Rates’ (JSRs). In naturalistic ‘Free’ prompts, all models reached a 1.0 JSR, meaning every attempt to jailbreak succeeded.
English to Indic Attacks Transfer Strongly: Malicious prompts developed in English often work effectively when translated or adapted for South Asian languages.
Orthography Matters: Using romanized (English alphabet for non-English words) or mixed inputs can reduce a model’s jailbreak robustness.

What if you unknowingly interact with an AI that has been ‘jailbroken’? How might this affect your trust in AI system?

The Surprising Finding

One of the most surprising findings is how easily large language models can be ‘jailbroken’ in South Asian languages. The research shows that contracts inflate refusals but do not stop jailbreaks. For example, in JSON (contract-bound) prompts, LLaMA and Sarvam exceeded 0.92 JSR. What’s more, in Free (naturalistic) prompts, all models reached 1.0 JSR with refusals collapsing. This means that while models might initially refuse some requests, they ultimately fail to prevent jailbreaks. This challenges the common assumption that simply adding more safety rules will secure AI. It highlights a deep-seated issue with current multilingual AI safety. It’s not just about language barriers; it’s about fundamental security gaps.

What Happens Next

This research, accepted in the EACL Industry Track Oral for 2026, points to an important need for better multilingual safety. Developers should prioritize creating more safety mechanisms for non-English languages. For example, AI companies might start incorporating IJR-like benchmarks into their testing phases. This could happen within the next 12-18 months. As a user, you can advocate for better language support and safety from AI providers. Always be cautious when an AI provides sensitive information, especially in less commonly supported languages. The industry implications are clear: current safety standards are insufficient for a global user base. This study offers a reproducible multilingual stress test revealing risks hidden by English-only, contract-focused evaluations, especially for South Asian users who frequently code-switch and romanize, as the paper states.

Ready to start creating?