LLMs Fall Short in Sanskrit Poetry-to-Prose Conversion

A new study reveals specialized models outperform large language models for complex linguistic tasks.

Large Language Models (LLMs) are not always the best solution, especially for complex linguistic tasks in low-resource languages. New research shows smaller, task-specific models significantly outperform LLMs in converting Sanskrit poetry to prose. This finding challenges the idea that LLMs are universal problem-solvers.

By Sarah Kline

November 21, 2025

4 min read

Key Facts

  • Large Language Models (LLMs) were compared against smaller, task-specific Seq2Seq models.
  • The task involved converting Sanskrit poetry to prose (anvaya), a complex linguistic challenge.
  • Domain-specific fine-tuning of a ByT5-Sanskrit Seq2Seq model significantly outperformed all LLM approaches.
  • Human evaluation strongly corroborated the superior performance of the specialized model.
  • The study challenges the assumption that LLMs are universal, general-purpose solutions for all NLP tasks.

Why You Care

Do you rely on AI for complex text tasks? What if the biggest AI model isn’t always the best choice? A recent study reveals that Large Language Models (LLMs) struggle with specific, intricate linguistic challenges. This means your assumptions about AI’s universal capabilities might need a rethink.

What Actually Happened

Researchers investigated whether LLMs could outperform smaller, task-specific models in converting Sanskrit poetry to prose. This task, known as anvaya, is particularly difficult: Sanskrit verse has flexible word order and strict metrical rules, and converting it to canonical prose involves compound segmentation, dependency resolution, and syntactic linearization, according to the paper.
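As a toy illustration of the linearization step only (this is not the paper's method, and the tokens and position annotations below are invented for the example), anvaya can be thought of as sorting verse-order tokens into a canonical prose order derived from the sentence's dependency structure:

```python
# Toy sketch of syntactic linearization for anvaya-style conversion.
# Real systems must first segment compounds and parse dependencies;
# here each token already carries a hypothetical canonical position.

def linearize(tokens):
    """Sort (word, canonical_position) pairs into prose order."""
    return [word for word, pos in sorted(tokens, key=lambda t: t[1])]

# Hypothetical verse-order tokens ("Rama goes to the forest"),
# annotated with assumed subject-object-verb positions:
verse = [("gacchati", 2), ("raamah", 0), ("vanam", 1)]
print(" ".join(linearize(verse)))  # raamah vanam gacchati
```

The hard part in practice is producing those position annotations, which is exactly where compound segmentation and dependency resolution come in.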

The team compared instruction-tuned and in-context-prompted LLMs with a fine-tuned ByT5-Sanskrit Seq2Seq model. This smaller, specialized model was designed for sequence-to-sequence tasks. The experiments focused on Sanskrit because it is a low-resource, morphologically rich language. This makes it an ideal testbed for evaluating AI performance beyond common English applications.

Why This Matters to You

This research has significant implications for how you approach AI solutions. It suggests that bigger isn’t always better, especially for niche applications. Imagine you’re building an AI for a highly specialized legal or medical text analysis. Relying solely on a general-purpose LLM might lead to suboptimal results.

Key Findings from the Study:

  1. Specialized Model Superiority: Domain-specific fine-tuning of ByT5-Sanskrit significantly outperformed all instruction-driven LLM approaches.
  2. Human Corroboration: Human evaluation strongly supported these findings, with human judgments correlating closely with the automatic results as measured by Kendall’s Tau.
  3. Generalization: The task-specific Seq2Seq model demonstrated generalization on out-of-domain evaluations.

“Domain-specific fine-tuning of ByT5-Sanskrit significantly outperforms all instruction-driven LLM approaches,” the paper states. This highlights the value of targeted AI development. How might this change your strategy for selecting AI tools for your next project?

For example, consider a company trying to translate ancient texts. While an LLM might offer a quick general translation, a specialized model trained on specific linguistic nuances could provide far greater accuracy and contextual understanding. This could save you time and resources in the long run.

The Surprising Finding

Here’s the twist: despite the hype around LLMs as universal solutions, they were decisively beaten in this specific task. The study found that smaller, task-specific models are more effective for Sanskrit poetry-to-prose conversion. This challenges the common assumption that LLMs are inherently superior for all natural language processing (NLP) tasks, according to the research.

Specifically, the researchers revealed that domain-specific fine-tuning of ByT5-Sanskrit significantly outperforms all instruction-driven LLM approaches. This outcome is surprising because LLMs are often perceived as capable of handling virtually any text-based challenge. However, the intrinsic complexity of Sanskrit’s free word order and metrical constraints proved too much for the general-purpose LLMs without deep specialization. This suggests that for highly structured or nuanced linguistic tasks, a focused approach still yields better results.

What Happens Next

Looking ahead, this research suggests a potential shift in AI development strategies. We might see more emphasis on hybrid AI systems that combine the broad capabilities of LLMs with the precision of specialized models. Industry implications include a renewed focus on fine-tuning and domain-specific AI solutions, particularly for less common languages or highly technical fields.

For example, imagine a future where you use an LLM for initial content generation. Then, a smaller, specialized AI refines that content for specific grammatical or stylistic requirements. This could be particularly useful in creative writing or legal document drafting. Developers might focus on creating more efficient, purpose-built models rather than relying solely on massive, general-purpose LLMs. This could lead to more accurate and efficient AI tools available by late 2025 or early 2026.

Our actionable advice for you: evaluate your AI needs carefully. Don’t assume a large, general model is always the best fit. Consider investing in or developing specialized models for your most critical and complex tasks. This approach could lead to more accurate and reliable AI applications for your business.
