AI Learns Document Structure for Better Translations

New 'FormatRL' method tackles complex XML and HTML in AI translation.

Researchers have introduced Format Reinforcement Learning (FormatRL), a new AI method for translating structured documents such as XML and HTML. The approach moves beyond sentence-level translation, aiming to preserve document formatting while improving translation quality. Experiments on a software-documentation benchmark show improvements across six metrics.


By Sarah Kline

December 15, 2025



Key Facts

  • Format Reinforcement Learning (FormatRL) is a new method for structured document translation.
  • FormatRL uses Group Relative Policy Optimization and structure-aware rewards like TreeSim and Node-chrF.
  • It was tested on the SAP software-documentation benchmark, showing improvements across six metrics.
  • The researchers also apply StrucAUC, a fine-grained metric that distinguishes minor errors from major structural failures.
  • Traditional methods struggle with document-level XML or HTML structures, focusing only on sentence translation.

Why You Care

Ever struggled with a translated document where the text was fine but the formatting was a mess? Imagine trying to read a user manual or a legal contract where tables were jumbled and headings misplaced. How frustrating is that for your business or your daily tasks?

New research from Haiyue Song and collaborators introduces Format Reinforcement Learning (FormatRL). This AI approach directly addresses the challenge of accurately translating complex structured documents, not just individual sentences. If it holds up in practice, your translated documents could soon look as good as the originals, saving you time and headaches.

What Actually Happened

Traditional AI translation methods often fall short when dealing with documents that have intricate layouts, such as those using XML or HTML. These methods primarily focus on translating text at the sentence level, according to the announcement. They struggle to preserve the original document’s structure.

To overcome this, the research team proposed FormatRL. The system applies Group Relative Policy Optimization (GRPO) on top of a supervised fine-tuning (SFT) model, directly optimizing structure-aware rewards. These include TreeSim, which measures the structural similarity between the predicted and reference XML trees, and Node-chrF, which measures translation quality at the level of individual XML nodes, as detailed in the paper. In other words, the AI isn't just translating words; it's learning the document's blueprint.
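To make the idea of "structure-aware rewards" concrete, here is a minimal sketch in Python that scores a small group of candidate translations and turns the scores into GRPO-style advantages. It rests on several assumptions that are not taken from the paper: the TreeSim-like score is a simple proxy that compares root-to-node tag paths with `difflib` (not a tree edit distance), the Node-chrF-like score pairs XML nodes by position and uses a simplified chrF, the 50/50 reward weighting is hypothetical, and the German example strings are invented. The advantage step uses the within-group standardization commonly described for GRPO; no model or policy update is involved.

```python
import difflib
import statistics
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text: str) -> list[str]:
    """Root-to-node tag paths in document order, e.g. 'topic/p/b'."""
    root = ET.fromstring(xml_text)
    paths: list[str] = []

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def tree_sim(hyp_xml: str, ref_xml: str) -> float:
    """Structural similarity in [0, 1]: a TreeSim-like proxy, NOT the paper's definition."""
    try:
        hyp = tag_paths(hyp_xml)
    except ET.ParseError:
        return 0.0  # malformed XML gets no structural credit
    return difflib.SequenceMatcher(None, hyp, tag_paths(ref_xml)).ratio()


def chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta of averaged character n-gram precision and recall."""
    h, r = "".join(hyp.split()), "".join(ref.split())  # chrF ignores whitespace
    if not h and not r:
        return 1.0  # two empty nodes match trivially
    # Skip n-gram orders longer than both strings so short identical texts score 1.0.
    orders = range(1, min(max_n, max(len(h), len(r))) + 1)
    precisions, recalls = [], []
    for n in orders:
        hg = Counter(h[i:i + n] for i in range(len(h) - n + 1))
        rg = Counter(r[i:i + n] for i in range(len(r) - n + 1))
        overlap = sum((hg & rg).values())
        precisions.append(overlap / sum(hg.values()) if hg else 0.0)
        recalls.append(overlap / sum(rg.values()) if rg else 0.0)
    p, rc = statistics.fmean(precisions), statistics.fmean(recalls)
    return 0.0 if p + rc == 0 else (1 + beta**2) * p * rc / (beta**2 * p + rc)


def node_texts(xml_text: str) -> list[str]:
    """Text directly inside each element, in document order (tail text ignored)."""
    return [(el.text or "").strip() for el in ET.fromstring(xml_text).iter()]


def node_chrf(hyp_xml: str, ref_xml: str) -> float:
    """Mean chrF over nodes paired by position (an assumed, simplistic alignment)."""
    try:
        hyp = node_texts(hyp_xml)
    except ET.ParseError:
        return 0.0
    ref = node_texts(ref_xml)
    size = max(len(hyp), len(ref))
    hyp += [""] * (size - len(hyp))  # a missing node is scored against empty text
    ref += [""] * (size - len(ref))
    return statistics.fmean([chrf(h, r) for h, r in zip(hyp, ref)])


def reward(hyp_xml: str, ref_xml: str, w_tree: float = 0.5) -> float:
    """Hypothetical weighted mix of the two structure-aware scores."""
    return w_tree * tree_sim(hyp_xml, ref_xml) + (1 - w_tree) * node_chrf(hyp_xml, ref_xml)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean, std = statistics.fmean(rewards), statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: score a group of candidate translations sampled for one source document.
reference = "<topic><title>Einstellungen</title><p>Klicken Sie auf <b>Speichern</b>.</p></topic>"
candidates = [
    reference,  # exact match
    "<topic><title>Einstellungen</title><p>Klicken Sie auf Speichern.</p></topic>",  # lost <b>
    "<topic><title>Einstellungen<p>Klicken Sie auf</title>",  # malformed XML
]
rewards = [reward(c, reference) for c in candidates]
advantages = group_advantages(rewards)
```

The exact match earns the highest reward, the candidate that dropped the `<b>` element is penalized on both structure and node-level text, and the malformed output scores zero, so its advantage is the most negative within the group.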

Why This Matters to You

This new approach has practical implications for anyone working with multilingual content. If your company relies on translating technical manuals, legal documents, or web content, FormatRL-style methods could improve both efficiency and accuracy, producing translations that are not only linguistically correct but also structurally sound.

Consider this: “Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle the complex document-level XML or HTML structures,” the paper states. This highlights a critical gap that FormatRL aims to fill. Think of the time your team spends manually reformatting translated documents. What if that effort were drastically reduced?

For example, imagine a global software company that needs to translate its user interface (UI) documentation into a dozen languages. With previous methods, engineers might spend hours fixing misplaced elements or broken links in the translated versions. FormatRL aims to preserve the original XML structure, so that buttons and fields appear where they should. This could mean faster product launches and happier international customers for your business.
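The kind of breakage described in this example can be caught with a simple automated check. The sketch below, in Python with only the standard library, compares the element sequence of a source UI file with its translation and flags nodes whose tag or non-translatable attributes changed. It is not part of FormatRL; the `STRUCTURAL_ATTRS` set and the sample markup are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Attributes assumed to be non-translatable (a hypothetical choice for this sketch).
STRUCTURAL_ATTRS = ("id", "href")


def signature(xml_text: str) -> list[tuple]:
    """Tag plus structural attributes for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, tuple((a, el.get(a)) for a in STRUCTURAL_ATTRS if a in el.attrib))
        for el in root.iter()
    ]


def structure_problems(source_xml: str, translated_xml: str) -> list[str]:
    """List positions where the translation's structure diverges from the source."""
    try:
        tgt = signature(translated_xml)
    except ET.ParseError as err:
        return [f"translation is not well-formed XML: {err}"]
    src = signature(source_xml)
    problems = []
    # zip_longest pads with None, so missing or extra nodes are also reported.
    for i, (s, t) in enumerate(zip_longest(src, tgt)):
        if s != t:
            problems.append(f"node {i}: expected {s}, got {t}")
    return problems


source = '<ui><button id="save">Save</button><link href="/help">Help</link></ui>'
good = '<ui><button id="save">Speichern</button><link href="/help">Hilfe</link></ui>'
bad = '<ui><button id="save">Speichern</button><link>Hilfe</link></ui>'

print(structure_problems(source, good))  # []
print(structure_problems(source, bad))   # one problem: the link lost its href
```

Because nodes are compared by position, a single inserted or deleted element will also flag every node after it; a production checker would align the two sequences first (for example with `difflib.SequenceMatcher`) before reporting.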

Key Improvements with FormatRL:

  • Structural Fidelity: Maintains the original document’s layout and hierarchy.
  • Node-Level Accuracy: Ensures text within specific elements is translated correctly.
  • Reduced Manual Effort: Less need for human intervention to fix formatting.
  • Enhanced User Experience: Delivers consistent, high-quality translated documents.

The Surprising Finding

Perhaps the most interesting aspect of this research is how it separates minor errors from major structural failures. To measure this, the team applied StrucAUC, a fine-grained metric that distinguishes between the two, the study finds. This matters because many evaluation approaches treat all errors equally.

Rather than scoring every deviation the same way, this evaluation treats fundamental structural failures differently from small formatting quirks. The analysis further shows how different reward functions contribute to improvements in both structural and translation quality, according to the announcement. This challenges the common assumption that translating the text perfectly is enough: understanding the document's architecture is equally, if not more, important for usability. For instance, a missing paragraph break might be minor, but an entire section appearing out of order is a critical failure.
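To see why a graded, curve-based score can separate near-misses from real breakage, consider the toy Python sketch below. It does not reproduce StrucAUC's actual definition; it is a hypothetical illustration that takes made-up per-document structural error counts, computes the fraction of documents that "pass" at each error tolerance from 0 to 10, and averages those pass rates into a discrete area under the curve.

```python
def pass_rate(error_counts: list[int], tolerance: int) -> float:
    """Fraction of documents with at most `tolerance` structural errors."""
    return sum(1 for e in error_counts if e <= tolerance) / len(error_counts)


def tolerance_auc(error_counts: list[int], max_tolerance: int = 10) -> float:
    """Discrete area under the pass-rate curve for tolerances 0..max_tolerance, in [0, 1]."""
    rates = [pass_rate(error_counts, k) for k in range(max_tolerance + 1)]
    return sum(rates) / len(rates)


# Hypothetical per-document structural error counts for two systems.
minor_case = [0, 1, 12]   # one clean doc, one small glitch, one broken doc
major_case = [0, 10, 12]  # one clean doc, two badly broken docs

print(pass_rate(minor_case, 0), pass_rate(major_case, 0))  # exact match: 1/3 for both
print(round(tolerance_auc(minor_case), 3))  # 0.636
print(round(tolerance_auc(major_case), 3))  # 0.364
```

An exact-match check (tolerance 0) gives both systems the same score, while the area under the tolerance curve rewards the system whose failures are small, which is the kind of distinction the article describes.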

What Happens Next

The research team evaluated FormatRL on the SAP software-documentation benchmark. The experiments demonstrated improvements across six different metrics, according to the paper. This suggests a promising future for structure-aware AI translation tools.

Further developments, and perhaps commercial applications, could emerge in the next 12 to 18 months. Imagine a future where your content management system (CMS) automatically translates and formats web pages for different locales while preserving their original design. For example, a marketing department could roll out a global campaign in multiple languages much faster, with every web element kept in its intended place.

Companies that handle vast amounts of structured data, like legal firms or financial institutions, should pay close attention. Implementing similar AI methods could streamline their localization workflows. The industry implications are significant, potentially leading to a new standard for document translation quality. This shift will likely focus on preserving both linguistic meaning and visual integrity in translated content. Your translation tools might soon become much more intelligent about document layout.
