AI's New Trick: Fixing Machine Translation Errors Automatically

New research introduces a winning approach for enhancing translation quality with large language models.

A recent paper by Govardhan Padmanabhan reveals a new method for improving machine translation. This approach, called QE-informed Retranslation, beat other methods in a WMT 2025 task. It uses large language models to select the best translation from several options.

By Sarah Kline

November 30, 2025

4 min read

Key Facts

  • Govardhan Padmanabhan submitted two approaches to the WMT 2025 Automated Translation Quality Evaluation Systems Task 3.
  • The winning approach, QE-informed Retranslation, selects the highest-quality translation from multiple LLM candidates.
  • This winning method is 'training-free' and achieved a Delta COMET score of 0.0201.
  • The second approach, similar to Automatic Post-Editing (APE), performed worse with a Delta COMET score of -0.0108.
  • APE systems are known to overcorrect, leading to performance degradation.

Why You Care

Ever relied on machine translation only to find awkward phrases or outright mistakes? What if AI could automatically fix these errors for you?

New research from Govardhan Padmanabhan addresses this common problem. It introduces a novel way to make machine translation (MT) output markedly more accurate. This advance means your translated documents, websites, and conversations could soon be far more reliable. It directly impacts anyone using or creating multilingual content.

What Actually Happened

Govardhan Padmanabhan submitted two distinct approaches to the WMT 2025 Automated Translation Quality Evaluation Systems Task 3. This task focused on Quality Estimation (QE)-informed Segment-level Error Correction, according to the announcement. The core idea is to improve machine translation output.

One method, termed QE-informed Retranslation, emerged as the winner. This approach is “training-free,” meaning it doesn’t require extensive new data to learn. Instead, it selects the highest-quality translation from multiple options generated by different large language models (LLMs). These LLMs are AI systems capable of understanding and generating human-like text.
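The selection step can be sketched in a few lines. The scorer below is a toy length-ratio heuristic standing in for a real quality-estimation model (the paper does not specify its QE model here), and in practice the candidate list would come from several different LLMs:

```python
# Minimal sketch of QE-informed Retranslation. The qe_score heuristic is a
# hypothetical stand-in for a real reference-free QE model; only the
# "pick the highest-scoring candidate" structure reflects the approach.

def qe_score(source: str, translation: str) -> float:
    """Toy QE heuristic: reward candidates whose length roughly
    matches the source (a real QE model would score adequacy/fluency)."""
    ratio = len(translation) / max(len(source), 1)
    return 1.0 - abs(1.0 - ratio)

def retranslate(source: str, candidates: list[str]) -> str:
    """Training-free selection: return the candidate the QE scorer ranks
    highest among translations produced by multiple LLMs."""
    return max(candidates, key=lambda c: qe_score(source, c))
```

Because nothing is fine-tuned, the method's quality ceiling is set entirely by the diversity of the candidate pool and the reliability of the QE scorer.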

The second approach was more similar to Automatic Post-Editing (APE). APE systems traditionally try to correct MT output directly. However, the study finds that APE systems often “overcorrect,” which can actually worsen performance. This second method instructed an LLM to replace specific error substrings identified by QE explanations. A conditional heuristic was used to minimize edits, aiming for a high Gain-to-Edit ratio, as mentioned in the release.
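That second approach might look roughly like the sketch below. The edit-acceptance rule shown is an illustrative assumption, not the paper's exact conditional heuristic; it only conveys the idea of replacing flagged substrings while rejecting edits large enough to risk overcorrection:

```python
# Illustrative sketch of QE-informed error replacement. The span format
# (bad substring, proposed fix) and the max_edit_ratio guard are
# assumptions; the paper's actual heuristic for a high Gain-to-Edit
# ratio is not reproduced here.

def apply_corrections(mt_output: str,
                      error_spans: list[tuple[str, str]],
                      max_edit_ratio: float = 0.3) -> str:
    """Replace each QE-flagged substring with its proposed fix, skipping
    any edit that would touch too large a fraction of the segment
    (a crude guard against APE-style overcorrection)."""
    corrected = mt_output
    for bad, fix in error_spans:
        if bad not in corrected:
            continue  # flagged span no longer present; nothing to do
        edit_size = (len(bad) + len(fix)) / max(len(corrected), 1)
        if edit_size > max_edit_ratio:
            continue  # edit is large relative to the segment; reject it
        corrected = corrected.replace(bad, fix, 1)
    return corrected
```

Even with such a guard, the reported results suggest that direct substring editing underperformed simply re-selecting among whole candidate translations.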

Why This Matters to You

This research has significant implications for anyone who uses or develops machine translation tools. Imagine you are a content creator publishing articles in multiple languages. This new method could drastically reduce the time and effort needed for human review.

For example, consider a global e-commerce business. Accurate product descriptions in various languages are crucial for sales. With this improved system, the quality of automatically translated descriptions would be much higher, leading to better customer understanding and trust. The team revealed that the winning approach achieved a Delta COMET score of 0.0201. This score indicates a measurable improvement in translation quality.

How much time could you save if your initial machine translations were almost perfect? This system promises to make cross-lingual communication smoother and more reliable for your projects.

Here’s a quick comparison of the two approaches:

Approach Type | Key Mechanism | Outcome
QE-informed Retranslation | Selects best translation from multiple LLMs | Winning position (Delta COMET: 0.0201)
QE-informed Error Replacement | LLM replaces error substrings based on QE | Lower performance (Delta COMET: -0.0108)
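The Delta COMET numbers in the table are simply differences: the COMET score of the corrected output minus the COMET score of the original MT baseline. A minimal sketch of that bookkeeping (the 0.85/0.83 scores below are hypothetical; only the deltas in the table come from the paper):

```python
# Delta COMET = COMET(corrected system) - COMET(MT baseline).
# The scores passed in below are made-up illustrations; a positive
# delta means the correction step improved translation quality.

def delta_comet(corrected_score: float, baseline_score: float) -> float:
    """Return the COMET improvement over the baseline, rounded to
    four decimals as in the reported results."""
    return round(corrected_score - baseline_score, 4)
```

A delta of 0.0201 therefore means the retranslation step lifted the segment-level COMET score by about two hundredths of a point over leaving the MT output untouched.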

The Surprising Finding

What might surprise you is the effectiveness of the “training-free” QE-informed Retranslation approach. While jointly training QE systems with APE has shown improved performance, the paper states that APE systems are still known to overcorrect. This often leads to a degradation in performance, according to the research.

This finding challenges the assumption that more complex, jointly trained systems are always superior. The winning method simply leveraged existing LLMs to generate diverse translations. It then used quality estimation to pick the best one. This straightforward strategy outperformed a more direct error correction method. The Delta COMET score for the winning approach was 0.0201, while the error replacement method had a negative score of -0.0108. This clearly shows the unexpected success of the simpler selection method.

What Happens Next

This research was presented at the WMT25 Shared Task at the EMNLP 2025 Conference. This suggests that the findings will be discussed and potentially integrated into future MT systems. We might see this approach implemented in commercial translation tools within the next 12-18 months.

For example, developers could integrate this QE-informed Retranslation into their existing machine translation APIs. This would allow users to get higher quality translations without needing to switch providers. Content platforms could also adopt this to enhance their localized content automatically. The industry implications are significant, pushing the boundaries of what automated translation can achieve.

For readers, it means keeping an eye on updates from major translation service providers. You might soon notice a subtle yet significant improvement in the quality of your translated texts. The team revealed this winning approach achieved the leading position on the subtask leaderboard, indicating its strong potential for wider adoption.
