New AI Training Method Boosts Code Understanding Across Languages

Researchers introduce OORL and GEPO to help large language models master diverse programming languages.

A new research paper details OORL, a reinforcement learning framework, and GEPO, a preference optimization method. These tools aim to significantly improve how large language models (LLMs) understand and translate code across various programming languages. This could bridge the performance gap between popular and less common coding languages.

By Sarah Kline

December 12, 2025

4 min read


Key Facts

  • Large language models (LLMs) currently show a performance disparity across programming languages.
  • Researchers leverage code translation tasks to transfer coding proficiency between languages.
  • A new reinforcement learning (RL) framework called OORL integrates on-policy and off-policy strategies.
  • Group Equivalent Preference Optimization (GEPO) trains LLMs using groups of intermediate representations (IRs).
  • The combined approach significantly improves LLM performance on code benchmarks across multiple languages.

Why You Care

Ever wish your AI coding assistant could flawlessly handle any programming language, not just Python or C++? What if there was a way to make large language models (LLMs) understand the nuances of less common code just as well? New research is tackling this exact challenge, promising to expand the capabilities of AI in software creation. This could mean more versatile coding tools for you, making your workflow smoother.

What Actually Happened

Researchers have unveiled a novel approach to training large language models (LLMs) for better multi-programming language understanding. As detailed in the abstract, the team is addressing a significant performance gap. This gap exists between well-known languages like Python and C++, and other, less frequently used programming languages. To bridge this divide, the researchers are leveraging code translation tasks. This process facilitates the transfer of coding proficiency across diverse programming languages, according to the announcement.

The core of their method involves a new reinforcement learning (RL) framework called OORL. OORL integrates both on-policy and off-policy strategies for training LLMs. Within OORL, on-policy RL guides code translation using a rule-based reward signal derived from unit tests, as the paper states. Complementing this, they introduce Group Equivalent Preference Optimization (GEPO), a novel preference optimization method that trains LLMs using groups of intermediate representations (IRs).
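To make the rule-based reward concrete, here is a minimal sketch of what a unit-test-derived, pass/fail reward for a translated code snippet might look like. The function name `unit_test_reward`, the `solution` entry point, and the test format are all illustrative assumptions; the paper does not publish its reward implementation.

```python
# Hypothetical sketch of a coarse-grained, rule-based reward derived
# from unit tests, as used to guide on-policy RL over code translations.
# All names and formats here are assumptions, not the paper's code.

def unit_test_reward(translated_code: str, tests: list) -> float:
    """Binary reward: 1.0 only if the translated code defines a
    `solution` function that passes every unit test, else 0.0."""
    namespace = {}
    try:
        exec(translated_code, namespace)  # run the candidate translation
    except Exception:
        return 0.0  # code that fails to execute earns no reward
    solution = namespace.get("solution")
    if solution is None:
        return 0.0
    for args, expected in tests:
        try:
            if solution(*args) != expected:
                return 0.0
        except Exception:
            return 0.0
    # Coarse-grained rule: full reward only when all tests pass
    return 1.0
```

A correct translation earns 1.0; a translation that fails any test, raises an exception, or does not even parse earns 0.0, which is exactly the coarse, pass/fail character the paper describes.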

Why This Matters to You

This development could significantly impact how you interact with AI-powered coding tools. Imagine a future where your AI assistant understands your legacy codebase written in an obscure language. It could then translate or debug it with the same precision it handles modern Python. This research aims to make that a reality by enhancing the LLM's ability to capture nuanced aspects of code functionality.

For example, consider a developer maintaining an old system written in COBOL. Currently, AI tools might struggle with this. However, with OORL and GEPO, the AI could better understand the COBOL code’s intent. It could then suggest accurate modern equivalents or identify bugs. This would save you countless hours of manual deciphering.

Key Benefits for Developers:

  • Increased Versatility: LLMs can handle a wider array of programming languages.
  • Improved Code Translation: More accurate and reliable conversions between languages.
  • Enhanced Debugging: Better understanding of code functionality leads to quicker bug identification.
  • Reduced Legacy Code Burden: AI tools can assist with older, less common codebases.

As the team revealed, “LLMs can be guided to discern IRs equivalent to the source code from inequivalent ones, while also utilizing signals about the mutual equivalence between IRs within the group.” This means the AI learns not just what the code does, but also how different language constructs achieve the same goal. How might this improved understanding change your daily coding tasks?
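As one way to picture the group-equivalence idea, the sketch below scores pairs so that any IR equivalent to the source code is preferred over any inequivalent one, using a Bradley-Terry-style logistic loss. This is a hedged reconstruction, not GEPO's published objective: the function name is invented, and the mutual-equivalence signal between IRs within the group, which the quote mentions, is omitted for brevity.

```python
import math

def group_preference_loss(equiv_scores, inequiv_scores):
    """Illustrative group-wise preference objective: every model score
    for an IR equivalent to the source should exceed every score for
    an inequivalent IR (pairwise logistic loss). GEPO additionally
    uses mutual equivalence among group members, omitted here."""
    loss, pairs = 0.0, 0
    for s_pos in equiv_scores:
        for s_neg in inequiv_scores:
            # -log sigmoid(s_pos - s_neg): penalized whenever an
            # inequivalent IR is scored near or above an equivalent one
            loss += math.log1p(math.exp(-(s_pos - s_neg)))
            pairs += 1
    return loss / pairs
```

When equivalent IRs already score well above inequivalent ones, the loss is near zero; when the ordering is violated, the loss grows, pushing the model to discern equivalence, in the spirit of the quoted description.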

The Surprising Finding

What’s particularly interesting is how the researchers combine different reward signals. You might assume that a single, clear reward for correct code would be enough. However, the study finds that a multi-layered approach is more effective. They use a “coarse-grained rule-based reward” from unit tests for on-policy RL. This is then refined by GEPO, which uses intermediate representations to guide the LLM. This dual-pronged strategy helps LLMs capture subtle code functionality.

This is surprising because it moves beyond simple pass/fail metrics for code. Instead, it teaches the AI to understand the meaning behind the code structure. By focusing on the equivalence of intermediate representations, the LLM learns deeper semantic connections. This is crucial for true multi-language proficiency. It challenges the assumption that direct, high-level feedback is always superior. Sometimes, granular, internal representations provide a better learning signal.
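Conceptually, the dual-pronged strategy blends the coarse binary reward with the finer-grained preference term. The one-liner below is purely illustrative of that combination; the weight `beta` and the additive form are assumptions, not the paper's formulation.

```python
def combined_signal(coarse_reward: float, preference_loss: float,
                    beta: float = 0.5) -> float:
    """Illustrative only: blend a pass/fail unit-test reward with a
    fine-grained IR-preference penalty. `beta` and the additive form
    are assumptions, not taken from the paper."""
    return coarse_reward - beta * preference_loss
```

Under this framing, a translation that passes its unit tests but orders equivalent IRs poorly receives less credit than one that does both well, which is the intuition behind refining the coarse signal.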

What Happens Next

This research, submitted in May 2025 and revised in December 2025, points toward future advancements in AI coding assistants. These techniques could plausibly reach commercial LLMs within the next 12-18 months. Imagine a future where your integrated development environment (IDE) offers real-time translation suggestions for code snippets across languages, powered by models trained with OORL and GEPO.

For example, a developer could write a function in Python, and the AI could instantly suggest an equivalent version in Java or Go, based on its deep understanding of both languages. The industry implications are vast, potentially lowering barriers for developers working across different technology stacks. This could also accelerate the modernization of legacy systems. The team revealed that extensive experiments demonstrate “significant performance improvements on code benchmarks across multiple programming languages.” This suggests a strong foundation for future applications. Keep an eye out for updates from major AI development platforms; your coding experience might soon become much more fluid.
