New AI Router Slashes LLM Costs by 84% While Boosting Reasoning

R2-Reasoner framework allows large language models to collaborate on sub-tasks, making advanced AI more accessible.

A new framework called R2-Reasoner uses a Reinforced Model Router to significantly cut the operational costs of large language models (LLMs) while maintaining high reasoning accuracy. The system lets nine different LLMs work together on complex problems by breaking them into smaller parts. The approach could make powerful AI much more affordable for businesses and developers.

By Mark Ellison

December 12, 2025

3 min read

Key Facts

  • R2-Reasoner is a novel framework that uses a Reinforced Model Router to scale LLM reasoning.
  • It orchestrates collaboration among nine heterogeneous models, ranging from less than 1B to hundreds of billions of parameters.
  • The framework reduces API costs by 84.46% compared to state-of-the-art baselines.
  • It maintains competitive reasoning accuracy across six challenging reasoning benchmarks.
  • The router breaks complex queries into subtasks and assigns them to optimal models, balancing performance and cost.

Why You Care

Ever wonder why AI tools can be so expensive to run? Imagine getting the same AI reasoning for a fraction of the cost. A new framework promises exactly that, making AI more accessible. This could change how you interact with and build AI applications.

What Actually Happened

Researchers have introduced R2-Reasoner, a novel framework designed to scale large language model (LLM) reasoning efficiently. At its core is a Reinforced Model Router that orchestrates collaboration among nine diverse models, according to the announcement. These models vary significantly in size, from less than 1 billion to hundreds of billions of parameters. The router first breaks complex queries down into smaller subtasks using a ‘decomposer.’ A ‘subtask allocator’ then assigns each subtask to the most suitable model, balancing performance against cost to ensure optimal resource use. The team revealed that training involves a two-stage alternating process, combining supervised fine-tuning with reinforcement learning for self-supervised refinement.
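
The announcement describes the decomposer and allocator only at a high level, but the routing idea can be sketched in a few lines. The model names, per-call costs, and capability scores below are purely illustrative stand-ins, not the nine models or the learned router from the paper:

```python
# Hypothetical sketch of the R2-Reasoner routing idea: a decomposer splits a
# query into subtasks, and an allocator assigns each subtask to the cheapest
# model expected to handle it. All names and numbers are illustrative only.

MODEL_POOL = [
    # (name, parameter scale, relative cost per call, capability score 0-1)
    ("tiny-0.5b", "0.5B",   1,   0.3),
    ("mid-7b",    "7B",     5,   0.6),
    ("large-70b", "70B",    25,  0.8),
    ("frontier",  "100B+",  100, 0.95),
]

def decompose(query: str) -> list[str]:
    """Toy decomposer: split a compound query on semicolons."""
    return [part.strip() for part in query.split(";") if part.strip()]

def allocate(subtask: str, difficulty: float) -> str:
    """Toy allocator: pick the cheapest model whose capability covers the
    subtask's estimated difficulty (the learned router would predict this
    from the subtask text instead of taking it as an argument)."""
    for name, _scale, _cost, capability in MODEL_POOL:  # sorted by cost
        if capability >= difficulty:
            return name
    return MODEL_POOL[-1][0]  # fall back to the strongest model

query = "extract the dates; summarize the argument; verify the final proof step"
subtasks = decompose(query)
difficulties = [0.2, 0.5, 0.9]  # stand-in for the router's predictions
for subtask, diff in zip(subtasks, difficulties):
    print(f"{allocate(subtask, diff):>10} <- {subtask}")
```

Easy subtasks land on the cheapest model and only the hardest step is escalated to the top-tier model, which is the intuition behind the reported savings.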

Why This Matters to You

This development directly impacts your budget and the capabilities of the AI you use or develop. By making LLM reasoning more cost-effective, R2-Reasoner opens the door to wider adoption. Think of it as having a team of specialized AI experts: instead of paying a top-tier expert for every small task, you send each part of a problem to the most appropriate, cost-efficient expert. This means your AI projects could become significantly more affordable.

How much could this save you?

Aspect          Traditional LLM Approach    R2-Reasoner Approach
API Costs       High                        84.46% reduction
Model Usage     Single large LLM            Hybrid LLM team
Task Handling   Task-level routing          Subtask-level routing
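
Purely as illustrative arithmetic (the baseline budget below is an assumption, not a figure from the paper), here is what the reported 84.46% API-cost reduction would mean for a hypothetical monthly spend:

```python
# Illustrative arithmetic only: applying the reported 84.46% API-cost
# reduction to an assumed monthly LLM budget.
reduction = 0.8446
monthly_spend = 10_000.00  # assumed baseline spend in dollars
new_spend = monthly_spend * (1 - reduction)
savings = monthly_spend - new_spend
print(f"New spend: ${new_spend:,.2f}")  # $1,554.00
print(f"Savings:   ${savings:,.2f}")    # $8,446.00
```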

This framework allows for more efficient coordination at the level of intermediate reasoning steps, or ‘thoughts,’ as detailed in the blog post. This finer-grained collaboration helps manage the computational demands of complex reasoning. “Collaboration at the level of intermediate reasoning steps (thoughts) could enable more efficient coordination,” the paper states. The approach also addresses challenges in router scheduling and task decomposition. Do you think this cost reduction will lead to a new wave of AI applications?
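
A toy calculation shows why routing individual reasoning steps can be so much cheaper than sending the entire task to one frontier model. All costs below are made-up illustrative numbers, not figures from the paper:

```python
# Toy comparison (illustrative numbers only): running every reasoning step
# on one top-tier model vs. routing each "thought" to the cheapest model
# that can handle it.
FRONTIER_COST = 100      # assumed cost of one call to a top-tier model
STEP_COSTS = [1, 5, 25]  # assumed per-step costs under subtask-level routing

task_level_cost = FRONTIER_COST * len(STEP_COSTS)  # every step on the big model
subtask_level_cost = sum(STEP_COSTS)
saving = 1 - subtask_level_cost / task_level_cost

print(f"task-level:    {task_level_cost}")
print(f"subtask-level: {subtask_level_cost}")
print(f"saving:        {saving:.1%}")
```

With these toy numbers the subtask-level plan costs 31 units instead of 300; the real saving depends entirely on how many steps a cheap model can absorb.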

The Surprising Finding

Here’s the twist: enhancing LLM reasoning, especially with ‘chain-of-thought’ methods, traditionally incurs very high computational costs. The R2-Reasoner framework, however, achieves a dramatic cost reduction without sacrificing accuracy: the research shows it cuts API costs by 84.46% compared with state-of-the-art baselines while maintaining competitive reasoning accuracy across six challenging benchmarks. This finding challenges the assumption that strong LLM reasoning must come with a hefty price tag. It suggests that smart orchestration, rather than raw model size, can be the key to efficient AI. It means you can get strong results without breaking the bank.

What Happens Next

The R2-Reasoner framework paves the way for more efficient reasoning systems, as mentioned in the release. We can expect further developments and integrations within the next 12-18 months. For example, imagine a customer service chatbot built on this system: it could handle complex inquiries by routing specific parts of a question to smaller, specialized models, cutting operational costs significantly. Developers and businesses should explore how this ‘reinforced model router’ concept can be applied to their own AI pipelines. The researchers report that their code is open source, encouraging widespread adoption and further innovation. This could lead to more affordable AI tools becoming available to everyone.
