AI Models Learn to Think Fast and Slow, Just Like Us

New research allows Large Reasoning Models to dynamically adjust their 'thinking speed' for better efficiency.

Researchers have developed a method to control the 'thinking speed' of Large Reasoning Models (LRMs), mimicking human cognition. This innovation improves accuracy while significantly reducing computational costs, making AI more efficient.

By Katie Rowan

November 1, 2025

5 min read

Key Facts

  • New research enables Large Reasoning Models (LRMs) to dynamically adjust their 'thinking speed.'
  • The approach mimics human System 1 (fast, intuitive) and System 2 (slow, deliberate) thinking.
  • It uses a 'steering vector' to control thinking speed and real-time difficulty estimation to decide when to adjust.
  • The plug-in module delivers an average +1.3% accuracy with -8.6% token usage.
  • The improvements are achieved without any additional training or cost.

Why You Care

Ever wish your AI tools could be smarter and faster, without costing a fortune? What if they could decide when to think quickly and when to ponder deeply, just like you do? New research reveals a method for Large Reasoning Models (LRMs) to do exactly that, promising a future of more efficient and intelligent AI.

This research matters because it tackles a core limitation of current AI: its often slow and costly 'thinking' process. If your business relies on AI for complex tasks, this could mean faster results and lower operational expenses for your projects.

What Actually Happened

Researchers have introduced a novel approach that lets Large Reasoning Models (LRMs) dynamically adjust their 'thinking speed,' switching between fast, intuitive processing and slow, deliberate reasoning. The inspiration comes from human cognition, which operates in two modes: System 1 (fast, intuitive) and System 2 (slow, deliberate). While current LRMs excel at System 2 thinking, their inability to think fast leads to high computational overhead and latency, the paper states. The team addressed two key questions: how to control this thinking speed, and when to adjust it for optimal performance.

They identified a 'steering vector' within the LRMs' representation space that governs transitions between slow and fast thinking. Editing representations along this vector yields the first representation-editing-based test-time scaling effect, outperforming existing prompt-based methods, the researchers report. To decide when to adjust, they use real-time difficulty estimation, which flags reasoning segments of varying complexity so the model processes easy steps quickly and analyzes complex reasoning more deeply.
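The idea can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the function names (`steer`, `estimate_difficulty`), the log-probability-based difficulty proxy, and the scaling coefficients are all assumptions made for the example.

```python
def steer(hidden, vector, alpha):
    """Shift a hidden-state vector along an assumed steering direction.

    A positive alpha pushes the model toward slow, deliberate (System 2)
    reasoning; a negative alpha pushes toward fast, intuitive (System 1)
    processing. Representation editing like this needs no retraining.
    """
    return [h + alpha * v for h, v in zip(hidden, vector)]


def estimate_difficulty(logprobs):
    """Crude real-time difficulty proxy (our assumption, not the paper's):
    low average token log-probability suggests the model is uncertain,
    i.e. the current reasoning segment is hard. Clamped to [0, 1]."""
    avg = sum(logprobs) / len(logprobs)
    return min(1.0, max(0.0, -avg))


# Toy usage: an easy segment is nudged toward fast thinking,
# a hard one toward slow, deeper thinking.
hidden = [0.5, -0.2, 0.1]          # stand-in hidden state
vector = [1.0, 0.0, -1.0]          # assumed steering direction
easy = estimate_difficulty([-0.05, -0.1])   # confident tokens -> low difficulty
hard = estimate_difficulty([-2.0, -3.0])    # uncertain tokens -> high difficulty
fast_state = steer(hidden, vector, -0.5 * (1 - easy))
slow_state = steer(hidden, vector, +0.5 * hard)
```

The key property this sketch tries to convey is that the intervention happens purely at inference time, by editing activations, which is why the paper can describe it as a plug-in module with no additional training.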

Why This Matters to You

Imagine you’re using an AI assistant to draft a complex report. Instead of laboriously processing every sentence, this new approach allows the AI to quickly handle the straightforward parts. Then, it dedicates more computational ‘thought’ to the tricky sections requiring deeper analysis. This means you get a faster, more accurate output without waiting forever.

This system significantly improves the accuracy-efficiency trade-off in AI. According to the announcement, the plug-in module delivers an average +1.3% accuracy gain, and it achieves this with a notable -8.6% reduction in token usage across leading LRMs and reasoning benchmarks. "We achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods," the team revealed. This means your AI applications could become both more precise and less expensive to run. How will this improved efficiency change the way you interact with AI in your daily tasks or business operations?

Here’s a quick look at the benefits:

  • Increased Accuracy: +1.3% on average across reasoning benchmarks.
  • Reduced Token Usage: -8.6% fewer tokens, lowering operational costs.
  • Faster Processing: Quicker responses for simpler problems.
  • Dynamic Adaptation: The AI adjusts its effort to task difficulty.

Think of it as giving your AI a smart supervisor that tells it when to sprint and when to marathon. This makes AI not just smarter, but also more economical, directly benefiting your bottom line if you’re running AI-powered services.
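The "smart supervisor" analogy maps to a simple control loop: route each reasoning segment to fast or slow processing based on estimated difficulty. The sketch below is purely illustrative; `supervise`, the threshold, and the length-based difficulty stub are assumptions, not the paper's method.

```python
def supervise(segments, difficulty_of, threshold=0.5):
    """Assign each reasoning segment a thinking mode.

    Segments whose estimated difficulty meets the threshold get slow,
    deliberate processing; the rest are handled in fast mode.
    """
    plan = []
    for seg in segments:
        mode = "slow" if difficulty_of(seg) >= threshold else "fast"
        plan.append((seg, mode))
    return plan


# Toy difficulty stub: treat longer segments as harder.
plan = supervise(["2+2", "prove the lemma"], lambda s: len(s) / 16)
# → [("2+2", "fast"), ("prove the lemma", "slow")]
```

In the real system the routing signal comes from the model's internal state at inference time rather than from surface features like length, but the decision structure is the same: spend tokens only where the reasoning is hard.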

The Surprising Finding

What’s particularly striking about this result is its ‘no-cost’ implementation. You might expect such a significant performance boost to require extensive retraining or additional computational resources. However, the study finds that the plug-in module achieves these gains “without any training or additional cost.” This challenges the common assumption that improving AI performance always comes with a higher price tag or more complex engineering. The ability to enhance accuracy and reduce token usage simply by understanding and manipulating the model’s internal ‘thinking’ mechanism is truly unexpected. It suggests that much untapped potential lies within existing LRM architectures, waiting to be unlocked through clever algorithmic insights rather than brute-force computational power. The team revealed that their algorithms are implemented on top of vLLM, making them broadly applicable.

What Happens Next

This new approach is expected to support broader applications and inspire future research, according to the announcement. We could see this ‘thinking speed’ control integrated into various Large Reasoning Models in the coming months, possibly by early 2026. For example, imagine content creation platforms using this system to generate routine social media posts quickly, then applying deeper analysis when crafting nuanced long-form articles. This could lead to a noticeable improvement in the speed and quality of the AI-generated content you consume.

For readers, the actionable advice is to keep an eye on updates from major AI providers and ask whether their models are incorporating dynamic reasoning capabilities. This could directly impact the efficiency and cost of your AI-driven workflows. The industry implications are significant, potentially setting a new standard for LRM deployment in which efficiency is as crucial as raw processing power. The team hopes this work will “inspire future research” in the field, suggesting a wave of innovation focused on smarter, not just bigger, AI models.
