Why You Care
Ever wish your AI assistant could truly understand directions or help you organize a complex physical space? Large language models (LLMs) are incredibly smart, but they often struggle with basic spatial tasks. This new research aims to fix that. It’s about giving AI a better sense of “where things are” and “how they move.” Why should you care? Because improved spatial reasoning means smarter AI in everything from navigation apps to robotics. Imagine your smart home truly understanding your layout, not just responding to voice commands. Your digital interactions could become far more intuitive.
What Actually Happened
Researchers have unveiled a two-stage approach to boost spatial reasoning in LLMs. The method addresses a long-standing challenge: LLMs’ difficulty with spatial transformations and multi-step planning. The team’s goal was to equip these models with a more intuitive understanding of physical space. First, they used supervised fine-tuning (SFT) on elementary spatial transformations, teaching the models basic spatial physics: actions like rotation, translation, and scaling. Think of it as teaching an AI the fundamental rules of movement and size. The physics-aware model was then frozen, and lightweight LoRA (Low-Rank Adaptation) adapters were trained within the GRPO framework. This second stage learns policies that compose those basic building blocks, enabling multi-step planning in complex, puzzle-based environments. The entire process operates in a closed-loop manner, according to the paper.
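To make the first stage concrete, here is a minimal sketch of what "elementary spatial transformations" on an ASCII grid might look like. The function names and grid representation are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch: the kind of rotation and translation primitives an
# SFT stage might teach, applied to a small ASCII grid. Names are
# illustrative, not from the paper.

def rotate_90(grid):
    """Rotate an ASCII grid (list of equal-length strings) 90° clockwise."""
    return ["".join(row) for row in zip(*grid[::-1])]

def translate(grid, dx, dy, fill="."):
    """Shift every non-fill cell by (dx, dy), clipping at the grid edges."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nx, ny = x + dx, y + dy
            if grid[y][x] != fill and 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = grid[y][x]
    return ["".join(row) for row in out]

shape = ["#..",
         "#..",
         "##."]

rotated = rotate_90(shape)        # the "L" shape turned clockwise
shifted = translate(shape, 1, 0)  # the same shape nudged one cell right
```

In the paper's setup, pairs like `shape` and `rotated` would serve as input/output examples for supervised fine-tuning, so the model internalizes each primitive before the second stage learns to compose them.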
Why This Matters to You
This creation has significant implications for how you interact with AI. It means future AI systems could understand your world more accurately. For example, imagine asking an AI to rearrange your virtual furniture. It could then suggest optimal placements based on actual spatial logic. This goes beyond simple text generation; it’s about practical, real-world understanding. What kind of complex spatial problems could you solve with an AI that truly grasps physical relationships?
The research shows the new method consistently outperforms existing baselines, including generic LLM backbones and even end-to-end reinforcement learning models. The approach also converges faster and exhibits more stable training, which matters for building practical AI applications. To support the pipeline, the team synthesized an ASCII-art dataset and built a corresponding ASCII-based reinforcement learning environment, the paper states, letting them test the LLMs’ abilities in structured, visual scenarios. Amir Tahmasbi, one of the authors, highlighted the importance of this foundational work, stating, “Our method consistently outperforms baselines, including the generic backbone, physics-aware model, and end-to-end RL models, under both Dynamic environments with explicit state updates and Static environments where the model must rely on its internal state across steps.”
| Performance Metric | New Two-Stage Method | End-to-End RL | Generic LLM Backbone |
| --- | --- | --- | --- |
| Planning Accuracy | High | Moderate | Low |
| Training Stability | High | Low | N/A |
| Convergence Speed | Fast | Slow | N/A |
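The ASCII-based reinforcement learning environment mentioned above can be pictured as a small grid world with a reset/step interface, a convention common to RL toolkits. The sketch below is an assumption about its general shape; the class name, action set, and reward values are all illustrative, not the paper's actual environment:

```python
# A minimal sketch of a closed-loop, puzzle-style ASCII environment,
# assuming a grid world where an agent composes basic moves to reach a
# goal. All names and reward values here are illustrative guesses.

class AsciiGridEnv:
    def __init__(self, width=5, height=5, start=(0, 0), goal=(4, 4)):
        self.width, self.height = width, height
        self.start, self.goal = start, goal
        self.pos = start

    def reset(self):
        """Return the agent to its start cell and render the board."""
        self.pos = self.start
        return self.render()

    def step(self, action):
        """Apply one move; reward 1.0 on reaching the goal, else a small penalty."""
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        x = min(max(self.pos[0] + dx, 0), self.width - 1)
        y = min(max(self.pos[1] + dy, 0), self.height - 1)
        self.pos = (x, y)
        done = self.pos == self.goal
        reward = 1.0 if done else -0.01
        return self.render(), reward, done

    def render(self):
        """Draw the board as ASCII: A = agent, G = goal, . = empty."""
        rows = []
        for y in range(self.height):
            row = ""
            for x in range(self.width):
                if (x, y) == self.pos:
                    row += "A"
                elif (x, y) == self.goal:
                    row += "G"
                else:
                    row += "."
            rows.append(row)
        return "\n".join(rows)

env = AsciiGridEnv()
obs = env.reset()
obs, reward, done = env.step("right")
```

The closed-loop character the paper describes corresponds to the "Dynamic" setting in the quote above: after each step, the model sees the re-rendered ASCII state rather than having to track positions internally.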
The Surprising Finding
Perhaps the most surprising finding is how effectively this two-stage approach improves spatial understanding. LLMs are known for their impressive language capabilities. However, they have traditionally struggled with spatial tasks, which often require a different kind of reasoning. The study finds that by explicitly teaching “basic spatial physics” first, then layering on multi-step planning, models gain a deeper, more grounded understanding. This challenges the assumption that a single, end-to-end learning process is always superior. The team reported that their approach converges faster and trains more stably than end-to-end reinforcement learning from scratch. This efficiency was unexpected given the complexity of spatial reasoning. It suggests that breaking complex problems into foundational building blocks can be a more effective way for AI to learn.
What Happens Next
This research paves the way for more spatially aware AI systems. We could see these advancements integrated into practical applications within the next 12-24 months. Imagine AI assistants that can not only answer questions but also help you design a room or navigate a complex factory floor. For example, a future AI could assist architects by evaluating structural integrity based on spatial arrangements. For you, this means more intelligent robots and more intuitive virtual reality experiences. Developers should consider adopting similar modular training approaches for complex AI tasks, which could lead to more efficient and capable models. The team also analyzed attention patterns to assess whether fine-tuning induces meaningful improvements in spatial understanding, according to the paper. This suggests ongoing work to further refine and understand these models, and it could lead to even stronger spatial reasoning capabilities in the near future.
