LLMs Struggle with Physical Commonsense in Basque

New research reveals limitations of large language models in understanding real-world physics for low-resource languages.

A recent study on Basque, a low-resource language, highlights a significant challenge for large language models (LLMs): physical commonsense reasoning. Researchers found that LLMs struggle to identify implausible scenarios, especially in dialectal variants, impacting their ability to truly understand the world.

By Sarah Kline

February 17, 2026

4 min read

Key Facts

  • The study focuses on physical commonsense reasoning in low-resource languages.
  • BasPhyCo is the first non-question-answering physical commonsense reasoning dataset for Basque.
  • LLMs were evaluated across three levels: plausibility, consistency, and verifiability.
  • LLMs showed limited physical commonsense, especially in the 'verifiability' task.
  • Performance was worse when processing dialectal variants of Basque.

Why You Care

Ever wonder if your AI assistant truly understands the world around it? Or does it just parrot back information? New research suggests that when it comes to physical commonsense reasoning, large language models (LLMs) still have a long way to go. This matters because it impacts how well these AI systems can interact with our physical reality. It affects everything from smart home devices to robotics. How much does your AI truly grasp the laws of physics?

What Actually Happened

Researchers Jaione Bengoetxea, Itziar Gonzalez-Dios, and Rodrigo Agerri recently published a paper examining LLM performance on non-question-answering (non-QA) physical commonsense reasoning tasks. Their study focused on Basque, a lower-resourced language. The team developed BasPhyCo, a new dataset for this purpose, which includes both standard and dialectal variants of Basque. According to the announcement, this is the first non-QA physical commonsense reasoning dataset for Basque. They evaluated various multilingual LLMs, as well as models specifically pretrained for Italian and Basque, to see how well these models understand physical interactions without direct questions.
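To make the setup concrete, here is a minimal sketch of what a non-QA plausibility evaluation over paired narratives might look like. The field names, the example schema, and the toy scorer are all illustrative assumptions, not BasPhyCo's actual format or the paper's scoring method.

```python
from dataclasses import dataclass

# Hypothetical record for one evaluation item: a plausible narrative,
# a minimally edited implausible counterpart, and the language variant.
# This schema is an assumption, not the dataset's real structure.
@dataclass
class NarrativePair:
    plausible: str
    implausible: str
    variant: str  # e.g. "standard" or "dialectal"

def plausibility_accuracy(pairs, score):
    """Fraction of pairs where the model scores the physically
    plausible narrative higher than its implausible counterpart."""
    correct = sum(score(p.plausible) > score(p.implausible) for p in pairs)
    return correct / len(pairs)

# Toy stand-in for an LLM log-likelihood scorer (shorter = higher score).
def toy_score(text):
    return -len(text)

pairs = [
    # "Water flows downhill." vs. "Water flows uphill by itself."
    NarrativePair("Ura behera doa.", "Ura gora doa bere kabuz.", "standard"),
]
print(plausibility_accuracy(pairs, toy_score))  # 1.0 with this toy scorer
```

In a real harness, `toy_score` would be replaced by the model's likelihood (or a judgment prompt) for each narrative; the pairwise comparison itself is the point.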

Why This Matters to You

This research reveals an essential limitation in current AI capabilities. Despite their impressive language skills, LLMs often lack a deep understanding of the physical world, and this is particularly true for less common languages. Imagine trying to explain a complex physical task to an AI in a language it barely comprehends. The study shows that LLMs struggle to identify inconsistencies in physical narratives, which impacts their ability to make accurate predictions. Your future interactions with AI could be smoother if this improves. What if an AI could truly understand why a ball rolls downhill?

For example, consider a smart home system. If it doesn’t grasp basic physics, it might struggle with tasks like:

  • Optimizing energy use: Understanding how heat dissipates.
  • Navigating obstacles: Recognizing what objects can be moved or avoided.
  • Interpreting sensor data: Differentiating between normal and abnormal physical events.

According to the announcement, “LLMs exhibit limited physical commonsense capabilities in low-resource languages such as Basque, especially when processing dialectal variants.” In other words, for languages like Basque, AI’s grasp of physical reality is even weaker. This could affect the development of inclusive AI technologies and how well AI can serve diverse linguistic communities. You might wonder if your language will ever be fully understood by AI.

The Surprising Finding

The most surprising finding centers on the ‘verifiability’ task, which assesses an LLM’s ability to explain why a narrative is implausible. The study found that LLMs perform poorly in this area. They can sometimes distinguish plausible from implausible stories, yet they struggle to pinpoint the exact physical state causing the implausibility. This was particularly evident in low-resource languages like Basque, and the team revealed the issue was even more pronounced with dialectal variants. This challenges the assumption that LLMs inherently grasp physical reality. Many believe these models learn common sense through vast amounts of text, but this research suggests that textual exposure alone might not be enough to build a genuine physical understanding.
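The distinction between spotting an implausible story and pinpointing its cause can be sketched as a simple scoring loop. Everything here is a hedged illustration: the function names, the sentence-index formulation, and the dummy model are assumptions, not the paper's actual task format.

```python
# 'Verifiability'-style check (illustrative): given an implausible
# narrative split into sentences, the model must identify which
# sentence breaks physical plausibility.

def verifiability_accuracy(examples, pick_culprit):
    """examples: list of (sentences, gold_index) pairs; pick_culprit
    returns the index of the sentence the model blames."""
    hits = sum(pick_culprit(sents) == gold for sents, gold in examples)
    return hits / len(examples)

# Dummy "model" that always blames the last sentence, mimicking a
# shallow heuristic rather than real physical reasoning.
def naive_pick(sents):
    return len(sents) - 1

examples = [
    # "Mikel let go of the ball." / "The ball rolled uphill." -> index 1
    (["Mikelek baloia askatu du.", "Baloia aldapan gora igo da."], 1),
    # "The glass reassembled itself." / "The glass is on the table." -> index 0
    (["Edalontzia bere kabuz osatu da.", "Edalontzia mahai gainean dago."], 0),
]
print(verifiability_accuracy(examples, naive_pick))  # 0.5
```

The dummy model gets half the items right by position alone, which illustrates why verifiability is harder than binary plausibility: a model can flag a story as odd without ever locating the physical state that makes it so.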

What Happens Next

This research paves the way for future AI development. Expect to see more datasets like BasPhyCo emerging in the next 12-18 months, focusing on physical commonsense reasoning for diverse languages. Researchers will likely concentrate on improving LLM architectures to enhance their understanding of real-world physics. For example, future AI systems could incorporate explicit physics engines, allowing them to simulate physical interactions. Developers should consider incorporating more diverse, physically grounded training data, including data from various languages and dialects. The industry needs to move beyond simple language generation and focus on building AI that truly comprehends its environment. This will lead to more reliable AI applications across the globe; the researchers report that improving this area is crucial for practical AI deployment.
