AI's Hidden Bias: Language Models Struggle with Cultural Nuances in Math

New research reveals large language models perform worse on math problems with non-Western cultural contexts.

A recent study found that large language models (LLMs) struggle with mathematical problems presented in culturally adapted contexts. Although math is often seen as culturally neutral, the way problems are framed carries implicit cultural context. The research documents a consistent performance gap, with models doing better on US-centric data.

By Katie Rowan

November 3, 2025

3 min read

Key Facts

  • Mathematics problems, often seen as culturally neutral, carry implicit cultural context.
  • Existing AI benchmarks like GSM8K are predominantly rooted in Western norms (names, currencies, scenarios).
  • Researchers created culturally adapted variants of GSM8K for Africa, India, China, Korea, and Japan.
  • Six large language models (8B-72B parameters) were evaluated across five prompting strategies.
  • Models consistently performed best on the original US-centric dataset and worse on culturally adapted versions.

Why You Care

Ever wonder if your AI assistant truly understands you, or just a specific version of you? What if the very foundations of AI, like mathematics, aren’t as universal as we think? New research suggests that large language models (LLMs) are tripping over cultural differences, especially when solving math problems. This impacts how effectively AI can serve a global audience, directly affecting your everyday interactions with these tools.

What Actually Happened

A team of researchers explored how cultural context in math problems affects AI performance. They created culturally adapted versions of the GSM8K benchmark, a standard dataset of grade-school math word problems, representing five distinct regions: Africa, India, China, Korea, and Japan. The original GSM8K is largely rooted in Western norms, according to the announcement, in its common names, currency examples, and daily scenarios. The team then evaluated six large language models, ranging from 8B to 72B parameters, under five prompting strategies to test the models' robustness to these cultural variations. The goal was to see whether the AI could still handle the math when the cultural window dressing changed.
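To make the idea concrete, here is a minimal sketch of what such an adaptation could look like: the underlying arithmetic stays fixed while surface details such as names, items, and currency change. The template and region table below are hypothetical illustrations, not the researchers' actual adaptation pipeline.

```python
# Minimal sketch: render the same arithmetic (n * price) in different
# regional framings. The template and substitution table below are
# illustrative stand-ins, not the study's actual adaptation pipeline.

TEMPLATE = (
    "{name} buys {n} {item} for {currency}{price} each. "
    "How much does {name} spend in total?"
)

REGION_CONTEXT = {
    "us":    {"name": "Emily",  "item": "apples",  "currency": "$"},
    "india": {"name": "Priya",  "item": "mangoes", "currency": "₹"},
    "japan": {"name": "Haruto", "item": "peaches", "currency": "¥"},
}

def adapt_problem(region: str, n: int = 4, price: int = 3) -> str:
    """Return the same word problem dressed in a region's surface details."""
    return TEMPLATE.format(n=n, price=price, **REGION_CONTEXT[region])

for region in REGION_CONTEXT:
    print(adapt_problem(region))
```

The point of holding the numbers constant is that any drop in accuracy across variants can be attributed to the cultural framing rather than to harder math.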

Why This Matters to You

This study reveals a significant challenge for the widespread adoption of AI. Imagine using an AI tutor in India that stumbles over a math problem involving rupees or local customs; that is precisely the issue the research highlights. The models consistently performed best on the original US-centric dataset and comparatively worse on the culturally adapted versions, which suggests a built-in bias in current AI training data.

Performance Gap on Culturally Adapted Math Problems:

  • Original US-centric data: Best performance
  • Culturally adapted versions (Africa, India, China, Korea, Japan): Comparatively worse performance
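As a rough illustration of how such a gap can be summarized, the sketch below compares each adapted region's accuracy against the US-centric baseline. The model name, strategy label, and scores are invented for illustration; the study's actual figures are in the paper.

```python
# Hypothetical sketch of summarizing a cultural performance gap:
# accuracy on each adapted dataset relative to the US-centric baseline.
# The model name, strategy label, and scores here are all made up.

results = {
    ("model-8b", "us", "zero-shot"): 0.78,
    ("model-8b", "india", "zero-shot"): 0.71,
    ("model-8b", "japan", "zero-shot"): 0.69,
}

def gap_vs_us(results: dict, model: str, strategy: str) -> dict:
    """Accuracy drop of each adapted region from the US baseline."""
    baseline = results[(model, "us", strategy)]
    return {
        region: round(baseline - score, 2)
        for (m, region, s), score in results.items()
        if m == model and s == strategy and region != "us"
    }

print(gap_vs_us(results, "model-8b", "zero-shot"))
# -> {'india': 0.07, 'japan': 0.09}
```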

How might this affect your trust in AI tools if they can’t handle diverse contexts? “Although mathematics is often considered culturally neutral, the way mathematical problems are presented can carry implicit cultural context,” the paper states. Even seemingly objective tasks are shaped by cultural framing: a problem about buying apples might make perfect sense in one currency but confuse a model trained primarily on another. This points to a crucial need for more culturally inclusive AI development.

The Surprising Finding

Here’s the twist: while a performance gap exists, not all models struggled equally. The research found that models with stronger reasoning capabilities were more resilient to these cultural shifts, which suggests that deeper reasoning helps bridge cultural presentation gaps in mathematical tasks, according to the announcement. It’s not just about memorizing facts; it’s about understanding the underlying logic. This challenges the common assumption that AI handles math the same way regardless of presentation, and it implies that models capable of more abstract reasoning can better adapt to varied cultural contexts. That is a hopeful sign for future AI development.
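For a sense of how prompting can nudge a model toward that underlying logic, here is a hedged example contrasting a bare question with a chain-of-thought prompt on the same adapted problem. These two templates are generic illustrations and are not claimed to be among the five strategies the study evaluated.

```python
# Generic prompt templates: a direct question versus a chain-of-thought
# variant that steers the model toward the arithmetic rather than the
# cultural surface details. Illustrative only; not the study's prompts.

PROBLEM = "Priya buys 4 mangoes for ₹3 each. How much does she spend in total?"

DIRECT = f"{PROBLEM}\nAnswer:"

CHAIN_OF_THOUGHT = (
    f"{PROBLEM}\n"
    "Let's think step by step: identify the quantity, identify the unit "
    "price, multiply them, and state the total in the same currency.\n"
    "Answer:"
)

print(DIRECT)
print()
print(CHAIN_OF_THOUGHT)
```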

What Happens Next

This research points to a clear path forward for AI developers. Expect more efforts to diversify training data over the next 12-18 months, with future models incorporating broader cultural contexts to improve their global applicability. For example, AI companies might build datasets whose problems feature diverse names, currencies, and social scenarios from around the world. As a user, look for AI products that emphasize cultural inclusivity in their design. The industry must move beyond Western-centric data to create truly intelligent, adaptable AI that is fair and effective for everyone, everywhere. The team revealed that their work aims to improve AI’s understanding of global contexts.
