New AI Benchmark Reframes Forecasting with 'What-If' Scenarios

Researchers introduce WIT, a benchmark designed to test AI's ability to use future scenarios for multimodal forecasting.

A new benchmark called What If TSF (WIT) has been developed to challenge traditional time series forecasting. It pushes AI models to incorporate textual 'what-if' scenarios, moving beyond simple historical data extrapolation. This could lead to more nuanced and human-like predictions.

By Mark Ellison

January 16, 2026

3 min read

Key Facts

  • What If TSF (WIT) is a new multimodal forecasting benchmark.
  • WIT evaluates AI models' ability to condition forecasts on contextual text and future scenarios.
  • Existing forecasting approaches are often unimodal, relying solely on historical data.
  • Human experts naturally incorporate 'what-if' scenarios into their predictions.
  • The benchmark uses expert-crafted plausible or counterfactual scenarios.

Why You Care

Ever wonder why AI sometimes misses the mark on future predictions, even with tons of data? Don’t you wish it could think like a human expert, weighing different possibilities? A new development in artificial intelligence is changing how we approach forecasting. Researchers are introducing a benchmark that could make AI predictions much smarter and more adaptable. This could directly impact your business decisions and daily life.

What Actually Happened

Researchers have unveiled a new benchmark called What If TSF (WIT), according to the announcement. It aims to reframe time series forecasting (TSF) through scenario-guided multimodal forecasting. Traditional TSF is typically unimodal: it relies solely on historical patterns, a single type of input. Human experts, by contrast, weigh ‘what-if’ scenarios alongside historical evidence, often producing distinct forecasts from the same observations under different scenarios. WIT provides a rigorous testbed for this capability, evaluating whether AI models can condition their forecasts on contextual text, especially future scenarios, as detailed in the blog post. These include expert-crafted plausible or counterfactual situations.

Why This Matters to You

Imagine you’re a business owner planning inventory. Current AI might predict demand based on last year’s sales. But what if a competitor launches a new product? Or a new regulation changes your supply chain? The WIT benchmark tests whether AI can account for situations like these, moving beyond simple extrapolation. This means AI could soon offer more nuanced and adaptable forecasts for your specific needs.

Key Differences in Forecasting Paradigms:

  • Traditional Unimodal Forecasting: Relies on extrapolating historical patterns. It often struggles with unexpected events.
  • Scenario-Guided Multimodal Forecasting: Integrates historical data with textual ‘what-if’ scenarios. It allows for more nuanced predictions.

This shift is crucial for decision-making. “Human experts incorporate what-if scenarios with historical evidence, often producing distinct forecasts from the same observations under different scenarios,” the paper states. This highlights the gap WIT aims to bridge. How much more confident would you be in a forecast that considers multiple potential futures? This approach could reduce risks and uncover new opportunities for your projects.
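To make the contrast between the two paradigms concrete, here is a minimal toy sketch in Python. It is purely illustrative: the article does not describe WIT's data format or any model API, so every function name and the hard-coded scenario rules below are hypothetical stand-ins. The point it demonstrates is the one quoted above: the same history can yield distinct forecasts under different textual scenarios.

```python
# Illustrative sketch only. WIT's actual format and any real model's
# conditioning mechanism are not described in the article; the names
# and toy adjustment rules here are hypothetical.

def naive_forecast(history, horizon=3):
    """Unimodal baseline: extrapolate the average historical trend."""
    trend = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + trend * (i + 1) for i in range(horizon)]

def scenario_forecast(history, scenario, horizon=3):
    """Scenario-guided forecast: adjust the baseline using a textual
    'what-if'. A real model would condition on the text itself; this
    toy version just keys off a phrase to show the idea."""
    base = naive_forecast(history, horizon)
    if "competitor launches" in scenario:
        # Hypothetical effect: demand drops under this scenario.
        return [v * 0.8 for v in base]
    if "regulation eases" in scenario:
        # Hypothetical effect: demand rises under this scenario.
        return [v * 1.1 for v in base]
    return base

history = [100, 104, 108, 112]  # e.g. monthly demand
print(naive_forecast(history))
print(scenario_forecast(history, "competitor launches a rival product"))
print(scenario_forecast(history, "regulation eases supply constraints"))
# Same observations, different scenarios, distinct forecasts.
```

A benchmark like WIT would then score each scenario-conditioned forecast against the outcome consistent with that scenario, rather than against a single ground-truth continuation.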

The Surprising Finding

Here’s the twist: while large language models (LLMs) show promise in multimodal forecasting, existing benchmarks often fall short. They provide retrospective or misaligned raw context, the research shows. This makes it unclear if LLMs truly use textual inputs effectively. The surprising finding is that despite advancements, current AI often doesn’t meaningfully use the rich textual information available. It struggles to integrate future scenarios into its predictions. This challenges the assumption that simply feeding text to an LLM automatically improves forecasting accuracy. It suggests a need for benchmarks like WIT to truly test this capability. The benchmark is designed to evaluate whether models can condition their forecasts on contextual text. This includes both plausible and counterfactual scenarios, the team revealed.

What Happens Next

This new benchmark, What If TSF, is already available. Developers and researchers can begin using it immediately. We can expect to see initial results and model improvements within the next 6-12 months. For example, financial institutions could use this to predict market movements under various geopolitical scenarios. Urban planners might forecast traffic patterns considering new infrastructure projects or extreme weather events. Your company could implement AI systems that factor in potential supply chain disruptions or sudden shifts in consumer behavior. The industry implications are significant. This pushes AI towards more human-like reasoning in forecasting. It encourages the creation of AI that understands context and future possibilities. This will lead to more resilient and intelligent decision-making systems.
