New Benchmark Reveals AI Agents Struggle with Data Tasks

DAComp exposes critical weaknesses in current AI for enterprise data intelligence workflows.

A new benchmark called DAComp has been introduced to test AI data agents across the full data intelligence lifecycle. It reveals that even state-of-the-art AI agents perform poorly, especially in data engineering tasks, highlighting significant limitations in their ability to handle complex enterprise data.

By Mark Ellison

December 18, 2025

3 min read


Key Facts

  • DAComp is a new benchmark for AI data agents.
  • It includes 210 tasks covering data engineering and data analysis.
  • State-of-the-art AI agents achieve under 20% success on data engineering tasks.
  • AI agents score below 40% on data analysis tasks.
  • The benchmark uses an LLM-judge for open-ended task assessment.

Why You Care

Ever wonder if the AI tools you rely on can truly handle complex data? Can they turn messy raw data into actionable insights for your business? A new benchmark, DAComp, suggests current AI agents are falling short. This could impact how you integrate AI into your data workflows, potentially slowing down your business intelligence efforts.

What Actually Happened

Researchers have introduced DAComp, a new benchmark designed to evaluate AI data agents. According to the announcement, it features 210 tasks that mirror real-world enterprise data intelligence workflows, spanning both data engineering (DE) and data analysis (DA). Data engineering transforms raw data into analysis-ready tables; data analysis then converts those tables into decision-oriented insights. The benchmark aims to provide a rigorous and realistic testbed for autonomous data agents, assessing how well they can design multi-stage SQL pipelines and solve open-ended business problems.
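To give a concrete sense of the kind of DE task the benchmark describes (raw tables in, analysis-ready tables out), here is a minimal sketch of a two-stage SQL pipeline driven from Python. The schema, table names, and data are illustrative assumptions, not actual DAComp tasks:

```python
import sqlite3

# Hypothetical DE-style task: turn a raw orders table into an
# analysis-ready daily revenue summary via a small SQL pipeline.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount REAL, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, '2025-12-01', 40.0, 'completed'),
        (2, '2025-12-01', 15.5, 'cancelled'),
        (3, '2025-12-02', 22.0, 'completed');

    -- Stage 1: clean the raw data (drop cancelled orders).
    CREATE TABLE clean_orders AS
        SELECT order_id, order_date, amount
        FROM raw_orders
        WHERE status = 'completed';

    -- Stage 2: aggregate into an analysis-ready table.
    CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM clean_orders
        GROUP BY order_date;
""")
rows = conn.execute(
    "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
print(rows)  # [('2025-12-01', 40.0), ('2025-12-02', 22.0)]
```

Even in this toy form, the agent's job is orchestration, not just SQL syntax: it must sequence the cleaning and aggregation stages so that each table feeds the next, which is exactly where the benchmark reports agents breaking down.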

Why This Matters to You

This new benchmark directly impacts anyone using or developing AI for data-driven decisions. It highlights significant gaps in current AI capabilities. Imagine your company needs to build a new data pipeline from scratch. Or perhaps you need an AI to interpret complex sales data. The study finds that current AI agents struggle with these very tasks. This means you might need to rethink your expectations for AI in data intelligence.

Key Findings from DAComp Benchmark:

  • Data Engineering (DE) Success Rate: Under 20%
  • Data Analysis (DA) Success Rate: Below 40%
  • Total Tasks: 210
  • Evaluation Method for Open-ended Tasks: LLM-judge guided by hierarchical rubrics

How much human oversight will your AI still require for essential data tasks? The research shows that even state-of-the-art agents falter on DAComp. “Performance on DE tasks is particularly low, with success rates under 20%,” the team revealed. This exposes a critical bottleneck in holistic pipeline orchestration, not merely code generation. Understanding these limitations helps you plan more effectively for AI adoption.
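The announcement says open-ended tasks are scored by an LLM-judge guided by hierarchical rubrics, but does not spell out the rubric format. As a rough sketch of how hierarchical rubric scoring could work, the tree structure, weights, and judge scores below are purely illustrative assumptions:

```python
# Illustrative sketch of hierarchical-rubric aggregation: an LLM judge
# assigns 0-1 scores to leaf criteria, and scores roll up through
# weighted parent nodes. Not DAComp's actual rubric format.

def score_rubric(node, judge_scores):
    """Recursively compute the weighted average score for a rubric tree."""
    if "criteria" not in node:  # leaf: use the judge's score directly
        return judge_scores[node["name"]]
    total = sum(c["weight"] * score_rubric(c, judge_scores) for c in node["criteria"])
    weight_sum = sum(c["weight"] for c in node["criteria"])
    return total / weight_sum

rubric = {
    "name": "open_ended_task",
    "criteria": [
        {"name": "correct_numbers", "weight": 0.6},
        {"name": "insight", "weight": 0.4, "criteria": [
            {"name": "relevance", "weight": 0.5},
            {"name": "actionability", "weight": 0.5},
        ]},
    ],
}
judge_scores = {"correct_numbers": 1.0, "relevance": 0.5, "actionability": 1.0}
final = score_rubric(rubric, judge_scores)
print(final)  # 0.9
```

The appeal of a hierarchy over a single holistic score is that it forces the judge to evaluate narrow, checkable criteria first and only then combine them, which tends to make open-ended grading more consistent.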

The Surprising Finding

Here’s the twist: while many assume AI is getting better at everything, DAComp reveals a significant weakness. The most surprising finding is the extremely low success rate on data engineering tasks. With scores under 20%, it shows that AI agents are not just struggling with complex reasoning; they are failing at the fundamental orchestration of data pipelines. This challenges the common assumption that AI can easily handle structured data manipulation. The paper states that this exposes a critical bottleneck in holistic pipeline orchestration. It’s not simply about generating code; it’s about understanding the entire workflow.

What Happens Next

This benchmark sets a clear challenge for AI developers. Over the next 12-18 months, we can expect a focused effort to improve AI agents’ data engineering capabilities. For example, future AI models will likely incorporate better planning and execution frameworks for multi-stage SQL pipelines. Developers will need to address these profound deficiencies in open-ended reasoning. For you, this means future AI tools should become more capable of handling your data needs. “By clearly diagnosing these limitations, DAComp provides a rigorous and realistic testbed to drive the creation of truly capable autonomous data agents for enterprise settings,” the authors mentioned in the release. This should ultimately lead to more reliable AI solutions for your business intelligence. The industry implication is clear: a stronger emphasis on practical, full-lifecycle data intelligence in AI development.
