JP Morgan Chase Eliminates Backlog with Multi-Agent AI

A new framework, MAFA, uses collaborative AI agents to streamline enterprise annotation tasks.

JP Morgan Chase has successfully deployed MAFA, a Multi-Agent Framework for Annotation, to tackle massive data backlogs. This system uses AI agents to categorize customer interactions, significantly reducing manual work and improving accuracy. It marks a step forward in applying multi-agent AI to real-world business challenges.

By Katie Rowan

October 19, 2025



Key Facts

MAFA (Multi-Agent Framework for Annotation) is a production-deployed AI system.
It was deployed at JP Morgan Chase to eliminate a backlog of 1 million utterances.
MAFA achieves 86% agreement with human annotators on average.
The system saves more than 5,000 hours of manual annotation work per year.
It supports dynamic task adaptation through configuration, not code changes.

Why You Care

Ever feel buried under a mountain of tasks that never seems to shrink? What if artificial intelligence could not only help, but also adapt to your specific needs without constant reprogramming? This is what a new multi-agent AI framework, MAFA, is reported to have achieved. It is changing how large enterprises handle vast amounts of data, and it could affect how you interact with customer service.

What Actually Happened

A new paper introduces MAFA (Multi-Agent Framework for Annotation), a system already in production that streamlines enterprise annotation workflows. The framework relies on configurable multi-agent collaboration, according to the paper. It addresses the significant challenge of annotation backlogs, especially in financial services. These backlogs involve millions of customer utterances (things like calls, chats, or emails) that need accurate categorization. MAFA combines specialized agents with structured reasoning and a judge-based consensus mechanism. The system supports dynamic task adaptation, allowing organizations to define custom annotation types through configuration, not code changes.
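To make the idea of "specialized agents plus a judge, driven by configuration" more concrete, here is a minimal sketch in Python. It is not MAFA's implementation: the agent functions, the `INTENT_TASK` and `FAQ_TASK` dictionaries, and the reciprocal-rank vote standing in for the judge are all assumptions made for illustration. In a real deployment the agents would presumably be model calls rather than keyword heuristics.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical task configurations. Switching annotation type means passing a
# different dictionary (or loading a different config file), not editing code.
INTENT_TASK = {
    "task": "intent",
    "labels": ["card_lost", "balance_inquiry", "dispute_charge"],
    "top_k": 2,
}
FAQ_TASK = {"task": "faq", "labels": ["reset_password", "wire_fees"], "top_k": 1}

# An agent takes an utterance and candidate labels, and returns the labels
# ranked best-first.
Agent = Callable[[str, List[str]], List[str]]


def exact_word_agent(utterance: str, labels: List[str]) -> List[str]:
    """Toy specialist: ranks labels by whole-word overlap with the utterance."""
    words = set(utterance.lower().split())
    return sorted(labels, key=lambda label: -len(words & set(label.split("_"))))


def substring_agent(utterance: str, labels: List[str]) -> List[str]:
    """Toy specialist: ranks labels by how many label words occur as substrings."""
    text = utterance.lower()
    return sorted(labels, key=lambda label: -sum(w in text for w in label.split("_")))


def judge(rankings: List[List[str]], top_k: int) -> List[str]:
    """Stand-in judge: merges agent rankings with a reciprocal-rank vote."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, label in enumerate(ranking):
            scores[label] += 1.0 / (position + 1)
    return sorted(scores, key=lambda label: scores[label], reverse=True)[:top_k]


def annotate(utterance: str, config: dict, agents: List[Agent]) -> List[str]:
    """Runs every agent on the utterance, then lets the judge pick the top labels."""
    rankings = [agent(utterance, config["labels"]) for agent in agents]
    return judge(rankings, config["top_k"])


AGENTS = [exact_word_agent, substring_agent]
print(annotate("I lost my card yesterday", INTENT_TASK, AGENTS))
print(annotate("What are your wire fees?", FAQ_TASK, AGENTS))
```

The same `annotate` function handles both the intent task and the FAQ task. Only the configuration changes, which is the property the paper highlights.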

Why This Matters to You

MAFA’s deployment at JP Morgan Chase offers a clear example of its impact. The company reports it has eliminated a backlog of one million utterances. What’s more, the system achieves an average of 86% agreement with human annotators, and it saves more than 5,000 hours of manual annotation a year. Imagine your own customer service interactions becoming faster and more accurate because the underlying data is processed so efficiently. How would that improve your experience?

Here’s a look at MAFA’s performance metrics:

Backlog Elimination: 1 million customer utterances
Human Agreement: 86% average
Annual Savings: Over 5,000 hours of manual annotation
Top-1 Accuracy: 13.8% higher than baselines
Top-5 Accuracy: 15.1% higher than baselines
F1 Score: 16.9% higher than baselines

The system assigns each annotation a confidence classification. Typically, about 85% of annotations are rated high confidence, 10% medium, and 5% low across all datasets, the team reported. This allows human annotators to focus exclusively on ambiguous and low-coverage cases. Mahmood Hegazy, one of the authors, stated, “Our framework uniquely supports dynamic task adaptation, allowing organizations to define custom annotation types (FAQs, intents, entities, or domain-specific categories) through configuration rather than code changes.” This means the system can be tailored to various business needs without an extensive technical overhaul.
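A rough sketch of how confidence tiers can steer human attention is shown below. The numeric thresholds, the `Annotation` class, and the function names are hypothetical; the article gives only the typical share of each tier, not how MAFA computes or thresholds confidence. Which tiers are sent to people is a policy choice the sketch leaves open.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed cut-offs for illustration only; the source does not state them.
HIGH_THRESHOLD = 0.85
MEDIUM_THRESHOLD = 0.60


@dataclass
class Annotation:
    utterance: str
    label: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def confidence_tier(annotation: Annotation) -> str:
    """Maps a numeric confidence to a high / medium / low tier."""
    if annotation.confidence >= HIGH_THRESHOLD:
        return "high"
    if annotation.confidence >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def route(annotations: List[Annotation]) -> Dict[str, List[Annotation]]:
    """Groups annotations into one queue per tier so reviewers can start with 'low'."""
    queues: Dict[str, List[Annotation]] = {"high": [], "medium": [], "low": []}
    for annotation in annotations:
        queues[confidence_tier(annotation)].append(annotation)
    return queues


batch = [
    Annotation("I lost my card", "card_lost", 0.95),
    Annotation("what about that thing from last week", "dispute_charge", 0.30),
    Annotation("how much do I have", "balance_inquiry", 0.70),
]
for tier, items in route(batch).items():
    print(tier, [a.utterance for a in items])
```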

The Surprising Finding

The most surprising aspect of MAFA’s success lies in its ability to achieve high accuracy while drastically reducing human effort. While many might expect AI to struggle with the nuances of human language, MAFA reports an 86% agreement rate with human annotators. This challenges the common assumption that complex, subjective tasks like categorizing customer intent require near-constant human intervention. The framework’s multi-agent approach, with specialized agents and a consensus mechanism, allows it to handle ambiguity effectively. It consistently improves over traditional and single-agent annotation baselines, the research shows. This includes a 13.8% higher Top-1 accuracy and a 15.1% improvement in Top-5 accuracy on internal intent classification datasets. These gains extend to public benchmarks as well. It suggests that collaborative AI systems can outperform simpler AI models and even human-only processes in specific, high-volume tasks.
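For readers unfamiliar with the metrics, Top-1 accuracy counts a prediction as correct only when the system's first choice matches the human label, while Top-5 accuracy counts it as correct when the human label appears anywhere in the first five suggestions. The sketch below shows the standard calculation on made-up data; the labels and rankings are illustrative and are not taken from the paper.

```python
from typing import List


def top_k_accuracy(ranked_predictions: List[List[str]], gold: List[str], k: int) -> float:
    """Fraction of items whose gold label appears among the first k ranked predictions."""
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_predictions, gold))
    return hits / len(gold)


# Made-up rankings for four utterances, best guess first.
ranked_predictions = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["a", "c", "b"],
]
gold = ["a", "a", "a", "b"]  # labels a human annotator would have assigned

print(top_k_accuracy(ranked_predictions, gold, k=1))  # 0.25
print(top_k_accuracy(ranked_predictions, gold, k=2))  # 0.5
print(top_k_accuracy(ranked_predictions, gold, k=3))  # 1.0
```

A wider window can only raise the score, which is why Top-5 figures are always at least as high as Top-1 figures for the same system.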

What Happens Next

The success of MAFA at JP Morgan Chase provides a clear blueprint for other organizations. We can expect to see similar multi-agent frameworks adopted across various industries in the next 12-24 months. For example, a large insurance company could use MAFA to rapidly process claims documents, identifying key information and flagging complex cases for human review. This would significantly speed up processing times. The technical report explains that this work bridges the gap between theoretical multi-agent systems and practical enterprise deployment. For you, this means potentially faster service and more accurate information from companies you interact with. Businesses should consider how configurable AI systems could address their own data backlogs and improve operational efficiency. The paper states that MAFA will be presented at AAAI 2026 Applications of AI, indicating further validation and wider industry recognition are on the horizon.
