ACING Boosts Black-Box LLM Performance Automatically

New AI framework optimizes instructions for large language models without needing internal access.

A new AI framework called ACING significantly improves how Large Language Models (LLMs) understand and complete tasks. It automates the process of creating better instructions, even for 'black-box' LLMs where internal workings are hidden. This innovation could make LLMs much more effective and easier to use.

By Mark Ellison

September 7, 2025

4 min read

Key Facts

  • ACING is an actor-critic reinforcement learning framework for optimizing LLM instructions.
  • It works with 'black-box' LLMs, which lack accessible internal parameters.
  • ACING automatically discovers prompts that outperform human-written prompts in 76% of tasks.
  • It achieved gains of up to 33 points and a 10-point median improvement over the best automatic baseline.
  • The framework was accepted at EMNLP 2025.

Why You Care

Ever struggled to get an AI to do exactly what you want? Crafting the prompt for a Large Language Model (LLM) can feel like an art, not a science. What if an AI could write better prompts than you, automatically? A new framework, ACING, promises to do just that, potentially making your interactions with AI far more effective and less frustrating.

What Actually Happened

Researchers Salma Kharrat, Fares Fourati, and Marco Canini introduced ACING (Actor-Critic for Instruction Learning in Black-Box LLMs). This framework uses reinforcement learning to optimize instructions for LLMs, according to the announcement. The core challenge they addressed involves ‘black-box’ LLMs. These are models where the internal parameters and gradients (essentially, how the model computes and learns) are not accessible. This makes traditional gradient-based optimization impossible.
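
To make that constraint concrete, here is a minimal sketch of the only signal a black-box LLM provides: the text it returns and a score computed from that text. Both `query_llm` and the scoring loop are illustrative assumptions for this article, not the authors’ code.

```python
# Illustrative sketch of black-box feedback: text in, text out, and a
# scalar task score. `query_llm` is a hypothetical stand-in for a hosted
# LLM API; the model's weights and gradients are never exposed.

def query_llm(prompt: str) -> str:
    # In practice, an HTTPS call to a hosted model. Stubbed here so the
    # sketch runs end to end.
    return "stub output"

def score_instruction(instruction: str, val_set) -> float:
    """Fraction of validation examples answered correctly when the
    black-box LLM is prompted with `instruction`."""
    hits = 0
    for x, y in val_set:
        out = query_llm(f"{instruction}\nInput: {x}\nOutput:")
        hits += int(out.strip() == y)
    return hits / len(val_set)

# Example: evaluate one candidate instruction on a toy validation set.
val = [("2+2", "4"), ("3+3", "6")]
print(score_instruction("Add the two numbers.", val))  # 0.0 with the stub
```

A scalar score like this is the entire optimization signal; everything ACING learns, it learns from numbers of this kind.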

ACING formulates instruction optimization as a stateless, continuous-action problem. This approach lets the system explore an unbounded space of candidate instructions. It relies solely on black-box feedback, meaning it learns from the LLM’s responses without any access to the model’s internals. The goal is to automate the often-laborious process of crafting effective prompts, which currently requires substantial human effort, the paper states.
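
As a rough illustration of that formulation (and not the published ACING algorithm), the sketch below treats instruction search as a one-step, stateless bandit with continuous actions: a Gaussian actor samples a vector, a hypothetical decoder turns it into an instruction, the black-box score is the reward, and a scalar critic serves as the baseline.

```python
# Schematic actor-critic for stateless, continuous-action instruction
# optimization from black-box rewards. The decoder, reward function, and
# hyperparameters are placeholders, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of the continuous action

def decode_instruction(z: np.ndarray) -> str:
    # Hypothetical: map a continuous vector to an instruction string
    # (e.g., via a generator conditioned on z).
    return f"instruction-{hash(z.tobytes()) % 1000}"

def blackbox_reward(instruction: str) -> float:
    # Placeholder for scoring the instruction on the black-box LLM,
    # e.g., validation accuracy. Random stand-in so the sketch runs.
    return rng.random()

# Actor: a diagonal Gaussian policy. Stateless, so its parameters are
# just a mean vector and a log-std vector.
mu = np.zeros(DIM)
log_std = np.zeros(DIM)
# Critic: with no state, the value function reduces to a scalar baseline.
baseline = 0.0
lr_actor, lr_critic = 0.05, 0.1

for step in range(500):
    std = np.exp(log_std)
    z = mu + std * rng.standard_normal(DIM)          # sample an action
    reward = blackbox_reward(decode_instruction(z))  # black-box feedback
    advantage = reward - baseline
    # Gradients of log N(z; mu, diag(std^2)) w.r.t. mu and log_std
    grad_mu = (z - mu) / std**2
    grad_log_std = ((z - mu) / std) ** 2 - 1.0
    mu += lr_actor * advantage * grad_mu             # policy-gradient step
    log_std += lr_actor * advantage * grad_log_std
    baseline += lr_critic * advantage                # critic tracks reward

print(decode_instruction(mu))  # mean action as the final instruction
```

Because the only quantity fed back into the update is the scalar reward, nothing about the scored model’s internals is ever needed, which is what makes this kind of approach applicable to black-box LLMs.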

Why This Matters to You

This framework has significant implications for anyone using or developing with LLMs. Imagine you’re a content creator. Instead of spending hours refining prompts for a summarization task, ACING could generate an optimal prompt for you. This saves time and improves output quality.

For example, if you’re building an AI assistant, ACING could help it understand complex user requests more accurately. The research shows that ACING automatically discovers prompts that outperform human-written prompts in 76% of instruction-induction tasks. This is a remarkable success rate. What’s more, it achieved gains of up to 33 points and a 10-point median improvement over the best automatic baseline across 33 diverse tasks.

This includes tasks like instruction-induction, summarization, and chain-of-thought reasoning. Think of it as having an expert prompt engineer working tirelessly for you. How much more efficient could your AI workflows become with this kind of automated optimization?

Here’s a look at ACING’s performance improvements:

Task Type | ACING vs. Human Prompts | Median Improvement | Max Improvement
Instruction-Induction | Outperforms in 76% of tasks | 10 points | 33 points
Summarization | Significant gains | Not specified | Not specified
Chain-of-Thought | Significant gains | Not specified | Not specified

Extensive ablations, which are experiments designed to test the robustness of a system by removing components, highlight ACING’s efficiency and stability, the team revealed.

The Surprising Finding

The most surprising finding from this research is ACING’s ability to consistently outperform human-written prompts. You might assume that a human, with their nuanced understanding of language, would always craft the best instructions. However, the study finds that ACING surpasses human efforts in a significant majority of cases.

Specifically, ACING outperforms human-written prompts in 76% of instruction-induction tasks. This challenges the common assumption that human intuition is always superior in prompt engineering. It suggests that a systematic, reinforcement learning approach can discover optimal prompt structures that are not immediately obvious to human users. This indicates a new direction for AI creation, where AI can improve its own interaction methods.

What Happens Next

The implementation of ACING is already available, meaning developers and researchers can begin experimenting with it now. Its acceptance at EMNLP 2025, a major conference in natural language processing, suggests its impact will be discussed widely within the AI community. We can expect to see more integration of such automated prompt optimization tools into commercial LLM platforms within the next 12-18 months.

For example, imagine future versions of AI writing tools or chatbots automatically fine-tuning their internal prompts to better understand your commands. This could lead to more intuitive AI applications across various industries. If you work with LLMs, exploring ACING’s approach could be a key step in future-proofing your skills. The industry implications are clear: automated instruction optimization will become a standard feature, making LLMs more accessible and effective for everyone.
