AutoCode: AI Creates Pro-Level Coding Challenges

A new system uses large language models to generate and validate complex competitive programming problems.

A research paper introduces AutoCode, an AI system that leverages large language models (LLMs) to create high-quality competitive programming problems. This system includes multi-round validation and can generate novel problem variants, achieving near-perfect consistency with human judgments.

By Katie Rowan

October 16, 2025

4 min read

Key Facts

  • AutoCode uses large language models (LLMs) to generate competitive programming problems.
  • The system employs multiple rounds of validation to ensure problem quality and consistency.
  • AutoCode test suites achieve nearly 99% consistency with official human judgments, significantly outperforming previous methods.
  • It can generate novel problem variants from a random seed problem.
  • Grandmaster-level competitive programmers judged AutoCode's generated problems to be of contest quality.

Why You Care

Ever struggled to find truly challenging and fair coding problems to test your skills? Or perhaps you’re a competitive programmer always on the hunt for fresh, well-designed contests. What if an AI could create these intricate challenges for you, ensuring they are tough, fair, and free of loopholes? This is no longer a futuristic concept, as a new system called AutoCode is proving that large language models (LLMs) can indeed become expert problem setters.

What Actually Happened

A recent research paper, titled “AutoCode: LLMs as Problem Setters for Competitive Programming,” details a significant advance in AI’s ability to generate complex coding challenges. The team behind AutoCode has developed a system that uses large language models to craft problems suitable for competitive programming. As the paper notes, such problems are difficult to write: authors must carefully consider constraints, input distributions, and edge cases. The research shows that AutoCode employs multiple rounds of validation to produce “competition-grade problem statements and test cases.” This rigorous approach ensures the quality and fairness of the generated problems.
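
The paper does not publish this pipeline’s interface, but the control flow it describes is easy to picture. Below is a minimal, hypothetical Python sketch of a draft–critique–revise loop; `draft_problem`, `critique`, and `revise` are invented stand-ins for LLM calls, stubbed out so the loop itself runs as written.

```python
# Hypothetical sketch of multi-round validation, NOT AutoCode's actual API.
# The three helper functions below are stubs standing in for LLM calls.

def draft_problem(seed: str) -> dict:
    """Stand-in for an LLM drafting a problem statement from a seed topic."""
    return {"statement": f"Problem about {seed}", "round": 0}

def critique(problem: dict) -> list:
    """Stand-in for an LLM reviewer; here it flags an issue on round 0 only."""
    return ["constraints missing"] if problem["round"] == 0 else []

def revise(problem: dict, issues: list) -> dict:
    """Stand-in for an LLM patching the flagged issues."""
    return {**problem, "round": problem["round"] + 1}

def validate(seed: str, max_rounds: int = 3):
    """Iterate draft -> critique -> revise until no issues remain."""
    problem = draft_problem(seed)
    for _ in range(max_rounds):
        issues = critique(problem)
        if not issues:
            return problem        # passed all validation rounds
        problem = revise(problem, issues)
    return None                   # never converged: discard the problem

print(validate("shortest paths in a graph"))
```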

What’s more, the system can even start with a random seed problem and create entirely new variants. The paper explains that AutoCode cross-verifies LLM-generated solutions against the generated test cases, filtering out poorly formed problems before they ever reach human users. The authors report that this method yields high correctness, which human experts then confirm.
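
To make the cross-verification idea concrete, here is a self-contained toy sketch under the assumption that agreement between independently written solutions is the filter; the two solutions and test inputs below are illustrative, not taken from the paper.

```python
# Toy illustration of cross-verification (an assumed mechanism, not code
# from the paper): independent solutions are run on every test input, and
# the problem is kept only if their outputs agree everywhere.

def solution_iterative(n: int) -> int:
    return sum(range(1, n + 1))      # sums 1..n by looping

def solution_closed_form(n: int) -> int:
    return n * (n + 1) // 2          # sums 1..n via the closed-form formula

def cross_verify(solutions, test_inputs) -> bool:
    """Return True only if all candidate solutions agree on every input."""
    for inp in test_inputs:
        outputs = {sol(inp) for sol in solutions}
        if len(outputs) > 1:         # disagreement flags a malformed problem
            return False
    return True

tests = [0, 1, 7, 10_000]
print(cross_verify([solution_iterative, solution_closed_form], tests))  # True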

Why This Matters to You

Imagine you’re a coding instructor. You constantly need fresh, relevant problems for your students. AutoCode could be your personal problem-generating assistant, freeing up your time to focus on teaching. For example, instead of spending hours crafting a dynamic programming problem, you could simply ask AutoCode to generate one with specific parameters. This means more diverse and high-quality practice for your students.
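
A purely hypothetical sketch of what such a parameterized request could look like; neither this `ProblemRequest` type nor its fields come from the paper, and they only illustrate the kind of structured input an instructor might send to a problem generator.

```python
# Purely hypothetical request shape, invented for illustration.

from dataclasses import dataclass

@dataclass
class ProblemRequest:
    topic: str                # e.g. "dynamic programming"
    difficulty: int           # e.g. a Codeforces-style rating
    time_limit_ms: int        # intended runtime budget for solutions
    want_novel_variant: bool  # derive a fresh variant from a seed problem

request = ProblemRequest(
    topic="dynamic programming",
    difficulty=1800,
    time_limit_ms=2000,
    want_novel_variant=True,
)
print(request)
```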

How much time could you save if problem generation became automated and reliable?

This system has direct implications for anyone involved in competitive programming or software development education. The paper reports that AutoCode test suites achieve nearly 99% consistency with official judgments, a significant leap over prior methods like HardTests, which reach less than 81%. This high consistency means you can trust the problems generated by AutoCode to be fair and accurate.
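
One plausible way to read that consistency figure (an assumption about the metric, not the paper’s exact protocol) is as verdict agreement: the fraction of submissions where the generated test suite’s accept/reject decision matches the official judge’s. A toy computation under that reading, with made-up verdicts:

```python
# "Consistency" read as verdict agreement between the generated test suite
# and the official judge. The verdicts below are invented for illustration.

def consistency(suite_verdicts, official_verdicts) -> float:
    """Fraction of submissions on which the two judges agree."""
    pairs = list(zip(suite_verdicts, official_verdicts))
    return sum(s == o for s, o in pairs) / len(pairs)

suite    = ["AC", "WA", "AC", "WA", "AC"]   # verdicts from generated tests
official = ["AC", "WA", "AC", "AC", "AC"]   # verdicts from the real judge
print(f"{consistency(suite, official):.0%}")  # 80% on this toy sample
```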

Here’s a quick comparison of problem-setting approaches:

| Feature | Traditional Human Setter | HardTests (Previous AI) | AutoCode (New AI) |
| --- | --- | --- | --- |
| Constraint setting | Manual | Limited | Automated |
| Edge case handling | Manual | Limited | Automated |
| Consistency with judgments | Manual (high) | < 81% | ~99% |
| Novelty | Manual | Limited | High |

As detailed in the paper, AutoCode successfully produces novel problems. These problems were judged by Grandmaster-level competitive programmers (the top 0.3% of competitors) to be of contest quality. This means the AI isn’t just creating simple exercises; it’s generating challenges that even elite coders find engaging and appropriate for high-stakes competitions.

The Surprising Finding

Here’s the twist: the paper argues that creating competitive programming problems is an “ideal test of general large language model capabilities.” You might assume LLMs are best at generating text or code, but their ability to act as problem setters reveals something deeper. The exacting nature of problem creation (setting constraints, defining input distributions, and anticipating edge cases) demands logical reasoning and foresight, challenging the common assumption that complex, multi-faceted problem design is an exclusively human domain. That AutoCode’s test suites approach 99% consistency with official judgments is particularly striking, given how intricate competitive programming problems are.

What Happens Next

We can expect to see AutoCode, or similar AI problem-setting tools, integrated into competitive programming platforms within the next 12-18 months. Imagine platforms like LeetCode or HackerRank offering an “AI-generated challenge” section. For example, a new feature might allow you to select your desired difficulty and specific algorithms (like dynamic programming or graph theory), and AutoCode would instantly generate a unique problem. This could lead to an explosion of personalized learning experiences.

For competitive programmers, this means an endless supply of fresh, high-quality practice problems. For educators, it offers a tool for creating diverse assignments and assessments. The industry implications are vast, potentially democratizing access to high-quality programming education and training. The paper states that this validation process can “further filter out malformed problems,” improving overall contest quality. So get ready; your next coding challenge might just be crafted by an AI.
