New AI Detector Boosts Accuracy with Stylometric Features

Researchers enhance AI text detection by integrating linguistic style analysis and multilingual models.

A new study successfully replicates and extends an AI text detection system. By adding 26 document-level stylometric features and using newer multilingual models, the system shows improved performance. This development could help identify AI-generated content more effectively across different languages.

By Sarah Kline

March 17, 2026



Key Facts

The study replicated and extended an AI text detection system from the AuTexTification 2023 shared task.
Researchers added 26 document-level stylometric features to improve detection performance.
Newer multilingual language models like Qwen and mGPT were used for probabilistic features.
The system uses a shared configuration (mDeBERTa-v3-base) for both English and Spanish.
Additional stylometric features improved performance in both tasks and languages.

Why You Care

Ever wondered if that email or article you just read was written by a human or an AI? With the rise of generative models, distinguishing between human and machine-generated text is becoming incredibly difficult. What if there was a better way to tell the difference, and what would that mean for your daily digital interactions?

What Actually Happened

Researchers Adam Skurla, Dominik Macko, and Jakub Simko have advanced AI text detection. They replicated and extended a system from the AuTexTification 2023 shared task, according to the paper. Their goal was to improve the accuracy of identifying machine-generated texts. Exact replication proved challenging at first because of differences in data splits and model availability, the paper states. The team pressed on, incorporating newer multilingual language models and adding 26 document-level stylometric features. These features describe writing style, such as sentence length variation or common word choices. The team used models like Qwen and mGPT for probabilistic features, and mDeBERTa-v3-base for contextual representations. This allowed a single configuration across both the English and Spanish tasks.
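To make the idea of document-level stylometric features concrete, here is a minimal Python sketch. It is not the researchers' implementation: the article does not list the 26 features, so the five computed below (and the function name `stylometric_features`) are illustrative assumptions, and the sentence splitting is deliberately naive.

```python
import re
import statistics

# Letters only (Unicode-aware), with an optional apostrophe part, e.g. "don't".
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few illustrative document-level style features.

    The feature set is an assumption for demonstration purposes only.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD_RE.findall(text.lower())
    sentence_lengths = [len(WORD_RE.findall(s)) for s in sentences]
    punctuation = re.findall(r"[,;:!?.]", text)
    n_words = max(len(words), 1)  # avoid division by zero on empty input

    return {
        # Average number of words per sentence.
        "mean_sentence_length": (
            statistics.fmean(sentence_lengths) if sentence_lengths else 0.0
        ),
        # Sentence length variation (population standard deviation).
        "sentence_length_stdev": (
            statistics.pstdev(sentence_lengths) if sentence_lengths else 0.0
        ),
        # Average word length in characters.
        "mean_word_length": sum(len(w) for w in words) / n_words,
        # Vocabulary diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n_words,
        # Punctuation marks per word.
        "punctuation_per_word": len(punctuation) / n_words,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast! Why?"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value:.3f}")
```

In a detector of the kind described, numbers like these would be combined with the model-based features and passed to a classifier. How the researchers combine them is not covered here.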

Why This Matters to You

This research holds significant implications for anyone dealing with digital content. Imagine you’re a content creator or a student. Knowing if a piece of text is AI-generated can be crucial for authenticity and academic integrity. The study indicates that additional stylometric features notably improve performance in both tasks and languages. This means more reliable detection of AI-written content. What’s more, the multilingual configuration performs comparably to, or even better than, language-specific models. This simplifies the process for global applications. What impact could this have on your trust in online information?

Here’s a look at the key enhancements:

Feature Added	Impact on Detection
26 Stylometric Features	Improved performance in both tasks and languages
Newer Multilingual Models	Comparable or better results than language-specific models
SHAP Analysis	Provides interpretability of model decisions

For example, if you’re a journalist, a system like this could help verify the origin of news articles. You could check whether a story was likely written by an AI, which may help flag misinformation. The team revealed that “the additional stylometric features improve performance in both tasks and both languages.” This is a step toward more reliable detection tools.

The Surprising Finding

Perhaps the most surprising aspect of this research is the power of combining traditional linguistic analysis with modern AI. The study found that adding 26 document-level stylometric features improved detection performance in both tasks and both languages. This challenges the assumption that only complex neural network architectures are needed for AI detection. It suggests that analyzing writing style, like average word length or punctuation use, remains effective. The team also used SHAP analysis to explain which features influenced the model’s decisions, as the paper explains. This interpretability is often lacking in complex AI systems. It provides a clearer understanding of why a text is flagged as AI-generated.
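SHAP builds on Shapley values, which split a model's output among its input features. The Python sketch below computes exact Shapley values by brute force for a toy scoring function. It does not use the SHAP library and is not the researchers' model: the feature names, the weights in `WEIGHTS`, and the `instance` and `baseline` values are all invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, instance, baseline):
    """Exact Shapley values of score() for one instance against a baseline.

    Features outside a coalition take their baseline value. The cost grows
    as 2^n, so this only suits a handful of features.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input with only the coalition's features set to real values.
                without = {
                    m: (instance[m] if m in subset else baseline[m])
                    for m in names
                }
                # Same input, with the feature of interest switched on too.
                with_feature = dict(without)
                with_feature[name] = instance[name]
                total += weight * (score(with_feature) - score(without))
        values[name] = total
    return values


# Hypothetical toy detector: a weighted sum of three style features.
# The weights are made up and carry no empirical meaning.
WEIGHTS = {
    "sentence_length_stdev": -0.5,
    "type_token_ratio": -2.0,
    "punctuation_per_word": 1.0,
}


def toy_score(features):
    return sum(WEIGHTS[name] * value for name, value in features.items())


instance = {
    "sentence_length_stdev": 1.0,
    "type_token_ratio": 0.5,
    "punctuation_per_word": 0.25,
}
baseline = {
    "sentence_length_stdev": 3.0,
    "type_token_ratio": 0.75,
    "punctuation_per_word": 0.25,
}
contributions = shapley_values(toy_score, instance, baseline)

if __name__ == "__main__":
    for name, value in contributions.items():
        print(f"{name}: {value:+.3f}")
```

The contributions always add up to the difference between the score for the instance and the score for the baseline. That property is what lets a per-feature breakdown explain why a particular text was flagged.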

What Happens Next

Looking ahead, these enhanced AI text detection methods could be integrated into various tools, such as plagiarism checkers and content authenticity platforms. For example, educational institutions might adopt such systems to combat AI-assisted cheating, and content platforms could use them to maintain editorial standards. The authors note that clear documentation is vital for reliable replication and fair comparison of systems. This emphasizes the need for transparency in future AI development. Our actionable advice for you is to stay informed about these evolving detection capabilities. Understand how they work and consider their implications for your own content creation or consumption. These tools are likely to shape how we interact with digital text.
