Why You Care
Ever wonder if the AI powering your phone’s face unlock is as smart as it seems? A new study suggests that Multimodal Large Language Models (MLLMs) aren’t quite ready for prime time in precise face recognition. Why should this matter to you? Your digital security and privacy often rely on these very technologies.
This research, presented by Hatef Otroshi Shahreza and Sébastien Marcel, dives deep into MLLMs. It explores their capabilities and shortcomings. Understanding these limitations is vital for anyone relying on AI for identification.
What Actually Happened
Researchers Hatef Otroshi Shahreza and Sébastien Marcel recently published a systematic benchmark. This benchmark evaluates Multimodal Large Language Models (MLLMs) for face recognition. According to the announcement, MLLMs combine vision and language processing. They have shown impressive performance across many tasks. However, their specific application in face recognition has been largely unexplored until now.
The study compared open-source MLLMs against existing, specialized face recognition models. This comparison used standard benchmarks. Datasets included LFW, CALFW, CPLFW, CFP, AgeDB, and RFW. The goal was to understand how MLLMs perform in real-world scenarios. The team revealed that while MLLMs grasp rich semantic cues, they fall short in high-precision recognition. This is particularly true in zero-shot applications—where the model sees new faces it hasn’t been trained on.
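Zero-shot face verification of the kind these benchmarks measure is typically scored by embedding each face, comparing the two embeddings with cosine similarity, and applying a decision threshold. The sketch below is a rough illustration of that protocol, not the paper’s actual evaluation code; the embedding vectors are toy placeholders standing in for real model outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb_a, emb_b, threshold=0.5):
    """Decide 'same person' if similarity exceeds a tuned threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

def pair_accuracy(pairs, labels, threshold=0.5):
    """Fraction of verification pairs classified correctly."""
    preds = [verify(a, b, threshold) for a, b in pairs]
    return sum(p == bool(l) for p, l in zip(preds, labels)) / len(labels)

# Toy embeddings (not real face features): one matching pair, one non-matching
same_pair = (np.array([1.0, 0.0, 0.2]), np.array([0.9, 0.1, 0.2]))  # high similarity
diff_pair = (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))  # orthogonal
print(pair_accuracy([same_pair, diff_pair], [1, 0]))  # → 1.0
```

Benchmarks like LFW report exactly this kind of pair-wise accuracy; the gap the study found means an MLLM’s embeddings separate matching and non-matching pairs less cleanly than a specialized model’s.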
Why This Matters to You
This finding has practical implications for everyday systems. Think about the facial authentication used in your banking apps or at airport gates. If MLLMs are less accurate, relying on them could weaken security. The research provides a foundation for improving MLLM-based face recognition and offers crucial insights for designing models with higher accuracy and better generalization.
Consider this: how confident are you in the security of your biometric data? This study suggests that current MLLMs might not be the best choice for essential identification tasks. “While MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in high-precision recognition scenarios in zero-shot applications,” the paper states. This means a dedicated face recognition AI often outperforms a more general MLLM. Your security could depend on which type of AI is in use.
Here’s a breakdown of the MLLM performance:
| Aspect | MLLM Performance |
| --- | --- |
| Semantic Understanding | Strong at capturing rich semantic cues |
| High-Precision Recognition | Lags behind specialized models |
| Zero-Shot Applications | Struggles with unseen faces |
| Generalization | Needs improvement for broader real-world use |
For example, imagine you are trying to unlock your phone. A specialized face recognition system is highly tuned for that single task. An MLLM, however, might be good at describing an image and identifying a face. But it might not be as precise for just face identification. This highlights a trade-off between versatility and specialized accuracy. What implications does this have for your personal data security?
The Surprising Finding
Here’s the twist: despite their broad capabilities, MLLMs are not yet superior for precise face recognition. You might assume that a model capable of understanding both images and text would excel at face identification, but the study finds otherwise. MLLMs, while adept at interpreting varied visual and linguistic content, struggle with the fine-grained detail required for accurate face matching. This challenges the common assumption that more versatile AI models automatically perform better on specific, high-stakes tasks. “Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in high-precision recognition scenarios.” For tasks demanding extreme accuracy, specialized AI still holds the advantage. It’s surprising because MLLMs are often seen as the future of AI, yet they have a clear limitation in this critical area.
What Happens Next
This research provides a clear roadmap for future development. The source code for the benchmark is publicly available, which will let other researchers build on the findings. We can expect advancements in MLLM-based face recognition over the next 12-18 months. Developers will likely focus on improving MLLM precision and generalization. For example, future MLLMs might incorporate specialized modules for facial features, which could boost their accuracy significantly. The industry implications are substantial: companies relying on facial recognition for security or authentication will need to weigh these findings. They may continue to use specialized models for critical applications, while MLLMs find use in less sensitive areas. For you, this means potentially more accurate and secure AI systems in the long run. The team notes their work offers “insights for the design of models with higher accuracy and generalization,” suggesting a focused effort to bridge the current performance gap.
