LLMs Struggle with Security: What You Need to Know

New research reveals common errors and limitations in how large language models answer end-user security questions.

A recent study evaluated popular large language models (LLMs) like GPT, LLAMA, and Gemini on 900 end-user security questions. The findings indicate that while LLMs possess broad knowledge, they often provide stale, inaccurate, or poorly communicated security advice. This highlights critical areas for model improvement and suggests user strategies for better interaction.

By Katie Rowan

October 30, 2025

4 min read

Key Facts

  • Three popular LLMs (GPT, LLAMA, Gemini) were evaluated.
  • 900 end-user security questions were used in the study.
  • LLMs demonstrated broad generalist knowledge but exhibited error patterns.
  • Errors included stale and inaccurate answers, along with poor communication styles.
  • The research suggests directions for model improvement and user strategies.

Why You Care

Ever asked an AI chatbot for advice on a tricky security issue? Did you trust its answer completely?

New research from Vijay Prakash and his team sheds light on how large language models (LLMs) like GPT, LLAMA, and Gemini perform when tackling your security questions. This study reveals crucial limitations. Understanding these findings is vital for anyone relying on AI for personal or professional security guidance. Your digital safety could depend on it.

What Actually Happened

In a paper posted to arXiv, researchers conducted a qualitative evaluation of three popular large language models (LLMs). According to the announcement, they tested these models on a substantial dataset of 900 systematically collected end-user security questions. The goal was to understand the LLMs’ performance in an essential area: providing security advice to everyday users. These models, including well-known names like GPT, LLAMA, and Gemini, are known for broad generalist knowledge. However, the study specifically focused on their accuracy and reliability when dealing with security-related inquiries. The team aimed to identify patterns of errors and limitations. This comprehensive approach provides a clear picture of current AI capabilities in this sensitive domain.

Why This Matters to You

While LLMs show a broad generalist knowledge in security, the research shows significant patterns of errors. These issues include outdated and incorrect answers, along with indirect or unresponsive communication styles. This directly impacts the quality of information you receive. Imagine you’re trying to secure your home network. If an LLM gives you outdated advice on firewall settings, your network could remain vulnerable. This is why understanding these limitations is so important for your online safety. What steps do you take to verify information from an AI chatbot?

Common LLM Security Response Issues:

  • Stale Information: Advice based on old data or outdated practices.
  • Inaccurate Answers: Incorrect technical details or misleading instructions.
  • Indirect Communication: Responses that are vague or don’t directly address the question.
  • Unresponsive Styles: Language that is unhelpful or difficult to follow.

As detailed in the paper, these patterns highlight a significant challenge. “While LLMs demonstrate broad generalist ‘knowledge’ of end user security information, there are patterns of errors and limitations across LLMs consisting of stale and inaccurate answers, and indirect or unresponsive communication styles, all of which impacts the quality of information received,” the paper states. This means you cannot blindly trust every piece of security advice from an LLM. Always cross-reference crucial security information.
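
To make the “stale information” risk concrete, here is a minimal, hypothetical Python sketch, not something from the study itself, of a heuristic that flags year references in an LLM answer that fall outside a freshness window, nudging you to double-check that advice against current documentation. The function name, threshold, and example answer are illustrative assumptions.

```python
import re
from datetime import date

def flag_possible_staleness(answer: str, max_age_years: int = 3) -> list[str]:
    """Flag year mentions in an LLM answer that exceed a freshness threshold.

    A rough heuristic only: old years often (but not always) signal advice
    based on outdated tools, standards, or threat information.
    """
    current_year = date.today().year
    warnings = []
    for match in re.finditer(r"\b(19|20)\d{2}\b", answer):
        year = int(match.group())
        if current_year - year > max_age_years:
            warnings.append(f"Mentions {year}: double-check that this advice is still current.")
    return warnings

# Illustrative (made-up) answer that cites an older Wi-Fi standard.
answer = "For home Wi-Fi, WPA2 (finalized in 2004) is the strongest available option."
for warning in flag_possible_staleness(answer):
    print(warning)
```

A heuristic like this cannot judge correctness; it only surfaces a reason to verify the answer against a reputable, current source.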

The Surprising Finding

Here’s the twist: despite their vast general knowledge, LLMs consistently struggle with the nuances of end-user security. The study finds that while they can access a wide range of security information, their responses often fall short. This is surprising because LLMs excel in many other complex question-answering tasks. However, the research reveals that LLMs frequently provide stale and inaccurate answers. They also exhibit indirect or unresponsive communication styles, which can be frustrating for users seeking clear guidance. This challenges the common assumption that an AI with access to the internet can provide accurate, up-to-the-minute security solutions. It shows that context and currency are essential, and current LLMs often miss the mark.

What Happens Next

Based on these findings, the researchers suggest important directions for model improvement. They also recommend user strategies for interacting with LLMs when seeking security assistance. For example, future LLM updates, potentially in the next 6-12 months, could focus on real-time data integration and improved contextual understanding for security queries. Imagine an LLM that not only answers your question but also prompts you to check the date of the information it provides. This would empower you to make more informed decisions.
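
As a purely hypothetical illustration of that idea (not a feature of any current model or a method from the study), a thin client-side wrapper could nudge the model to disclose how current its information is before you act on it. The prefix text and function below are assumptions for illustration only.

```python
# Hypothetical client-side wrapper: the prefix text is an assumption,
# not a prompt used in the study or built into any model.
DATE_AWARE_PREFIX = (
    "You are answering an end-user security question. Before answering, "
    "state how recent your underlying information is and note anything "
    "that may have changed since your training data was collected.\n\n"
    "Question: "
)

def build_security_prompt(user_question: str) -> str:
    """Wrap a security question so the model is asked to disclose
    how current its information is (a user-side mitigation, not a fix)."""
    return DATE_AWARE_PREFIX + user_question

print(build_security_prompt("How should I configure my home router's firewall?"))
```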

For you, the actionable takeaway is clear: treat LLM security advice with caution. Always verify essential information from reputable, human-curated sources. The industry implications are significant, pushing developers to build more accurate and context-aware large language models specifically for security applications. The team wrote, “Based on these patterns, we suggest directions for model betterment and recommend user strategies for interacting with LLMs when seeking assistance with security.” This collaborative approach between research and user awareness will be key to enhancing digital safety.
