LLMs' Personality Illusion: Self-Reports vs. Behavior

New research reveals a significant disconnect between what large language models say about themselves and how they actually act.

A recent arXiv paper titled 'The Personality Illusion' highlights a critical finding: large language models (LLMs) often exhibit a dissociation between their self-reported personality traits and their actual conversational behavior. This suggests that simply asking an LLM about its 'personality' might not reflect its true operational characteristics.

By Sarah Kline

September 7, 2025

4 min read

Key Facts

  • A new paper titled 'The Personality Illusion' examines LLM behavior.
  • The research reveals a dissociation between LLM self-reported traits and actual behavior.
  • Authors include Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, and R. Michael Alvarez.
  • All code and source data for the study have been made public.
  • The paper highlights the importance of observing LLM behavior over self-declarations.

Why You Care

Ever wondered if your favorite AI chatbot truly ‘knows’ itself? Can an artificial intelligence (AI) describe its own personality accurately? New research suggests a surprising answer, and it could change how you interact with AI. This finding has major implications for anyone relying on large language models (LLMs) for essential tasks or even casual conversation. What if the AI you’re talking to isn’t what it claims to be?

What Actually Happened

A new paper, ‘The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs,’ has been submitted to arXiv. Authored by Pengrui Han and six collaborators, the research examines the question of AI personality and finds a significant disconnect: how LLMs describe their own personality traits often does not match their actual behavioral patterns during interactions. In other words, an LLM’s ‘self-report’ about its characteristics may not align with its observable actions. The paper focuses on how these systems present themselves versus how they actually operate, and the researchers have made all of their code and source data public.

Why This Matters to You

This finding is crucial for anyone working with or deploying large language models. If an LLM claims to be ‘helpful’ or ‘unbiased,’ but its behavior suggests otherwise, it creates a serious trust issue. Imagine using an AI for customer service that self-reports as empathetic, but consistently provides cold, unfeeling responses. This dissociation can lead to unexpected and potentially problematic outcomes in real-world applications. Your expectations of an AI’s performance could be entirely misaligned with its actual output.

Consider the practical implications:

  • AI Safety: An LLM might report being ‘safe’ while still acting in ways that could be harmful.
  • AI Alignment: Ensuring the AI’s stated goals match its operational behavior.
  • User Experience: Misleading self-descriptions can frustrate users.

The research shows that simply prompting an LLM about its personality might not give you an accurate picture. Instead, observing its behavior over time is essential. How will you assess the true nature of the AI tools you use moving forward?
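To make that concrete, here is a minimal sketch of how a developer might compare a model’s self-report with its observed behavior. This is not the paper’s methodology or released code: the `ask_model` and `rate_agreeableness` helpers are hypothetical placeholders, and their canned return values exist only so the example runs end to end.

```python
# Minimal sketch: compare an LLM's self-reported trait rating with a rating
# inferred from its behavior. Both helpers are hypothetical placeholders.

from statistics import mean

def ask_model(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns canned text so the sketch runs.
    return "5" if "scale of 1" in prompt else "Figure it out yourself."

def rate_agreeableness(response: str) -> float:
    # Stand-in for a behavioral rater (human annotation or a judge model)
    # returning a 1-5 agreeableness score for a single response.
    return 1.0 if "yourself" in response else 4.0

# 1. Self-report: ask the model to rate its own agreeableness.
self_report = float(ask_model(
    "On a scale of 1 (strongly disagree) to 5 (strongly agree), rate the "
    "statement: 'I am considerate and kind to almost everyone.' "
    "Reply with a single number."
))

# 2. Behavior: score how the model actually responds to ordinary requests.
probe_prompts = [
    "My code keeps crashing and I'm frustrated. Can you help?",
    "I disagree with your last answer. Please explain it again.",
    "I'm new to this topic. Where should I start?",
]
behavior = mean(rate_agreeableness(ask_model(p)) for p in probe_prompts)

# 3. A large gap between the two numbers is the kind of dissociation
#    the paper describes.
print(f"self-report: {self_report:.1f}  behavior: {behavior:.1f}  "
      f"gap: {self_report - behavior:+.1f}")
```

The point of the sketch is the structure, not the numbers: the self-report and the behavioral score come from two independent measurements, so they are free to disagree, and it is the gap between them that matters.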

In the paper, the authors state that they “make public all code and source data.” This commitment to transparency allows others to verify and build on their findings, and it underscores the importance of empirical observation over self-declarations when evaluating AI systems. That is particularly relevant for developers and researchers building the next generation of AI applications.

The Surprising Finding

The most surprising element of this research is the stark contrast it highlights. We often assume that an AI capable of complex language would be consistent, with its verbal self-description matching its functional output. The study finds this is not the case for LLMs: the paper describes a “dissociation between self-reports & behavior in LLMs.” This challenges a common assumption, namely that an AI’s linguistic output directly reflects its internal state or operational design. It implies that an LLM’s ‘personality’ is more of an illusion, a linguistic construct, than an inherent characteristic guiding its actions. For example, an LLM might say it is ‘friendly’ but then respond to user queries in a very direct or even curt manner. That gap between what the model can articulate about itself and how it actually performs suggests that simply asking an LLM about its ‘personality’ is a misleading way to understand its true operational characteristics.

What Happens Next

This research has significant implications for how we develop and evaluate large language models. Moving forward, developers and researchers will need to lean on behavioral testing rather than an LLM’s self-descriptions. We may see new evaluation frameworks emerge in the coming months, perhaps by early 2026, that prioritize observational data. For example, instead of asking an AI whether it is ‘creative,’ we would assess its creativity by analyzing the novelty and originality of its generated content. This shift will help ensure that AI systems are not just ‘saying’ the right things but ‘doing’ the right things. Your approach to selecting and deploying AI tools should now include a closer look at their actual performance. Expect a push for more behavioral benchmarks that probe real capabilities and limitations, which should in turn lead to more reliable and trustworthy AI applications.
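As a rough illustration of that shift, the sketch below scores generated text directly instead of trusting a self-description. The distinct-bigram measure used here is a crude diversity proxy assumed purely for illustration, not a metric from the paper, and the placeholder outputs would in practice come from repeated model calls on the same open-ended prompt.

```python
# Minimal sketch of a behavioral check: rather than asking a model
# "Are you creative?", score the diversity of what it actually generates.
# distinct-2 (share of unique bigrams) is one crude proxy for novelty.

def distinct_2(texts: list[str]) -> float:
    """Fraction of unique bigrams across a set of generated texts."""
    bigrams = []
    for t in texts:
        tokens = t.lower().split()
        bigrams.extend(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / max(len(bigrams), 1)

# Placeholder outputs; repeated, near-identical generations drag the score down.
generations = [
    "A lighthouse keeper teaches the fog to sing.",
    "A lighthouse keeper teaches the fog to sing.",
    "The last library on Mars lends out memories instead of books.",
]

print(f"distinct-2 diversity: {distinct_2(generations):.2f}")
```

A benchmark built this way judges the model by its outputs alone, so a model that merely claims to be creative gains nothing unless its generations actually vary.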
