Unlocking LLM Memory: The Role of Function Tokens Revealed

New research explains how punctuation and small words drive AI's knowledge retrieval and learning.

A recent paper introduces the 'function token hypothesis,' shedding light on how Large Language Models (LLMs) store and access information. It suggests that seemingly minor words like 'the' and commas are crucial for memory retrieval and consolidation, impacting how these AI models learn and reason.

By Katie Rowan

October 10, 2025

4 min read

Key Facts

  • The 'function token hypothesis' explains LLM memory retrieval and consolidation.
  • Function tokens include punctuation, articles, prepositions, and conjunctions.
  • During inference, function tokens activate predictive features for next token prediction (memory retrieval).
  • During pre-training, predicting content tokens after function tokens updates model parameters (memory consolidation).
  • A small number of function tokens activate the majority of features in LLMs.

Why You Care

Ever wonder how a Large Language Model (LLM) remembers so much? How does it pull specific facts from its vast knowledge base? This isn’t just academic curiosity; understanding this helps us build better, more reliable AI. Your ability to interact effectively with AI, and even predict its responses, hinges on grasping these underlying mechanisms.

What Actually Happened

New research from Shaohua Zhang, Yuan Lin, and Hang Li introduces a compelling explanation for how Large Language Models (LLMs) manage their internal knowledge. The authors propose the ‘function token hypothesis’: function tokens are key to both memory retrieval and memory consolidation within LLMs. Function tokens are linguistic elements like punctuation marks, articles (e.g., ‘a’, ‘the’), prepositions (e.g., ‘in’, ‘on’), and conjunctions (e.g., ‘and’, ‘but’). They are distinct from ‘content tokens’, which carry the primary meaning. During inference, the research shows, function tokens activate the most predictive features from the context, and those features guide the prediction of the next token; this is, in effect, memory retrieval. During pre-training, predicting the content tokens that follow function tokens drives the model to learn more features and updates its parameters, which amounts to memory consolidation.
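To make the distinction concrete, here is a minimal sketch (not from the paper) that labels the tokens in a sentence as function or content tokens. The word lists and the regex tokenizer are illustrative assumptions; real LLMs use subword tokenizers, which split text differently.

```python
import re

# Illustrative word lists covering the categories the hypothesis names:
# punctuation, articles, prepositions, and conjunctions. Real inventories
# of function words are larger; this is only a sketch.
FUNCTION_WORDS = {
    "a", "an", "the",                       # articles
    "in", "on", "at", "with", "of", "to",   # prepositions
    "and", "but", "or",                     # conjunctions
}
PUNCTUATION = set(".,!?;:")

def classify(text: str) -> list[tuple[str, str]]:
    """Split text into word/punctuation tokens and label each one."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [
        (t, "function" if t in FUNCTION_WORDS or t in PUNCTUATION else "content")
        for t in tokens
    ]

for token, label in classify("The cat sat on the mat, and it purred."):
    print(f"{token:>6}  {label}")
```

Running this labels ‘the’, ‘on’, ‘and’, and the punctuation as function tokens, and the rest (‘cat’, ‘sat’, ‘mat’, ‘it’, ‘purred’) as content tokens: roughly the split the hypothesis says matters for retrieval and consolidation.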

Why This Matters to You

Understanding how LLMs ‘think’ about information can profoundly impact how you interact with them. Imagine you’re asking an AI a complex question. Knowing that specific words act as internal navigators helps you craft better prompts. This new perspective reveals that the small, often overlooked words are doing heavy lifting behind the scenes.

For example, if you’re writing a prompt, the way you structure your sentences, including your use of prepositions and conjunctions, might subtly influence the AI’s retrieval process. This isn’t just about getting a better answer; it’s about understanding the AI’s internal logic. The study finds that ‘a small number of function tokens activate the majority of features.’ This highlights their disproportionate influence.

Key Roles of Function Tokens

| Function Token Type | Linguistic Examples | LLM Role (Inference) |
| ------------------- | ------------------- | -------------------- |
| Punctuation | . , ! ? ; | Context segmentation, intent signaling |
| Articles | a, an, the | Noun phrase identification, specificity |
| Prepositions | in, on, at, with | Relational mapping, spatial/temporal context |
| Conjunctions | and, but, or | Connecting ideas, logical flow |

How might your own prompt engineering change if you consciously considered the power of these ‘little words’? This insight can help you troubleshoot why an LLM gives a certain response, and it provides a framework for thinking about how LLMs learn from the vast data they consume. The team showed that function tokens activate the most predictive features, which in turn direct next-token prediction, a step crucial for accurate and coherent responses. Your understanding of AI’s internal workings just got a significant upgrade.

The Surprising Finding

Here’s the twist: it’s not just the big, meaningful words that drive LLM intelligence. The research presents compelling evidence that seemingly minor ‘function tokens’ are incredibly influential. This challenges the common assumption that content words alone are the primary drivers of an LLM’s understanding. For instance, a small number of function tokens activate the majority of features, according to the paper. That means words like ‘the’ or a comma are not just structural; they are active agents in how an LLM processes information. The study further indicates that during pre-training, the training loss is dominated by predicting the next content tokens following function tokens, which forces the function tokens to select the most predictive features from context. This is surprising because it suggests that the AI’s learning process leans heavily on these grammatical connectors. It’s like finding out the mortar is more important than the bricks in building a house. It reshapes our understanding of how these complex models learn and retrieve information.
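If you want to probe this yourself, here is a rough sketch (not the authors’ experiment) of one way to check where next-token loss falls: score each position of a sentence with a small causal LM and bucket the losses by whether the preceding token is a function token. The model choice ‘gpt2’, the function-token list, and the grouping rule are all illustrative assumptions.

```python
# A minimal sketch assuming the Hugging Face transformers library and a small
# causal LM; this is an illustration, not the paper's methodology.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FUNCTION_STRINGS = {".", ",", "!", "?", ";", "a", "an", "the",
                    "in", "on", "at", "with", "and", "but", "or"}

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The cat sat on the mat, and it purred.", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

# Position i predicts token i+1, so align logits[:-1] with targets ids[1:].
losses = torch.nn.functional.cross_entropy(
    logits[0, :-1], ids[0, 1:], reduction="none"
)

# Bucket each per-position loss by the type of the token that precedes it.
after_function, after_content = [], []
for i, loss in enumerate(losses.tolist()):
    prev = tok.decode(ids[0, i].item()).strip().lower()
    (after_function if prev in FUNCTION_STRINGS else after_content).append(loss)

for name, bucket in [("after function tokens", after_function),
                     ("after content tokens", after_content)]:
    if bucket:
        print(f"mean loss {name}: {sum(bucket) / len(bucket):.3f}")
```

If the hypothesis holds, you would expect the loss to concentrate on predictions made right after function tokens, where the model has to commit to specific content.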

What Happens Next

This research opens new avenues for improving Large Language Models. Developers might start designing LLMs that are even more sensitive to function tokens, and we could see advancements in model training within the next 12-18 months. For example, future LLMs might be trained with an enhanced focus on how these tokens influence knowledge retrieval, which could lead to more nuanced and context-aware AI responses. For you, this means potentially more accurate search results from AI-powered tools, plus more coherent and logical outputs from generative AI. One actionable step: pay closer attention to the grammatical structure of your prompts; it may refine the AI’s ability to retrieve precise information. The industry implications are significant as well, potentially leading to more efficient pre-training methods and, in turn, smaller yet more capable LLMs. Understanding these mechanisms will shape the next generation of AI development and your interaction with it.
