Why You Care
Ever wonder why your favorite AI chatbot sometimes seems to get stuck on a seemingly irrelevant detail, or perhaps struggles with complex requests? What if there’s a hidden mechanism inside these AI models that’s quietly influencing their thought process? New research has uncovered just such a mechanism, called ‘secondary attention sinks.’ Understanding these could change how we build and interact with AI, making your future AI experiences smoother and more reliable.
What Actually Happened
Researchers, including Jeffrey T.H. Wong, have identified a new class of ‘attention sinks’ in large language models (LLMs). These are tokens – individual units of text – that receive a disproportionately high amount of attention within the AI’s neural network, according to the announcement. Previously, scientists knew about ‘primary sinks,’ often linked to the beginning-of-sequence (BOS) token. These primary sinks typically emerge early in the network and persist throughout, drawing significant attention mass, the research shows. However, this new study focuses on ‘secondary sinks,’ which behave quite differently. The team revealed these secondary sinks primarily arise in the middle layers of the model. What’s more, they can persist for a variable number of layers, drawing a smaller but still significant amount of attention, as detailed in the blog post.
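To make the distinction concrete, here is a minimal sketch — not the paper's actual methodology — of how one might flag sink tokens from a model's attention weights and roughly separate primary from secondary sinks by the layer at which they first appear. The tensor shapes, the 0.3 mass threshold, and the "middle third of layers" heuristic are all illustrative assumptions:

```python
import numpy as np

def find_sink_tokens(attn, mass_threshold=0.3):
    """Flag tokens that receive a disproportionate share of attention.

    attn: array of shape (layers, heads, seq, seq), where attn[l, h, i, j]
    is how much query token i attends to key token j at layer l.
    Returns, per layer, the indices of tokens whose average incoming
    attention mass exceeds the threshold.
    """
    layers = attn.shape[0]
    # Average over heads, then over query positions: incoming mass per key token.
    incoming = attn.mean(axis=1).mean(axis=1)  # shape (layers, seq)
    return [np.flatnonzero(incoming[l] > mass_threshold).tolist()
            for l in range(layers)]

def classify_sinks(sinks_per_layer, n_layers):
    """Rough split: sinks present from the first layer are 'primary';
    sinks that first appear in the middle third of layers are 'secondary'."""
    first_seen = {}
    for layer, toks in enumerate(sinks_per_layer):
        for t in toks:
            first_seen.setdefault(t, layer)
    primary = sorted(t for t, l in first_seen.items() if l == 0)
    secondary = sorted(t for t, l in first_seen.items()
                       if n_layers // 3 <= l < 2 * n_layers // 3)
    return primary, secondary
```

On a real model you would obtain the attention tensor from the network itself (for instance, via a framework option that returns attention weights) rather than constructing it by hand.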
Why This Matters to You
These newly identified secondary attention sinks could be crucial for anyone working with or relying on AI. They represent a subtle but real influence on how AI models process information and generate responses. Imagine you’re using an AI to summarize a long document. If a secondary sink incorrectly latches onto a minor detail in the middle of the text, your summary might become skewed. The study finds these sinks are formed by specific middle-layer MLP modules – multi-layer perceptrons, which are fundamental components of neural networks. These MLPs map token representations to vectors aligned with the primary sink’s direction, the paper states. In other words, specific parts of the AI are inadvertently creating these attention traps.
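As a toy illustration of that “alignment” idea — an assumption-laden sketch, not the authors’ analysis code — one could measure how closely each token’s MLP output points along an estimated primary-sink direction using cosine similarity:

```python
import numpy as np

def sink_alignment(mlp_outputs, sink_direction):
    """Cosine similarity between each token's MLP output vector and
    an (assumed, separately estimated) primary-sink direction.

    mlp_outputs: (num_tokens, hidden_dim); sink_direction: (hidden_dim,).
    Scores near 1.0 suggest the MLP is pushing that token's
    representation toward the sink direction.
    """
    d = sink_direction / np.linalg.norm(sink_direction)
    norms = np.linalg.norm(mlp_outputs, axis=-1, keepdims=True)
    return (mlp_outputs / norms) @ d
```

Under this heuristic, a token whose score jumps toward 1.0 in the middle layers — and stays there for some span of layers — would be a candidate secondary sink.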
Here’s a quick look at the distinctions:
| Feature | Primary Sinks | Secondary Sinks |
| --- | --- | --- |
| Emergence Layer | Typically early (e.g., first layer) | Primarily in middle layers |
| Persistence | Throughout the network | Variable number of layers |
| Attention Mass | Very large amount | Smaller, but still significant |
| Formation | Often BOS token | Specific middle-layer MLP modules |
How might this impact your daily use of AI? Think about debugging an AI application. If an AI is generating unexpected output, these secondary sinks could be a hidden cause. Do you ever wonder if the AI is truly understanding your complex prompts, or just getting distracted? This research provides a new lens through which to view these challenges. According to Jeffrey T.H. Wong, “We find the existence of secondary sinks that arise primarily in middle layers and can persist for a variable number of layers, and draw a smaller, but still significant, amount of attention mass.” This highlights their unique and potentially subtle influence.
The Surprising Finding
What’s truly surprising here is the existence of these distinct secondary sinks. Prior work had identified that tokens other than the BOS (beginning-of-sequence) token could sometimes become sinks. However, those were found to exhibit properties analogous to the BOS token, the study explains: they emerged at the same layer, persisted throughout the network, and drew a large amount of attention mass. The unexpected twist is that these new secondary sinks behave differently. They don’t just mimic primary sinks. Instead, they emerge specifically in the middle layers of the model. This challenges the assumption that all non-BOS attention sinks would follow the same pattern as the primary ones. The research shows these secondary sinks are formed by specific middle-layer MLP modules, making their emergence more localized and dynamic than previously understood — and pointing to richer internal dynamics within AI models than earlier work had assumed.
What Happens Next
This discovery opens new avenues for improving AI models. Researchers will likely focus on understanding and potentially mitigating the effects of these secondary attention sinks in the coming quarters. For example, future AI model architectures might be designed to prevent the formation of these sinks or to redirect attention mass more effectively. The team revealed they conducted extensive experiments across 11 model families, analyzing where these sinks appear, their properties, and how they are formed. This broad testing indicates a widespread phenomenon, not an isolated incident. For you, this could mean more efficient and predictable AI performance in the near future. Developers might gain new tools to fine-tune AI models, leading to less ‘hallucination’ and fewer irrelevant outputs. The industry implications are significant, potentially leading to more robust and reliable large language models across various applications. This research offers a deeper insight into the complex internal workings of AI.
