New AI Tool 'NER Retriever' Finds Entities Without Pre-set Categories

A new framework leverages LLM internal representations for zero-shot named entity retrieval, offering schema-free identification.

Researchers have introduced NER Retriever, a novel AI framework that identifies named entities in documents without needing predefined categories. It uses advanced Large Language Model (LLM) techniques to understand user-defined descriptions, making information retrieval more flexible and powerful.

By Sarah Kline

September 7, 2025

4 min read

Key Facts

  • NER Retriever is a zero-shot retrieval framework for named entities.
  • It uses user-defined type descriptions instead of fixed schemas.
  • The system embeds entity mentions and type descriptions into a shared semantic space.
  • It leverages internal representations (value vectors) from mid-layer transformer blocks of LLMs.
  • NER Retriever outperforms lexical and dense sentence-level retrieval baselines.

Why You Care

Ever wished you could find specific information in a mountain of text, even if you didn’t know exactly what you were looking for beforehand? Imagine searching for a particular kind of startup without needing a list of company names. This is now becoming a reality. A new framework called NER Retriever is changing how we find named entities. It promises to make your data searches much smarter and more intuitive. How will this impact your daily digital life?

What Actually Happened

Researchers Or Shachar, Uri Katz, Yoav Goldberg, and Oren Glickman have unveiled a new AI framework called NER Retriever, according to the announcement. The framework is designed for zero-shot retrieval of named entities: unlike traditional methods, it requires neither pre-set categories nor extensive fine-tuning, relying instead on a user-defined type description to find relevant documents. The technical report explains that this approach is a variant of Named Entity Recognition (NER), the task of identifying and classifying named entities (such as people, organizations, or locations) in text. The system builds on the internal representations of large language models (LLMs), embedding both entity mentions and open-ended type descriptions into a shared semantic space, so the AI matches on the meaning behind your search terms rather than on exact keywords.
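To make the shared-space idea concrete, here is a minimal Python sketch. It is not the authors’ implementation: NER Retriever works with value vectors from mid-layer transformer blocks of an LLM, whereas this sketch simply mean-pools an intermediate hidden layer of a small off-the-shelf encoder (the model name and layer index below are arbitrary placeholder choices) and scores candidate type descriptions against a mention by cosine similarity.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "sentence-transformers/all-MiniLM-L6-v2"   # placeholder encoder, not the LLM used in the paper
LAYER = 3                                          # arbitrary mid layer

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

def embed(text: str) -> torch.Tensor:
    """Mean-pool one intermediate hidden layer as a crude stand-in for an entity/type embedding."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER].mean(dim=1).squeeze(0)

mention = "OpenAI"
type_descriptions = {
    "technology company": "an organization that develops software or hardware products",
    "city": "a large, permanent human settlement",
}

mention_vec = embed(mention)
for label, description in type_descriptions.items():
    score = torch.nn.functional.cosine_similarity(mention_vec, embed(description), dim=0)
    print(f"{label}: {score.item():.3f}")   # higher score = better semantic match
```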

Why This Matters to You

This development has significant practical implications for you. Think about how you currently search for information: you typically rely on specific keywords or predefined tags. NER Retriever changes this by allowing natural-language queries. For example, imagine you’re a content creator looking for all mentions of “sustainable agriculture initiatives” in a large dataset. You don’t need a pre-existing list of those initiatives; you simply describe what you’re looking for, and the system retrieves documents mentioning entities that fit the description. This flexibility saves you time and uncovers hidden connections. “We show that internal representations, specifically the value vectors from mid-layer transformer blocks, encode fine-grained type information more effectively than commonly used top-layer embeddings,” as detailed in the blog post. This highlights the unique way the system processes information. How might this capability transform your research or content creation process?
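To picture that workflow, here is a toy retrieval sketch. Every name and vector in it is made up: the entity index would normally hold embeddings produced for mentions found in your corpus, and the query vector would be the embedding of a free-text description such as “sustainable agriculture initiatives”.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 256

# Hypothetical index of entity mentions -> embeddings (random stand-ins here).
entity_index = {
    "GreenSoil Project": rng.standard_normal(DIM),
    "Acme Semiconductors": rng.standard_normal(DIM),
    "Riverbed Agro Collective": rng.standard_normal(DIM),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, index: dict, top_k: int = 2) -> list:
    """Rank indexed entity mentions by similarity to a type-description embedding."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

query_vec = rng.standard_normal(DIM)   # stand-in for the embedded type description
print(retrieve(query_vec, entity_index))
```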

Here’s a look at some key benefits:

  • Flexibility: Search for entities without predefined categories.
  • Efficiency: Quicker retrieval of specific, contextually relevant information.
  • Accuracy: Leverages LLM understanding for better semantic matching.
  • Scalability: Handles large datasets for broad information discovery.

The Surprising Finding

One of the most interesting aspects of NER Retriever lies in its use of LLM internals. The study finds that internal representations, particularly the value vectors from mid-layer transformer blocks, encode fine-grained type information more effectively than the top-layer embeddings commonly used for such tasks. This is surprising because it challenges the assumption that the final layers of an LLM are always the most informative; the team reports that these deeper layers hold richer, more specific semantic information. To refine these representations, a lightweight contrastive projection network is trained to align type-compatible entities while separating unrelated types. The resulting entity embeddings are compact and type-aware, making them well suited to nearest-neighbor search, according to the research.
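As a rough illustration of that refinement step, the sketch below trains a small projection head with a triplet-style contrastive loss and then uses the projected vectors for nearest-neighbor lookup. It is only one plausible reading of the paper’s description, not the authors’ architecture, loss, or hyperparameters, and the random tensors stand in for value vectors that would be extracted from a mid-layer transformer block for each entity mention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Lightweight MLP mapping raw LLM-internal vectors to a compact, type-aware space."""
    def __init__(self, in_dim: int = 4096, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)    # unit vectors, so dot product = cosine similarity

def triplet_loss(head, anchor, positive, negative, margin: float = 0.2) -> torch.Tensor:
    """Pull same-type entity pairs together and push different-type pairs apart."""
    a, p, n = head(anchor), head(positive), head(negative)
    return F.relu(margin + (a * n).sum(-1) - (a * p).sum(-1)).mean()

head = ProjectionHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# Toy training loop on random stand-in batches.
for _ in range(10):
    anchor, positive, negative = (torch.randn(32, 4096) for _ in range(3))
    loss = triplet_loss(head, anchor, positive, negative)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, project every entity vector once and keep the result for nearest-neighbor search.
with torch.no_grad():
    index = head(torch.randn(1000, 4096))          # (1000, 256) compact, type-aware embeddings
    query = head(torch.randn(1, 4096))             # projected type-description vector
    top5 = (index @ query.T).squeeze(1).topk(5).indices
print(top5)
```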

What Happens Next

The NER Retriever codebase is publicly available, as mentioned in the release, so developers and researchers can begin experimenting with it immediately. We can expect to see early integrations and proofs of concept within the next few months. For example, this system could be applied in search engines for legal documents or medical research, helping identify specific disease variants or legal precedents without strict keyword matching. It could also enhance customer support systems: imagine a chatbot understanding nuanced customer issues without explicit programming for every scenario. The researchers report that NER Retriever significantly outperforms both lexical and dense sentence-level retrieval baselines, which suggests a strong foundation for future development. This new approach could lead to more intelligent and adaptable information retrieval systems across many industries.
