LLMs Uncover Science's Hidden Rules, Reshaping Peer Review

A new framework uses language models to expose unspoken biases and norms in academic evaluation.

A recent study explores how large language models (LLMs) can reveal the 'unwritten code' of science, specifically in peer review. By analyzing LLM 'self-talk,' researchers found hidden preferences that influence how papers are judged, highlighting a new diagnostic tool for societal biases.

By Katie Rowan

October 10, 2025

4 min read

Key Facts

  • A new conceptual framework uses LLMs to surface society's 'unwritten code.'
  • The framework was applied to peer review in 45 academic conferences.
  • LLMs' normative priors (e.g., theoretical rigor) were updated towards storytelling about external connections.
  • Human reviewers implicitly reward storytelling despite not explicitly articulating it.
  • The correlation between human explicit rewards and LLM normative priors was 0.49.

Why You Care

Ever wonder if there’s more to scientific success than meets the eye? What if the rules aren’t always written down? A new paper suggests that large language models (LLMs) can actually expose these hidden forces, particularly in the essential world of academic peer review. This isn’t just about how papers get published; it’s about understanding the subtle biases that shape our world. How might uncovering these ‘unwritten codes’ change your understanding of fairness and evaluation?

What Actually Happened

Researchers have introduced a new conceptual framework that uses large language models (LLMs) to uncover society’s “unwritten code”: the implicit stereotypes and heuristics (mental shortcuts) that usually go unstated. The team, including authors Honglin Bao and James A. Evans, applied the framework to a case study in science: peer review. The goal is to reveal hidden rules that reviewers treat as important but rarely articulate directly, often because normative scientific expectations discourage stating them. The framework pushes LLMs to generate self-consistent hypotheses about why one paper in a pair of submissions received a stronger score than the other, then iteratively searches the remaining pairs for deeper hypotheses. In total, the study analyzed papers submitted to 45 academic conferences, according to the paper.
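To make the mechanism concrete, here is a minimal sketch of that pairwise “self-talk” loop in Python. It assumes a generic ask_llm helper, simple prompts, and a consistency spot-check over a few remaining pairs; the function names, prompts, and checks are illustrative stand-ins, not the authors’ actual pipeline.

```python
# Minimal sketch of the pairwise hypothesis-elicitation loop described above.
# `ask_llm` is a hypothetical placeholder for whatever LLM client is in use.

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an API client); not implemented here."""
    raise NotImplementedError

def elicit_hidden_rules(paper_pairs, max_rounds=5):
    """Ask the model why the higher-scored paper in each pair won, keeping only
    hypotheses that stay consistent when spot-checked against remaining pairs."""
    hypotheses = []
    remaining = list(paper_pairs)  # each item: (stronger_paper_text, weaker_paper_text)
    for _ in range(max_rounds):
        if not remaining:
            break
        stronger, weaker = remaining.pop(0)
        prompt = (
            "Both papers below were reviewed at the same venue. "
            f"Paper A scored higher than Paper B.\n\nA: {stronger}\n\nB: {weaker}\n\n"
            "State a general, self-consistent rule that could explain the gap, "
            "beyond the criteria reviewers usually state explicitly."
        )
        hypothesis = ask_llm(prompt)
        # Keep the hypothesis only if it also discriminates a few of the remaining
        # pairs (a stand-in for the self-consistency search described above).
        consistent = all(
            "A" in ask_llm(
                f"Rule: {hypothesis}\nWhich paper does this rule favor, A or B?\n"
                f"A: {a}\nB: {b}\nAnswer with A or B."
            )
            for a, b in remaining[:3]  # spot-check a handful of pairs to limit cost
        )
        if consistent:
            hypotheses.append(hypothesis)
    return hypotheses
```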

Why This Matters to You

This research offers a new way to diagnose and understand biases, not just in science but in many areas of life. Imagine applying it to hiring processes or even artistic evaluations. The framework lets LLMs “speak out their heuristics,” revealing their underlying reasoning, which in turn can clarify how humans make decisions. For example, if you’ve ever felt a project was unfairly judged, this approach could help pinpoint the unspoken criteria at play. The study found that the LLMs’ initial views of “good science” (such as theoretical rigor) evolved toward an emphasis on “storytelling about external connections,” such as how work is positioned within the literature. This suggests that presentation and context matter significantly, even when human reviewers never state it explicitly. What unspoken rules might be influencing your own field or daily interactions?

“This paper calls on the research community not only to investigate how human biases are inherited by large language models (LLMs) but also to explore how these biases in LLMs can be leveraged to make society’s ‘unwritten code’ — such as implicit stereotypes and heuristics — visible and accessible for critique,” the paper states.

Here’s a look at how LLM priors evolved:

LLM’s Initial Priors (Normative) → LLM’s Updated Posteriors (Contextual)
Theoretical rigor → Positioning within literature
Internal characteristics → Connections across literatures
Foundational concepts → Storytelling and external relevance

The Surprising Finding

Here’s the twist: while human reviewers explicitly reward the aspects that align with the LLMs’ normative priors (like theoretical rigor), they rarely articulate the importance of contextualization and storytelling in their comments. The research reports a correlation of 0.49 between human reviewers’ explicit rewards and the LLMs’ normative priors, but a correlation of -0.14 between those explicit rewards and the LLMs’ contextualization-and-storytelling posteriors. In other words, human reviewers implicitly reward storytelling elements with higher scores even when they do not mention them. This challenges the common assumption that scientific evaluation is purely objective and explicitly stated, revealing a hidden layer of unstated criteria that influences outcomes. The pattern held across different models and out-of-sample judgments, the team reports.
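For readers who want to see what those correlations mean operationally, here is a small illustrative Python snippet showing how numbers of this kind could be computed from per-paper emphasis scores. The arrays, values, and variable names are hypothetical placeholders, not the study’s data or pipeline.

```python
# Illustrative only: correlating how strongly human review text rewards a theme
# with how strongly the LLM's priors/posteriors emphasize it, per paper.
import numpy as np

human_explicit_reward = np.array([0.8, 0.6, 0.9, 0.4, 0.7])   # hypothetical: explicit reviewer emphasis tied to scores
llm_normative_prior   = np.array([0.7, 0.5, 0.8, 0.5, 0.6])   # hypothetical: LLM's initial emphasis (e.g., rigor)
llm_storytelling_post = np.array([0.2, 0.6, 0.1, 0.7, 0.3])   # hypothetical: LLM's updated emphasis (storytelling)

r_prior = np.corrcoef(human_explicit_reward, llm_normative_prior)[0, 1]
r_post  = np.corrcoef(human_explicit_reward, llm_storytelling_post)[0, 1]
print(f"prior correlation: {r_prior:.2f}, posterior correlation: {r_post:.2f}")
```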

What Happens Next

This framework has broad applicability beyond science. We can expect similar analyses to be applied to other complex social systems over the next 12 to 18 months; imagine using it to probe hiring algorithms for implicit biases against certain demographics. Actionable advice: consider how your own work or proposals are framed. Emphasizing “storytelling about external connections” may be more impactful than you realize, even when no one asks for it explicitly. The approach could also support more precisely targeted responsible-AI initiatives, and the industry implications are significant, potentially leading to more transparent and equitable evaluation systems across fields. According to the paper, the method can amplify and surface the tacit codes underlying human society, making the revealed values available for public discussion and critique.
