Merriam-Webster Sues OpenAI Over AI Training Data

Publishers allege widespread copyright infringement by AI giant, impacting revenue and information quality.

Encyclopedia Britannica and Merriam-Webster have filed a lawsuit against OpenAI. They claim OpenAI used nearly 100,000 copyrighted articles to train its large language models without permission. This legal action highlights growing concerns over AI's use of published content.

By Katie Rowan

March 16, 2026

4 min read

Key Facts

  • Encyclopedia Britannica and Merriam-Webster have sued OpenAI.
  • The lawsuit alleges OpenAI scraped nearly 100,000 online articles for LLM training without permission.
  • Britannica claims OpenAI's outputs contain “full or partial verbatim reproductions” of its content.
  • The lawsuit states ChatGPT starves web publishers of revenue and jeopardizes public access to quality information.
  • There is no strong legal precedent on whether training LLMs with copyrighted content is infringement.

Why You Care

Ever wondered where AI models get all their knowledge? What if that knowledge was taken without permission? This week, a major lawsuit hit the AI world, one that directly affects how you get your information online: the quality and trustworthiness of the digital information you consume could be at stake.

What Actually Happened

Encyclopedia Britannica, which owns Merriam-Webster, has initiated legal proceedings against OpenAI. According to the announcement, the lawsuit alleges that OpenAI scraped nearly 100,000 online articles and used them, without permission, to train its large language models (LLMs), the AI systems that power tools like ChatGPT. The publisher also claims OpenAI violated copyright law when ChatGPT generated outputs containing “full or partial verbatim reproductions” of Britannica’s content. What’s more, the AI lab allegedly uses these articles in ChatGPT’s Retrieval-Augmented Generation (RAG) system, which lets the model retrieve facts from specific documents before answering.
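To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern the lawsuit describes. The toy document store, the bag-of-words similarity scoring, and the prompt format are all illustrative assumptions for this sketch; they are not OpenAI's actual pipeline, which uses learned embeddings and a far larger index.

```python
# Minimal RAG sketch: retrieve the most relevant document, then splice it
# into the prompt sent to a language model. Corpus and scoring are toy
# stand-ins, not OpenAI's real system.
import math
import re
from collections import Counter

# A toy "document store" standing in for a publisher's articles.
DOCUMENTS = {
    "dictionary": "A dictionary is a reference work listing words and their meanings.",
    "encyclopedia": "An encyclopedia is a reference work with articles on many subjects.",
}

def _vector(text: str) -> Counter:
    """Bag-of-words term counts (a crude stand-in for learned embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    """Return the stored document most similar to the query."""
    qv = _vector(query)
    return max(DOCUMENTS.values(), key=lambda doc: _cosine(qv, _vector(doc)))

def build_prompt(query: str) -> str:
    """Splice the retrieved passage into the prompt given to the model."""
    context = retrieve(query)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("What is a dictionary?"))
```

The copyright question the suit raises sits exactly at the `retrieve` step: the verbatim source text is copied into the prompt, so the model's answer can closely track the original article.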

Why This Matters to You

This legal battle has significant implications for your access to reliable information. The lawsuit argues that “ChatGPT starves web publishers like [Britannica] of revenue” by generating responses that “substitute, and directly compete with, the content from publishers like [Britannica].” Imagine you’re researching a complex topic for work or school. Instead of visiting a trusted source like Merriam-Webster, you might ask ChatGPT. If ChatGPT provides an answer derived directly from the publisher’s content without compensation, it undermines the publisher’s ability to create more high-quality material, and the very sources you rely on could decline.

So, what happens if publishers continue to lose revenue this way?

Impact of AI Content Scraping

  • Reduced Revenue for Publishers: Less traffic means fewer ad dollars or subscriptions.
  • Decline in Quality Content: Publishers may struggle to fund in-depth research and writing.
  • Increased “Hallucinations”: If reliable sources diminish, AI models might rely on less accurate data.
  • Erosion of Trust: Your trust in online information could decrease if sources are compromised.

As mentioned in the release, Britannica also alleges that ChatGPT’s “hallucinations” – instances where AI generates false or misleading information – jeopardize “the public’s continued access to high-quality and trustworthy online information.” This directly affects your ability to get accurate facts.

The Surprising Finding

Here’s an interesting twist: the legal landscape around AI training data is still largely undefined. No strong precedent definitively establishes whether using copyrighted content to train an LLM constitutes copyright infringement. In one notable case, however, Anthropic, another AI company, faced a similar challenge and successfully argued that using content as training data is “impactful enough to be legal.” That decision, by federal judge William Alsup, cuts both ways: Alsup also ruled that Anthropic illegally downloaded millions of books, leading to a $1.5 billion settlement. The combination highlights the nuanced legal tightrope AI companies are walking, and it challenges the common assumption that any use of copyrighted material for AI training is automatically illegal.

What Happens Next

This lawsuit, like others before it, will likely unfold over the next several months, possibly into late 2026 or early 2027. The outcome could significantly reshape how AI models are trained and how publishers protect their intellectual property. Imagine, for example, future AI models being required to pay licensing fees for training data: that would create new revenue streams for content creators and help ensure the continued production of reliable information. For you, this might mean a more transparent and ethically sourced AI experience, along with new industry standards dictating how AI companies acquire and use data. Our advice: stay informed about these developments, and consider supporting your favorite publishers directly so they can keep providing the high-quality content you value.
