AI Struggles to Master Manga Meme Humor, Study Reveals

New research highlights AI's limitations in understanding contextual visual jokes.

A recent study introduces MaMe-Re, a benchmark for evaluating AI's ability to select humorous manga panel replies. While large language models show some promise in capturing complex social cues, they still struggle with subtle humor and integrating visual information effectively. This indicates a significant challenge for AI in mastering the nuances of internet meme culture.

By Katie Rowan

March 4, 2026

3 min read

Key Facts

  • A new benchmark, MaMe-Re, evaluates AI's ability to select humorous manga panel replies.
  • The benchmark contains 100,000 human-annotated pairs and 500,000 total annotations.
  • Large language models (LLMs) show preliminary ability to capture complex social cues like exaggeration.
  • Including visual information did not improve AI performance in selecting humorous replies.
  • AI struggles to distinguish subtle differences in wit among semantically similar candidates.

Why You Care

Have you ever wondered why some memes just hit different while others fall flat? What if your AI assistant tried to crack a joke with a meme and completely missed the mark? New research reveals that even artificial intelligence (AI) struggles with the subtle art of selecting humorous visual replies, particularly in the context of manga memes. That gap shapes how you might interact with AI in the future.

What Actually Happened

A new paper, “Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?” introduces a benchmark called MaMe-Re (Manga Meme Reply Benchmark). This benchmark aims to test how well AI can choose funny manga panels as responses to social media posts, according to the announcement. The dataset includes 100,000 human-annotated pairs of openly licensed Japanese manga panels and social media posts. What’s more, it features 500,000 total annotations from 2,325 unique annotators, the research shows. Authors Ryosuke Kohita and Seiichiro Yoshioka designed this to explore AI’s understanding of contextual humor. They focused on how large language models (LLMs) — complex AI programs that understand and generate human-like text — perform with this specific type of dynamic, visual communication.
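
To make the task concrete, here is a minimal sketch of what a MaMe-Re-style evaluation loop could look like. The field names, data layout, and baseline below are hypothetical illustrations for this article, not the paper's actual format or method.

```python
from dataclasses import dataclass

@dataclass
class MemeReplyItem:
    post_text: str               # the social media post being replied to
    candidate_panels: list[str]  # IDs or paths of candidate manga panels
    human_choice: int            # index of the panel annotators judged funniest

def selection_accuracy(items, pick_reply):
    """Fraction of items where the model's pick matches the human choice.
    `pick_reply(post, candidates) -> int` stands in for any model."""
    correct = sum(
        pick_reply(item.post_text, item.candidate_panels) == item.human_choice
        for item in items
    )
    return correct / len(items)

# Demo: a trivial baseline that always picks the first candidate.
items = [MemeReplyItem("Monday again...", ["panel_a.png", "panel_b.png"], 1)]
print(selection_accuracy(items, lambda post, cands: 0))  # -> 0.0
```

A model "passes" an item when its pick matches the panel human annotators preferred, so aggregate selection accuracy is one natural headline metric for a benchmark of this shape.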

Why This Matters to You

Understanding humor is a complex human trait. If AI can’t grasp it, what does that mean for your daily interactions with intelligent systems? This research highlights a critical gap in current AI capabilities. Imagine you’re chatting with an AI and it tries to lighten the mood with a meme. If it consistently chooses irrelevant or unfunny images, your experience will suffer. The study finds that while LLMs can match human judgments in controlled settings, they often fail to distinguish subtle differences in wit among semantically similar candidates.

Consider these implications for future AI applications:

Area of Impact   | Current AI Capability          | Future Need
Social Media     | Basic semantic matching        | Contextual humor, irony, sarcasm
Content Creation | Generating text, simple images | Humorous visual content, meme generation
Customer Service | Answering questions            | Empathetic, witty, and culturally aware responses

For example, think of a social media manager using an AI tool to generate engaging content. If the AI suggests a meme that misinterprets the audience’s sentiment, it could lead to awkward or even damaging interactions. “Selecting contextually humorous replies remains an open challenge for current models,” the team revealed. How might your interactions with AI change if it could genuinely understand and create humor?

The Surprising Finding

Here’s the twist: the research uncovered something unexpected about how AI processes visual information for humor. Including visual information did not improve performance, the study finds. This is surprising because you might assume that seeing the image would help an AI grasp its humorous context. Instead, the paper points to a significant gap between understanding visual content itself and effectively using that content for contextual humor. The result challenges the common assumption that more data, especially visual data, automatically leads to better AI performance in nuanced tasks like humor selection.
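
One way to probe that gap is an ablation that runs the same items with and without the panel images. The sketch below is a hypothetical illustration of such a setup; `query_model` is a stand-in for any LLM interface, not a real API, and the paper's actual protocol may differ.

```python
# Hypothetical ablation: the same selection task with and without images.
# `query_model` is a stand-in for any LLM call, not a real API.
def pick_text_only(query_model, post, panel_descriptions):
    # The model sees only textual descriptions of each candidate panel.
    prompt = (
        f"Post: {post}\nCandidates:\n"
        + "\n".join(f"{i}: {d}" for i, d in enumerate(panel_descriptions))
        + "\nAnswer with the index of the funniest reply panel."
    )
    return int(query_model(prompt))

def pick_with_images(query_model, post, panel_images):
    # The model additionally receives the candidate panel images.
    prompt = f"Post: {post}\nAnswer with the index of the funniest reply panel."
    return int(query_model(prompt, images=panel_images))

# Demo with a dummy model that always answers "0"; a real study would
# compare selection accuracy between the two conditions.
dummy = lambda prompt, images=None: "0"
print(pick_text_only(dummy, "Monday again...", ["shocked face", "sleeping cat"]))
print(pick_with_images(dummy, "Monday again...", ["panel_a.png", "panel_b.png"]))
```

If accuracy barely moves between the two conditions, the model is not converting what it "sees" into a funnier choice, which mirrors the gap the study reports.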

What Happens Next

This research paves the way for future developments in AI’s emotional intelligence. We can expect more focused efforts on multimodal AI (systems that combine different types of data, like text and images) over the next 12 to 18 months. For example, developers might create new training methods specifically designed to teach AI the nuances of visual storytelling and comedic timing. Actionable advice for developers includes building benchmarks that specifically target the integration of visual and textual cues for humor. The industry implications are vast, suggesting a need for AI models that move beyond surface-level semantic matching to capture complex social cues, such as exaggeration, as mentioned in the release. That shift would ultimately make AI interactions more engaging and human-like for you.
