GPT-4o's Hidden Rhythms: Daily and Weekly Performance Swings

New research reveals surprising periodic variability in the accuracy of OpenAI's flagship AI model.

A recent study found that GPT-4o's performance fluctuates significantly on daily and weekly cycles, even under controlled conditions. This discovery challenges assumptions about AI consistency and has major implications for anyone using large language models for critical tasks.

By Katie Rowan

March 4, 2026

4 min read


Key Facts

  • GPT-4o's performance shows notable daily and weekly periodic variability.
  • This variability accounts for approximately 20% of its total performance variance.
  • The study used a fixed model snapshot, hyperparameters, and identical prompting for three months.
  • GPT-4o was queried every three hours to solve multiple-choice physics tasks.
  • The findings challenge the assumption of time-invariant LLM performance.

Why You Care

Ever wonder if your AI assistant is having a ‘good day’ or a ‘bad day’? What if the performance of large language models (LLMs) isn’t as consistent as we think? New research indicates that even models like GPT-4o show noticeable daily and weekly performance fluctuations. This discovery could impact anyone relying on AI for consistent results, from developers to content creators. Your AI might be smarter on Tuesdays than on Fridays. Are you getting the best from your AI tools at all times?

What Actually Happened

Researchers Paul Tschisgale and Peter Wulff conducted a longitudinal study of GPT-4o's performance consistency. They investigated whether LLM performance remains stable over time under fixed conditions: an identical model snapshot, hyperparameters, and prompting strategy. The team queried GPT-4o via its API to solve the same multiple-choice physics task every three hours, and this testing spanned approximately three months. At each time point, ten independent responses were generated, and their scores were averaged to track performance trends. The goal was to empirically examine the assumption of time-invariant (unchanging over time) LLM performance.
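To make the protocol concrete, here is a minimal sketch of what such a sampling loop could look like in Python with OpenAI's official client. The snapshot name, the placeholder physics question, and the scoring helper are assumptions for illustration; the paper's actual task and prompt are not reproduced here.

```python
import time
from datetime import datetime, timezone

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical stand-ins: the study's actual snapshot and physics item differ.
MODEL_SNAPSHOT = "gpt-4o-2024-08-06"
QUESTION = (
    "A ball is dropped from rest and falls for 2 s. Ignoring air resistance, "
    "how far does it fall? (A) 9.8 m (B) 19.6 m (C) 39.2 m (D) 4.9 m\n"
    "Answer with a single letter."
)
CORRECT = "B"
SAMPLES_PER_POINT = 10          # ten independent responses per time point
INTERVAL_SECONDS = 3 * 60 * 60  # query every three hours

def sample_once() -> float:
    """Ask the model once and return 1.0 for a correct letter, else 0.0."""
    resp = client.chat.completions.create(
        model=MODEL_SNAPSHOT,
        messages=[{"role": "user", "content": QUESTION}],
        temperature=1.0,  # fixed hyperparameters, nonzero so responses can vary
    )
    answer = (resp.choices[0].message.content or "").strip().upper()
    return 1.0 if answer.startswith(CORRECT) else 0.0

while True:
    scores = [sample_once() for _ in range(SAMPLES_PER_POINT)]
    mean_score = sum(scores) / len(scores)
    print(f"{datetime.now(timezone.utc).isoformat()}\t{mean_score:.2f}")
    time.sleep(INTERVAL_SECONDS)  # a real deployment would use a proper scheduler
```

Run for three months, a loop like this yields roughly 720 time points, each an average over ten responses, which is the kind of series the researchers then analyzed for periodicity.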

Why This Matters to You

This finding has significant practical implications for anyone interacting with LLMs. If you're using GPT-4o for tasks like content generation, coding assistance, or data analysis, its output quality might vary depending on when you query it. Imagine you're developing a critical application: you might get different results at 9 AM versus 9 PM. This variability could affect the reliability and validity of your work.

Consider these potential impacts on your daily use:

  • Content Creation: Your AI might generate more coherent articles during peak performance times.
  • Code Generation: AI-assisted coding could produce more accurate or efficient code at specific hours.
  • Research: The reproducibility of AI-driven research findings might be compromised by these fluctuations.

As the paper states, “Much of this work implicitly assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt) is time-invariant.” This assumption, the research shows, is now called into question. Do you know the optimal times to interact with your AI tools for the best results?

The Surprising Finding

The most surprising revelation from this study is the existence of clear periodic variability in GPT-4o's average performance. Spectral (Fourier) analysis of the collected data revealed notable rhythmic patterns, which accounted for approximately 20% of the total variance. Specifically, the observed periodicity is well explained by the interaction of a daily and a weekly rhythm. In other words, GPT-4o's accuracy isn't a flat line; it ebbs and flows like ocean tides, on a predictable schedule. This challenges the common assumption that once an AI model is deployed, its measured capabilities remain static, and it suggests that external or internal factors, perhaps server load or ongoing background processes, could be influencing output quality.
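As a hedged illustration of this kind of spectral analysis, the snippet below shows one plausible way to estimate how much variance daily and weekly components explain in a series sampled every three hours. The 24-hour and 168-hour periods follow from the paper's description, but the estimator itself is an assumption for illustration, not the authors' code.

```python
import numpy as np

def periodic_variance_fraction(scores, dt_hours=3.0, periods_hours=(24.0, 168.0)):
    """Fraction of variance in `scores` carried by the given periods.

    Assumes evenly spaced samples (one every dt_hours). For a real-valued
    series, the FFT bin at index k contributes 2*|X_k|^2 / n^2 to the
    variance (the factor of 2 accounts for its conjugate mirror bin).
    """
    x = np.asarray(scores, dtype=float)
    x = x - x.mean()                        # drop the DC (mean) component
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=dt_hours)  # cycles per hour
    total_var = np.var(x)
    frac = 0.0
    for period in periods_hours:
        k = int(np.argmin(np.abs(freqs - 1.0 / period)))  # nearest frequency bin
        frac += 2.0 * np.abs(spectrum[k]) ** 2 / (n**2 * total_var)
    return frac

# Quick self-check on synthetic data: daily + weekly rhythms plus noise.
t = np.arange(0, 90 * 24, 3.0)  # ~3 months of samples, one every 3 hours
rng = np.random.default_rng(0)
y = (0.05 * np.sin(2 * np.pi * t / 24)
     + 0.05 * np.sin(2 * np.pi * t / 168)
     + 0.10 * rng.standard_normal(t.size))
print(f"explained fraction: {periodic_variance_fraction(y):.2f}")
```

On the synthetic series, the two sinusoids carry about a fifth of the variance by construction, which is the same order as the roughly 20% the study reports for GPT-4o.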

Key Statistical Finding: Periodic variability accounted for approximately 20% of GPT-4o’s total performance variance.

What Happens Next

Understanding these performance rhythms is crucial for future AI development and deployment. Developers might need to factor in these periodicities when designing systems that rely on consistent LLM output. For example, critical AI tasks could be scheduled during known peak-performance windows. In the next 6-12 months, we might see AI platforms offering insights into optimal usage times; imagine an API that tells you the best time to run an intensive query for maximum accuracy. For users, the actionable advice is to be aware of this variability: if you notice inconsistent results, consider re-running your queries at different times. The authors note that these findings have significant implications for the validity and replicability of research that uses or investigates LLMs. The study opens the door to further investigation into the underlying causes of these performance fluctuations, and it highlights the need for more rigorous testing methodologies in the AI community.
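If such peak windows were ever published, scheduling around them could be as simple as the hypothetical helper below. The `PEAK_HOURS_UTC` set is entirely invented for illustration; the study identifies rhythms but does not prescribe specific hours at which to query.

```python
import time
from datetime import datetime, timezone

# Hypothetical peak-performance hours (UTC); the study does not publish these.
PEAK_HOURS_UTC = {2, 3, 4, 14, 15, 16}

def wait_for_peak_window(poll_seconds: int = 300) -> None:
    """Block until the current UTC hour falls inside a (hypothetical) peak window."""
    while datetime.now(timezone.utc).hour not in PEAK_HOURS_UTC:
        time.sleep(poll_seconds)

def run_when_favorable(job):
    """Defer a callable until a high-accuracy window, then run it."""
    wait_for_peak_window()
    return job()

# Usage: run_when_favorable(lambda: client.chat.completions.create(...))
```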
