Local AI's Power Play: 'Intelligence per Watt' Redefines Efficiency

New research introduces a critical metric for evaluating AI performance on personal devices.

A recent study proposes 'intelligence per watt' (IPW) as a new metric for local AI efficiency. This could shift how we use large language models, moving processing from the cloud to your personal devices. The research evaluates small LMs on various accelerators using real-world queries.

By Mark Ellison

November 15, 2025

4 min read

Key Facts

  • A new metric, 'intelligence per watt' (IPW), is proposed for measuring local AI efficiency.
  • The study evaluates over 20 state-of-the-art local large language models (LLMs) on 8 accelerators.
  • IPW measures task accuracy divided by the power consumed.
  • Smaller LLMs (20 billion active parameters or fewer) now offer performance competitive with larger cloud models.
  • The research suggests local inference can redistribute demand from centralized cloud infrastructure.

Why You Care

Ever feel your AI assistant is a bit slow, or wonder about the energy it consumes? What if your personal devices could run AI tasks efficiently, right where you are? A new study introduces an essential metric, ‘intelligence per watt,’ to measure local AI efficiency. This could change how you interact with AI daily. It might also reduce reliance on massive, energy-hungry cloud data centers.

What Actually Happened

Researchers have unveiled a new metric called ‘intelligence per watt’ (IPW), according to the announcement. This metric assesses the capability and efficiency of local AI inference. Local inference means running AI models directly on your device, like a laptop, instead of sending data to the cloud. The team revealed this new approach in a paper titled “Intelligence per Watt: Measuring Intelligence Efficiency of Local AI.” They conducted a large-scale empirical study. This study involved over 20 local large language models (LLMs). It also included 8 different accelerators, such as the Apple M4 Max. The goal was to understand if local LLMs could handle real-world queries effectively. What’s more, they wanted to know if these models could do so efficiently on power-constrained devices.

Why This Matters to You

Today, most large language model (LLM) queries go through centralized cloud infrastructure. This puts a huge strain on cloud providers, as detailed in the blog post. However, smaller LLMs (those with 20 billion active parameters or fewer) are now performing competitively. These models can run on local accelerators with interactive latencies, which means less waiting for responses from the cloud. The new IPW metric helps us understand this potential shift. It measures task accuracy divided by the power consumed. This is crucial for devices like laptops where battery life matters. Imagine your laptop running complex AI tasks without quickly draining its battery. How might this change your daily workflow or creative process?
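
To make the metric concrete, here is a minimal sketch of the IPW calculation as the article describes it. The accuracy and power figures are illustrative assumptions, not numbers from the study.

```python
# Minimal sketch of 'intelligence per watt' (IPW) as described above.
# All accuracy and power values below are illustrative, not from the study.

def intelligence_per_watt(task_accuracy: float, avg_power_watts: float) -> float:
    """IPW = task accuracy divided by the power consumed during inference."""
    return task_accuracy / avg_power_watts

# A hypothetical local model: 62% accuracy while drawing 28 W on a laptop.
local_ipw = intelligence_per_watt(0.62, 28.0)

# A hypothetical cloud model: 71% accuracy at a 350 W per-query power share.
cloud_ipw = intelligence_per_watt(0.71, 350.0)

print(f"local IPW: {local_ipw:.4f} accuracy/W")  # ~0.0221
print(f"cloud IPW: {cloud_ipw:.4f} accuracy/W")  # ~0.0020
```

By this measure, the hypothetical local model delivers roughly ten times the intelligence per watt despite its lower raw accuracy, which is exactly the trade-off the metric is designed to surface.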

Consider these benefits of efficient local AI:

  • Enhanced Privacy: Your data stays on your device.
  • Reduced Latency: Faster responses without internet reliance.
  • Lower Energy Consumption: Less power used by massive data centers.
  • Offline Capability: AI functions even without an internet connection.

As the paper states, “small LMs (<=20B active parameters) now achieve competitive performance to frontier models on many tasks.” This means AI could soon be a standard feature on your personal devices. For example, think about editing a video with AI suggestions instantly, without uploading footage. Or drafting complex emails with an AI assistant that understands your personal context, all offline.
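
For readers curious what ‘local inference’ looks like in practice, here is a hedged sketch using the Hugging Face transformers library. The specific model name is an illustrative choice of a small open-weights LM, not one singled out by the paper.

```python
# Sketch: running a small open-weights LM entirely on-device.
# The model name below is an illustrative assumption, not from the study.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-3B-Instruct",  # any small local model works here
    device_map="auto",                 # use the local GPU/NPU if available
)

# The prompt never leaves the machine: no upload, no cloud round-trip.
result = generator(
    "Draft a short, polite email declining a meeting invite.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```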

The Surprising Finding

Here’s the twist: the research suggests that local AI could significantly redistribute demand from centralized cloud infrastructure. This challenges the common assumption that AI always needs massive cloud computing. The study found that local LMs can accurately answer real-world queries. More importantly, they can do so efficiently enough for practical use on devices like laptops. The team measured accuracy, energy, latency, and power for each query. This comprehensive approach revealed that local inference is not just a theoretical possibility. It’s a viable alternative for many tasks. This means the future of AI might be more decentralized than previously thought. Your devices could become AI hubs.
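
Since the team measured accuracy, energy, latency, and power for each query, a per-query profiling loop in that spirit might look like the sketch below. Here `run_model` and `read_power_watts` are hypothetical stand-ins for a real inference call and a platform power sensor, and the power averaging is deliberately crude.

```python
import time

def profile_query(run_model, read_power_watts, prompt: str) -> dict:
    """Measure latency, average power, and energy for one local query.

    `run_model` and `read_power_watts` are hypothetical callables standing
    in for a real inference function and a hardware power sensor.
    """
    power_before = read_power_watts()
    start = time.perf_counter()
    answer = run_model(prompt)                  # local inference call
    latency_s = time.perf_counter() - start
    avg_power_w = (power_before + read_power_watts()) / 2  # crude average
    energy_j = avg_power_w * latency_s          # energy = power x time
    return {
        "answer": answer,
        "latency_s": latency_s,
        "avg_power_w": avg_power_w,
        "energy_j": energy_j,
    }
```

Aggregating such per-query records across models and accelerators is what lets a study compare IPW across hardware.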

What Happens Next

This research points towards a future where AI processing happens closer to you. We might see devices marketed on ‘intelligence per watt’ within the next 12-18 months. Manufacturers could start highlighting IPW scores in their product specifications. For example, new laptops might advertise their ability to run AI models efficiently. This could influence your next device purchase. The industry implications are significant. Cloud providers may need to adapt their strategies. Device manufacturers will likely focus more on integrating capable, efficient local AI. The actionable advice for you is to pay attention to these emerging metrics. Look for devices that promise high IPW. This will help ensure your personal AI experiences are both capable and practical. This shift could make AI truly personal and ubiquitous, according to the announcement.
