Gemma 3: Google's New AI Model Runs on Your Devices

The latest open models from Google DeepMind promise powerful AI on single GPUs, expanding accessibility.

Google DeepMind has launched Gemma 3, a new family of open AI models designed for on-device deployment. These models offer state-of-the-art performance, support over 140 languages, and include advanced text and visual reasoning, making powerful AI more accessible.

By Mark Ellison

December 4, 2025

4 min read

Key Facts

  • Gemma 3 is a collection of lightweight, open AI models from Google DeepMind.
  • It is designed to run on single GPUs or TPUs on devices like phones, laptops, and workstations.
  • Gemma 3 supports over 140 languages and has advanced text and visual reasoning capabilities.
  • The model includes a 128k-token context window and supports function calling.
  • Gemma 3 27B achieved an Elo score of 1338 on the Chatbot Arena leaderboard, outperforming some larger models.

Why You Care

Ever wish you could run AI directly on your laptop or phone, without needing a massive data center? What if AI could live right on your device, making your apps smarter and faster? Google DeepMind just made a significant leap towards that future. They introduced Gemma 3, a new collection of open models designed for on-device performance. This means more personalized and private AI experiences are coming your way, directly from your everyday gadgets.

What Actually Happened

Google DeepMind has unveiled Gemma 3, a new family of lightweight, open AI models, as mentioned in the release. These models are built from the same research and technology that powers their Gemini 2.0 models. The company reports that Gemma 3 is their most capable, portable, and responsibly developed open model series to date. It is specifically designed to run quickly and efficiently on individual devices, from smartphones and laptops to workstations, according to the announcement. Gemma 3 comes in four sizes, 1B, 4B, 12B, and 27B parameters, allowing developers to choose the best fit for their hardware and performance needs.

Why This Matters to You

Imagine an AI assistant that understands your local dialect or can analyze images instantly on your phone, even without an internet connection. Gemma 3 makes this closer to reality for you. The models offer text and visual reasoning capabilities, allowing applications to analyze images, text, and even short videos, as detailed in the blog post. This opens up new possibilities for interactive and intelligent applications. What’s more, Gemma 3 supports function calling, which helps automate tasks and build agentic (autonomous) experiences.
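To make the function-calling idea concrete, here is a minimal sketch of the application side: the model emits a structured request, and your code parses it and runs the matching tool. The JSON call format and the `get_weather` tool are illustrative assumptions, not Gemma 3's actual output schema.

```python
import json

# Hypothetical tool -- the name and call format below are illustrative,
# not Gemma 3's actual function-calling schema.
def get_weather(city: str) -> str:
    """Toy tool: returns a canned forecast for the given city."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and run the tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]          # look up the requested tool
    return fn(**call["arguments"])    # invoke it with the model's arguments

# Simulated model output requesting a tool call:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Lagos"}}')
print(result)  # Sunny in Lagos
```

In a real agentic loop, the tool's return value would be fed back to the model so it can compose a final answer for the user.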

For example, think of a smart home device that can process complex voice commands in your native language, regardless of how obscure it might be. Or consider a photo editing app that uses on-device AI to suggest improvements based on the image’s content, all without uploading your personal photos to the cloud. What kind of personalized AI experiences do you envision for your daily life?

“Gemma 3 delivers state-of-the-art performance for its size, outperforming Llama3-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena’s leaderboard,” the team revealed. This means you can build engaging user experiences that fit on a single GPU or TPU host.

Here are some key capabilities:

  • Global Language Support: Gemma 3 offers out-of-the-box support for over 35 languages. It also includes pretrained support for over 140 languages, according to the announcement.
  • Expanded Context Window: The model boasts a 128k-token context window. This allows your applications to process and understand vast amounts of information, the documentation indicates.
  • Quantized Versions: Official quantized versions are available. These reduce model size and computational requirements while maintaining high accuracy, the company reports.
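Quantization is what makes the largest variant plausible on a single accelerator. A rough back-of-the-envelope estimate, assuming standard storage sizes of 2 bytes per weight for fp16 and 0.5 bytes for int4 and ignoring activation and runtime overheads:

```python
# Rough memory-footprint estimate for Gemma 3 27B weights under different
# numeric precisions. 27B parameters is the advertised size; bytes-per-weight
# values are standard for fp16 and int4 storage (overheads ignored).
PARAMS = 27e9

def weight_gb(bytes_per_param: float) -> float:
    """Approximate weight storage in GB for the given precision."""
    return PARAMS * bytes_per_param / 1e9

fp16 = weight_gb(2.0)   # 16-bit floats: 2 bytes each -> 54 GB
int4 = weight_gb(0.5)   # 4-bit quantized: half a byte each -> 13.5 GB
print(f"fp16: {fp16:.1f} GB, int4: {int4:.1f} GB")
```

The int4 figure is what brings the 27B model within reach of a single high-end consumer GPU, which is the deployment target the announcement emphasizes.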

The Surprising Finding

Here’s a twist: despite its smaller size, Gemma 3 27B demonstrates impressive performance. The research shows that Gemma 3 27B achieved an Elo score of 1338 on the Chatbot Arena leaderboard. This positions it competitively against much larger models. For instance, Llama3-405B, a significantly larger model, scored 1333. This is surprising because typically, larger models require more computational power and tend to perform better. The fact that Gemma 3, designed for single-accelerator use, can rival such models challenges the assumption that bigger is always better for AI performance. It highlights Google DeepMind’s efficiency in model design.
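For a sense of scale, the standard Elo formula translates the quoted 5-point gap (1338 vs. 1333) into an expected win probability. This is the textbook Elo expectation, not anything specific to LMArena's methodology:

```python
# Expected win probability under the standard Elo model, using the
# scores quoted above (Gemma 3 27B: 1338, Llama3-405B: 1333).
def elo_expected(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

p = elo_expected(1338, 1333)
print(f"{p:.3f}")  # about 0.507 -- a 5-point gap is nearly a coin flip
```

In other words, the headline result is not that Gemma 3 27B decisively beats the larger models, but that a single-accelerator model is statistically indistinguishable from them in these preliminary preference evaluations.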

What Happens Next

Developers can start experimenting with Gemma 3 immediately, as the models are openly available. We can expect to see new applications leveraging these on-device capabilities emerging over the next 6-12 months. For example, imagine a new generation of smart glasses that perform real-time translation or object recognition using Gemma 3’s visual reasoning. This would happen directly on the device, without cloud latency.

If you’re a developer, consider integrating Gemma 3 into your projects. It offers an efficient way to bring AI directly to users’ devices. The industry implications are significant, potentially democratizing access to AI and fostering innovation in edge computing and privacy-preserving AI applications. The team revealed that Gemma 3 is foundational to their commitment to making useful AI systems accessible.
