Gemini 2.5 Computer Use Model: AI Agents for UI Interaction

Google DeepMind unveils a specialized model enabling AI to navigate web and mobile interfaces like humans.

Google DeepMind has launched the Gemini 2.5 Computer Use model, allowing developers to create AI agents that can interact directly with user interfaces. This new model, available via the Gemini API, is designed to automate tasks like form filling and navigation on websites and mobile apps. It promises lower latency and improved performance on control benchmarks.

By Mark Ellison

December 5, 2025

4 min read


Key Facts

  • Google DeepMind released the Gemini 2.5 Computer Use model via the Gemini API.
  • The model powers AI agents that interact directly with user interfaces (UIs).
  • It outperforms other models in web and mobile control benchmarks with lower latency.
  • Core capabilities are exposed through the new `computer_use` tool in the Gemini API.
  • The model is primarily optimized for web browsers but shows promise for mobile UI control.

Why You Care

Ever wish your computer could just do that tedious online task for you, without you lifting a finger? What if AI could navigate websites and apps as seamlessly as you do?

Google DeepMind has just announced the Gemini 2.5 Computer Use model. This specialized AI is built to power agents that interact directly with user interfaces. This means your digital assistants could soon handle complex online processes automatically. It’s a significant step towards more autonomous and helpful AI.

What Actually Happened

Google is releasing the Gemini 2.5 Computer Use model, according to the announcement. This model is available through the Gemini API. It allows developers to build AI agents that can interact with user interfaces (UIs) — essentially, the screens you see and click on. This new capability is built on the foundation of Gemini 2.5 Pro.

Unlike traditional AI that works with structured data, this model handles graphical UIs. It can perform actions like clicking, typing, and scrolling, just like a human. This is crucial for tasks such as filling out forms or manipulating interactive elements online. The model is currently available in preview on Google AI Studio and Vertex AI, as mentioned in the release.
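To make that concrete, here is a minimal sketch of what executing those actions against a real browser could look like, with Playwright standing in as the automation layer. The action dictionary shape used here is an illustrative assumption, not the official Gemini action schema.

```python
# Hypothetical executor that maps model-issued UI actions (click, type,
# scroll) onto Playwright browser calls. The action names and fields are
# illustrative assumptions, not the official Gemini action schema.
from playwright.sync_api import sync_playwright

def execute_action(page, action: dict) -> None:
    """Apply one model-issued UI action to a live browser page."""
    kind = action["type"]
    if kind == "click":
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "scroll":
        page.mouse.wheel(0, action["delta_y"])
    else:
        raise ValueError(f"unsupported action: {kind}")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    execute_action(page, {"type": "click", "x": 120, "y": 340})
    browser.close()
```

In practice, an agent harness would run an executor like this inside the loop described below, translating each action the model emits into a browser call.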

Why This Matters to You

Imagine an AI assistant that can truly understand and interact with your digital world. This new model brings that vision closer to reality. It’s designed to automate tasks that previously required human intervention. For example, think about booking a complex multi-leg flight or organizing tasks on a chaotic digital whiteboard.

This model’s core capabilities are exposed through a new `computer_use` tool in the Gemini API. It operates in a loop, taking a screenshot of the environment and the user’s request as input; the model then generates an action, such as a click or a type command. “The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents,” the company reports.
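In code, that loop might look something like the sketch below, using the google-genai Python SDK. The model name, the ComputerUse tool configuration, and the take_screenshot/execute_action stubs are assumptions for illustration (an executor like the Playwright sketch above could back them); the preview documentation defines the exact schema.

```python
# A rough sketch of the screenshot -> action loop described above,
# assuming the google-genai Python SDK. Model name, tool config, and
# helper stubs are illustrative assumptions, not verbatim official API.
from google import genai
from google.genai import types

def take_screenshot() -> bytes:
    """Stub: capture the current browser viewport as PNG bytes."""
    raise NotImplementedError

def execute_action(call: types.FunctionCall) -> None:
    """Stub: apply the model's action (click, type, ...) to the browser."""
    raise NotImplementedError

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER))],
)

contents = [types.Content(role="user", parts=[
    types.Part(text="Fill out the signup form on this page."),
    types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
])]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # assumed name
        contents=contents,
        config=config,
    )
    parts = response.candidates[0].content.parts
    calls = [p.function_call for p in parts if p.function_call]
    if not calls:
        break  # no action requested: the model considers the task done
    execute_action(calls[0])
    contents.append(response.candidates[0].content)
    # Report the outcome back with a fresh screenshot, then loop again.
    contents.append(types.Content(role="user", parts=[
        types.Part.from_function_response(
            name=calls[0].name, response={"url": "https://example.com"}),
        types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
    ]))
```

The key idea is that the result of every action, including a fresh screenshot, flows back to the model so it can verify the effect before choosing its next step.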

How much time could you save if an AI handled your repetitive online chores?

Here’s a look at some potential applications:

  • Automated Data Entry: Filling out online forms for applications or surveys.
  • Complex Online Purchases: Navigating e-commerce sites to find and buy specific items.
  • Digital Workflow Automation: Managing tasks across various web-based tools.
  • Customer Support Bots: Guiding users through website processes interactively.

The Surprising Finding

Here’s the interesting part: while optimized primarily for web browsers, the Gemini 2.5 Computer Use model also shows strong promise for mobile UI control tasks. This is a significant development, since mobile interfaces often present unique challenges for AI interaction due to varying screen sizes and gestures.

The model demonstrates impressive performance on multiple web and mobile control benchmarks, the research shows. For instance, it leads on the Online-Mind2Web, WebVoyager, and AndroidWorld benchmarks. This indicates a versatile capability across different digital environments and challenges the assumption that separate, highly specialized models would be needed for each platform. The team reports that it outperforms other models in these areas while also achieving lower latency.

What Happens Next

Developers can start experimenting with the Gemini 2.5 Computer Use model now. It’s available in preview via the API. We can expect to see more AI agents emerging over the next few months. These agents will be capable of handling a wider range of online tasks automatically.

For example, imagine an AI that can manage your entire online shopping experience, from finding the best deals to completing the checkout process. This system could also lead to more accessible web experiences for users with disabilities. It could help them navigate complex sites more easily.

Industry implications are vast, impacting areas from customer service to business process automation. Developers should explore the `computer_use` tool in the Gemini API, and sharing feedback in the Developer Forum will help refine this new capability. While the model is not yet optimized for desktop OS-level control, its web and mobile prowess is a clear indicator of future directions.
