Gemini 2.5 Computer Use: AI Agents Navigate Your Digital World

Google DeepMind unveils a specialized AI model for seamless interaction with graphical user interfaces.

Google DeepMind has launched the Gemini 2.5 Computer Use model, allowing AI agents to interact with web and mobile interfaces like humans. This model, available via the Gemini API, excels in UI control benchmarks and promises to automate complex digital tasks.

By Mark Ellison

October 8, 2025

4 min read

Key Facts

  • Google DeepMind released the Gemini 2.5 Computer Use model via the Gemini API.
  • The model allows AI agents to interact directly with graphical user interfaces (GUIs).
  • It outperforms other models in web and mobile UI control benchmarks with lower latency.
  • The model operates in a loop, taking screenshots and action history as input.
  • It can natively fill out forms, manipulate interactive elements, and operate behind logins.

Why You Care

Ever wish an AI could just do your online tasks for you, navigating websites and apps with ease? What if your digital assistant could fill out forms, click buttons, and even organize your sticky notes? Google DeepMind is making this a reality with its new Gemini 2.5 Computer Use model, now available for developers. This release means smarter, more capable AI agents are on the horizon, ready to tackle your everyday digital chores. You’ll soon experience a new level of AI assistance.

What Actually Happened

Google DeepMind has introduced the Gemini 2.5 Computer Use model, according to the announcement. This specialized AI model, built on Gemini 2.5 Pro, is designed to power agents that interact directly with user interfaces (UIs). Instead of relying solely on structured APIs (Application Programming Interfaces), this model can navigate web pages and applications much like a human would: clicking, typing, and scrolling, as detailed in the blog post. This capability is crucial for tasks like filling out forms or manipulating interactive elements such as dropdowns and filters. Developers can now access the model through the Gemini API on Google AI Studio and Vertex AI, the company reports.

Why This Matters to You

This new model significantly expands what AI agents can do for you. Imagine an AI agent that can handle complex online purchases or manage your project boards without needing explicit programming for every step. The model’s core capabilities are exposed through a new computer_use tool in the Gemini API, as the technical report explains. This tool operates within a loop, taking user requests, screenshots of the environment, and a history of recent actions as inputs. The model then analyzes these inputs and generates a response, typically a function call representing a UI action. This could be anything from clicking a button to typing text. For example, if you ask an AI to “book me a pet spa appointment,” it could navigate the website, select services, and fill in your details. How much time could you save if an AI handled these routine tasks for you?
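To make those inputs and outputs concrete, here is a minimal sketch of a single turn with the computer_use tool via the Python google-genai SDK. Treat the specifics as assumptions: the preview model ID, the ComputerUse tool configuration, and the Environment enum shown here are based on the preview documentation as described and may differ in your SDK version.

```python
# Hedged sketch: one turn with the computer_use tool via the google-genai SDK.
# The model ID, ComputerUse config, and Environment enum are assumptions and may
# not match the exact names in your SDK version.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER  # web browser control
        )
    )]
)

with open("screenshot.png", "rb") as f:  # current rendering of the page
    screenshot = f.read()

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # assumed preview model ID
    contents=[types.Content(role="user", parts=[
        types.Part(text="Book me a pet spa appointment for Saturday."),
        types.Part.from_bytes(data=screenshot, mime_type="image/png"),
    ])],
    config=config,
)

# The reply is typically a function call naming the next UI action (a click,
# typed text, a scroll) rather than a plain text answer.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, part.function_call.args)
```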

This iterative process continues until your task is complete: the client-side code executes the proposed action, then sends a new screenshot back to the model, restarting the loop. The team revealed that this model outperforms other solutions on web and mobile control benchmarks, and at lower latency, so it is not just capable but also efficient. The model is optimized primarily for web browsers but also shows strong promise for mobile UI control tasks, according to the announcement.
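That loop can be sketched in a few lines of client code. The three helpers below are hypothetical placeholders for your browser automation (Playwright, for example) and your Gemini API call; only the loop structure mirrors the screenshot-to-action flow described in the announcement.

```python
# Illustrative client-side agent loop. take_screenshot(), request_next_action(),
# and execute_ui_action() are hypothetical placeholders, not an official API;
# only the loop shape reflects the flow described in the announcement.
from typing import Optional

def take_screenshot() -> bytes:
    """Placeholder: capture the current browser viewport as a PNG."""
    raise NotImplementedError

def request_next_action(user_request: str, screenshot: bytes,
                        action_history: list) -> Optional[dict]:
    """Placeholder: send the request, screenshot, and recent actions to the model;
    return the UI action it proposes, or None when the task is complete."""
    raise NotImplementedError

def execute_ui_action(action: dict) -> None:
    """Placeholder: perform the proposed click, typing, or scroll in the browser."""
    raise NotImplementedError

def run_agent(user_request: str, max_steps: int = 25) -> None:
    action_history: list = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # capture current UI state
        action = request_next_action(user_request, screenshot, action_history)
        if action is None:                             # model signals the task is done
            break
        execute_ui_action(action)                      # client executes the action
        action_history.append(action)                  # history goes back on the next turn
```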

Here’s a quick look at its potential impact:

  • Automated Online Tasks: Filling forms, making reservations, managing subscriptions.
  • Enhanced Digital Assistants: More capable AI that understands visual cues on your screen.
  • Improved Accessibility: Potentially helping users with disabilities navigate complex interfaces.
  • Developer Empowerment: New tools for creating AI agents.

The Surprising Finding

Perhaps the most surprising aspect is the model’s ability to operate effectively behind logins and natively manipulate interactive elements. While AI models often struggle with dynamic or login-protected environments, the Gemini 2.5 Computer Use model excels here. The research shows that it can “natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins.” This challenges the common assumption that AI agents are limited to public, easily accessible interfaces. It means your AI agent could, for instance, log into your banking portal to check statements (with your explicit permission, of course) or manage your online shopping cart. This capability marks a significant step forward for general-purpose agents. The model also requires end-user confirmation for sensitive actions like purchases, preserving safety and user control, as mentioned in the release.
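That confirmation requirement could live in the same client loop. The sketch below assumes a hypothetical requires_confirmation flag on the proposed action and simply asks the end user before executing; this is one plausible way to honor the requirement, not the official mechanism.

```python
# Hypothetical confirmation gate for sensitive actions (e.g. completing a purchase).
# The requires_confirmation flag and the action dict shape are illustrative
# assumptions, not the official API surface.

def execute_with_confirmation(action: dict) -> bool:
    """Ask the end user before running actions flagged as sensitive."""
    if action.get("requires_confirmation"):
        answer = input(f"Allow the agent to perform '{action['name']}'? [y/N] ")
        if answer.strip().lower() != "y":
            return False          # skip the action; the refusal can be reported back to the model
    execute_ui_action(action)     # placeholder from the loop sketch above
    return True
```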

What Happens Next

Developers can begin experimenting with the Gemini 2.5 Computer Use model now, and we anticipate initial agent prototypes built on this system emerging over the next few months. For example, imagine an AI agent that can automatically set up a new software account for you, navigating complex registration forms across multiple web pages. This would save you considerable time and effort. The industry implications are vast, suggesting a future where AI agents become truly autonomous digital assistants. For readers, a key takeaway is to start thinking about which repetitive digital tasks you’d be comfortable delegating to an AI. As the documentation indicates, the model is not yet optimized for desktop OS-level control, but its web and mobile prowess is a strong starting point. This release paves the way for more intelligent automation across your digital life. The company reports that you can share feedback in the Developer Forum, actively shaping the model’s future.
