GUI-Owl-1.5: AI Agents Master Multi-Platform Interfaces

New AI model GUI-Owl-1.5 sets performance records across desktop, mobile, and web GUI tasks.

A new AI model, GUI-Owl-1.5, has achieved state-of-the-art results in automating graphical user interfaces (GUIs) across various platforms. This development promises more intuitive and efficient interactions with your devices, from phones to desktops, and even web browsers. It could fundamentally change how we interact with technology.

By Mark Ellison

February 24, 2026

4 min read

Key Facts

  • GUI-Owl-1.5 is a native GUI agent model with instruct/thinking variants from 2B to 235B parameters.
  • It supports multiple platforms including desktop, mobile, and web browsers.
  • The model achieved state-of-the-art results on over 20 GUI benchmarks, including 56.5 on OSWorld and 71.6 on AndroidWorld.
  • Key innovations include a Hybrid Data Flywheel for data collection and a unified thought-synthesis pipeline for reasoning.
  • A new reinforcement learning algorithm, MRPO, was developed to handle multi-platform challenges.

Why You Care

Ever wish your computer or phone could just understand what you want it to do, without endless clicks or taps? Imagine an AI that navigates your apps as smoothly as you do. This isn’t science fiction anymore. A new AI model, GUI-Owl-1.5, is making significant strides in this area. It promises to redefine how you interact with all your digital devices. What if your digital assistants truly understood your intentions across every screen you use?

What Actually Happened

A recent paper introduced GUI-Owl-1.5, a native GUI agent model. This model features instruct/thinking variants in multiple sizes, according to the announcement, ranging from 2 billion to a massive 235 billion parameters. It supports a wide array of platforms, including desktop, mobile, and web browsers, enabling cloud-edge collaboration and real-time interaction, as detailed in the blog post. The model achieved state-of-the-art results on over 20 GUI benchmarks. This marks a significant leap in AI’s ability to understand and operate graphical user interfaces.

GUI-Owl-1.5 incorporates several key innovations. One is the Hybrid Data Flywheel. This system constructs a data pipeline for UI understanding and trajectory generation. It combines simulated environments with cloud-based sandbox environments. This improves both the efficiency and quality of data collection, the paper states. What’s more, the model uses a unified thought-synthesis pipeline. This enhances its reasoning capabilities, placing emphasis on tool use, memory, and multi-agent adaptation.
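The paper does not publish implementation details for the Hybrid Data Flywheel, but the core loop it describes can be sketched in a few lines: roll out the current agent in a mix of simulated and cloud-sandbox environments, keep only high-quality trajectories, and feed them back into training. The function names, environments, and the success-based quality filter below are illustrative assumptions, not the GUI-Owl-1.5 code.

```python
import random

def collect_trajectory(env_name, seed):
    """Stand-in for rolling out the current agent in one environment.

    A real rollout would capture screenshots and model actions; here we
    fabricate a short step list and a pass/fail outcome deterministically.
    """
    rng = random.Random(seed)
    steps = [{"obs": f"{env_name}-screen-{i}", "action": f"click-{i}"}
             for i in range(rng.randint(3, 8))]
    return {"env": env_name, "steps": steps, "success": rng.random() > 0.4}

def flywheel(n_rounds=3, per_round=10):
    """Alternate cheap simulated rollouts with cloud-sandbox rollouts,
    keep only successful trajectories, and grow the training pool."""
    pool = []
    for round_idx in range(n_rounds):
        for i in range(per_round):
            env = "simulated" if i % 2 == 0 else "cloud-sandbox"
            traj = collect_trajectory(env, seed=round_idx * per_round + i)
            if traj["success"]:  # simple quality filter (assumption)
                pool.append(traj)
        # In the real system, the agent would be retrained on `pool` here,
        # so the next round's rollouts come from a stronger model.
    return pool

pool = flywheel()
print(len(pool), "trajectories kept")
```

The "flywheel" effect comes from the retraining step inside the loop: better rollouts yield better data, which yields a better agent on the next round.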

Why This Matters to You

This new model could dramatically change your daily digital life. Think of it as having a super-smart assistant for every app you use. It could automate repetitive tasks across different devices. For example, imagine telling your phone to “find the cheapest flight to Tokyo next month” and having it automatically open travel apps, compare prices, and present options. This happens without you manually navigating each app.
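The flight-search scenario above boils down to an observe-think-act loop: the agent looks at the screen, decides on the next action, executes it, and repeats until the goal is met. The sketch below shows that loop shape only; the function names, action format, and the hard-coded "model" are assumptions for illustration, not the GUI-Owl-1.5 API.

```python
# Illustrative observe-think-act loop for a GUI agent.
# plan_action is a toy stand-in for the model's policy.

def plan_action(goal, screen_text, history):
    """Map (goal, current observation, past actions) -> next action."""
    if "flight results" in screen_text:
        return {"type": "done", "answer": "cheapest option found"}
    if not history:
        return {"type": "open_app", "app": "travel"}
    return {"type": "type_text", "text": goal}

def run_agent(goal, max_steps=5):
    screen_text, history = "home screen", []
    for _ in range(max_steps):
        action = plan_action(goal, screen_text, history)
        history.append(action)
        if action["type"] == "done":
            return history
        # Executing the action changes what is on screen (mocked here).
        screen_text = ("travel app" if action["type"] == "open_app"
                       else "flight results")
    return history

trace = run_agent("cheapest flight to Tokyo next month")
print([a["type"] for a in trace])  # → ['open_app', 'type_text', 'done']
```

A real agent would replace `plan_action` with a model call on a screenshot, and the screen update with actual clicks and keystrokes, but the control flow is the same.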

GUI-Owl-1.5 offers practical implications for you:

  • Task Automation: Spend less time on repetitive clicks; AI handles routine digital chores.
  • Cross-system Use: A consistent experience between your phone, tablet, and computer.
  • Accessibility: Easier device interaction for users with diverse needs.
  • Efficiency: Faster completion of complex tasks across multiple applications.

How much time could you save if your devices anticipated your next move? According to the team, GUI-Owl-1.5 achieved 56.5 on OSWorld for GUI automation tasks, demonstrating its capability to perform complex actions within operating systems. “The GUI-Owl-1.5 models are open-sourced, and an online cloud-sandbox demo is available,” the team revealed. This means developers can start experimenting with its capabilities immediately, opening doors for new, more intuitive applications.

The Surprising Finding

Here’s the twist: despite the complexity of multi-platform environments, GUI-Owl-1.5 achieved impressive results using a novel approach to reinforcement learning. The team proposed a new reinforcement learning algorithm, MRPO, which addresses challenges like multi-platform conflicts and inefficient training for long-horizon tasks, as the technical report explains. This is surprising because training AI agents to operate consistently across vastly different interfaces (like a desktop OS versus a mobile app) is incredibly difficult, and it has historically been a major hurdle for AI development. Yet, the research shows GUI-Owl-1.5 achieved 71.6 on AndroidWorld and 48.4 on WebArena. These scores highlight its performance across diverse GUI environments. This challenges the assumption that unified GUI agents are inherently limited by platform fragmentation, and it suggests a more cohesive future for AI interaction.
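The article does not describe MRPO's internals, but one generic ingredient of multi-platform RL training is easy to illustrate: if desktop, mobile, and web environments hand out rewards on very different scales, a shared policy update can be dominated by one platform. Centering each environment's rewards on its own baseline puts them on a comparable footing. The sketch below shows only that normalization idea, under stated assumptions; it is not the MRPO algorithm.

```python
# Hypothetical sketch: per-environment reward normalization, one common
# trick for keeping multi-environment policy updates balanced.

def normalized_advantages(rewards_by_env):
    """Center each environment's rewards on its own mean, so rollouts
    from different platforms contribute on a comparable scale."""
    advantages = {}
    for env, rewards in rewards_by_env.items():
        baseline = sum(rewards) / len(rewards)  # per-env mean baseline
        advantages[env] = [r - baseline for r in rewards]
    return advantages

rewards = {
    "desktop": [10.0, 12.0, 8.0],  # large raw reward scale
    "mobile":  [0.9, 1.1, 1.0],    # small raw reward scale
}
adv = normalized_advantages(rewards)
print(adv["desktop"])  # → [0.0, 2.0, -2.0]
```

Without a step like this, the desktop rollouts (rewards near 10) would swamp the mobile ones (rewards near 1) in any gradient estimate that mixes them directly.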

What Happens Next

The release of GUI-Owl-1.5 marks a significant step forward for AI agents. We can expect to see more developers integrating these capabilities into their applications within the next 12-18 months. Imagine your smart home system gaining the ability to interact with any appliance’s digital interface, regardless of its brand or operating system. For example, a single AI command could manage your smart oven, washing machine, and entertainment system, even if they use different apps. The open-sourcing of the models means rapid community development is likely. Practical advice for you: keep an eye on updates from your favorite app developers, who might soon announce new AI-powered features that make your digital interactions much smoother. The industry implications are vast, potentially leading to a new era of truly intelligent user interfaces. As mentioned in the release, the model also excelled in grounding tasks, obtaining 80.3 on ScreenSpotPro. This suggests a strong foundation for future advancements.
