Microsoft · Filed Apr 25, 2025 · Published Jun 18, 2026

Microsoft's New Patent Teaches AI to Click Through Software the Way a Person Would

Microsoft is working on an AI model that learns how to click through software interfaces the same way a person would — by studying recorded sequences of steps taken to complete real tasks.

[Figure: FIG. 1A from US 2026/0170817 A1 (Microsoft patent application on AI model pre-training for UI navigation), rendered from the official USPTO publication PDF.]
Publication number: US 2026/0170817 A1
Applicant: Microsoft Technology Licensing, LLC
Filing date: Apr 25, 2025
Publication date: Jun 18, 2026
Inventors: Xiaoyi ZHANG, Yuwang WANG, Zhizheng ZHANG, Wenxuan XIE, Yan LU
US classification (class/subclass): 715/708
Grant likelihood: Low
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Mar 16, 2026)
Parent application: National Stage Entry of PCT/US2023/081484 (filed 2023-11-29)
Document: 21 claims

What Microsoft's UI-navigation AI model actually does

Imagine you need to complete a task in a complicated piece of software — say, exporting a report buried three menus deep. A human does it by clicking through a series of screens in a specific order. Microsoft's patent describes a way to teach an AI to do that same kind of navigation, automatically.

The core idea is to train the AI not just on individual screens, but on full navigation paths — the whole sequence of steps from start to finish, matched with a description of what the task actually was. That way, the AI learns the relationship between a goal ("export the report") and the route through the interface to get there.

The goal is an AI model that can generalize: once trained, it should be able to handle new interfaces and tasks it hasn't explicitly seen before, without needing to be retrained from scratch each time.

How the model learns from recorded navigation paths

The patent describes a pre-training pipeline for an AI model focused on user interface navigation — the kind of task where an agent must move through screens, menus, and dialog boxes to complete a goal.

The training data has three parts working together:

  • Navigation paths: recorded sequences of UI screens that correspond to completing a specific task
  • UI descriptions: text descriptions of the elements visible on each screen (buttons, fields, menus)
  • Task descriptions: plain-language descriptions of what the navigation path is trying to accomplish
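
The three parts above can be pictured as a single record per training example. The sketch below is purely illustrative: the class and field names (Screen, NavigationExample, element_descriptions) are hypothetical and do not come from the patent application, which may organize its data quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Screen:
    """One UI screen in a recorded path (hypothetical structure)."""
    name: str
    # Text descriptions of the visible elements: buttons, fields, menus.
    element_descriptions: list[str] = field(default_factory=list)


@dataclass
class NavigationExample:
    """A navigation path paired with the task it accomplishes."""
    task_description: str  # plain-language goal
    path: list[Screen]     # ordered screens from start to finish

    def flattened_ui_text(self) -> list[str]:
        """Return element descriptions in path order, tagged with the step index."""
        return [
            f"step {i}: {desc}"
            for i, screen in enumerate(self.path)
            for desc in screen.element_descriptions
        ]


example = NavigationExample(
    task_description="export the report",
    path=[
        Screen("home", ["menu: File", "menu: Edit"]),
        Screen("file_menu", ["item: Export", "item: Print"]),
        Screen("export_dialog", ["button: Save as PDF", "button: Cancel"]),
    ],
)
print(len(example.path))               # 3 screens in the path
print(example.flattened_ui_text()[2])  # step 1: item: Export
```

Keeping the screens in order inside one record is the point: the task description is attached to the whole route, not to any single screen.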

The feature extraction model (the AI component that converts raw UI and task information into a structured internal representation) is trained on the correspondence between all three. Rather than learning about individual screens in isolation, the model learns at the path level — meaning it understands how a sequence of screens connects to a real-world goal.
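
To make "learning at the path level" concrete, here is a toy sketch that pools the UI descriptions from every screen in a path into one representation and scores it against candidate task descriptions. Everything here is an assumption for illustration: the bag-of-words embed_text function stands in for a learned feature extraction model, nothing is actually trained, and the publication as summarized here does not say that cosine similarity is the measure it uses.

```python
import math
from collections import Counter


def embed_text(text: str) -> Counter:
    """Toy stand-in for a learned encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def embed_path(screens: list[list[str]]) -> Counter:
    """Path-level representation: pool element descriptions across all screens."""
    pooled = Counter()
    for elements in screens:
        for description in elements:
            pooled.update(embed_text(description))
    return pooled


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One recorded path: a list of screens, each a list of element descriptions.
path = [
    ["file menu", "edit menu"],
    ["export item", "print item"],
    ["save report as pdf button", "cancel button"],
]
tasks = ["export the report", "change the font size"]

path_vector = embed_path(path)
scores = {task: cosine(path_vector, embed_text(task)) for task in tasks}
best = max(scores, key=scores.get)
print(best)  # export the report
```

In an actual pre-training setup, the encoder's parameters would be adjusted so that matching path and task pairs score higher than mismatched ones. The sketch only shows what gets compared: a whole path against a goal, rather than one screen at a time.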

This approach is designed so the pre-trained model can be adapted to downstream navigation tasks (new software, new goals) without being trained from scratch. This is transfer learning, a common technique in modern AI training.
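
A minimal sketch of the transfer-learning idea, under stated assumptions: the "pre-trained" extractor here is just a fixed keyword counter (FROZEN_VOCAB is hypothetical), and only a small new logistic head is trained on a handful of downstream examples. Freezing the extractor is one common variant; the article's source does not say whether Microsoft's method freezes or fine-tunes the pre-trained model.

```python
import math

# Stand-in for a pre-trained feature extractor. Its "weights" (this vocabulary)
# are frozen: the downstream training loop below never modifies them.
FROZEN_VOCAB = ("export", "save", "print", "paper")


def extract_features(task: str) -> list[float]:
    """Map a task description to fixed features (count of each vocabulary term)."""
    words = task.lower().split()
    return [float(words.count(term)) for term in FROZEN_VOCAB]


def predict(head: list[float], bias: float, features: list[float]) -> float:
    """Logistic head: probability that the next click is 'Export' (vs 'Print')."""
    z = bias + sum(w * x for w, x in zip(head, features))
    return 1.0 / (1.0 + math.exp(-z))


# Small downstream dataset for a new app: (task, 1 if 'Export' else 0).
downstream = [
    ("export the report", 1),
    ("save the file as pdf", 1),
    ("print two copies", 0),
    ("print on paper", 0),
]

head, bias = [0.0] * len(FROZEN_VOCAB), 0.0
learning_rate = 0.5
for _ in range(200):  # train only the new head with plain gradient descent
    for task, label in downstream:
        feats = extract_features(task)
        error = predict(head, bias, feats) - label
        head = [w - learning_rate * error * x for w, x in zip(head, feats)]
        bias -= learning_rate * error

prob = predict(head, bias, extract_features("export the summary"))
print(f"P(Export) = {prob:.2f}")
```

Only head and bias change during training, which is why adapting to a new app needs just a few labeled examples in this toy setting.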

What this means for AI agents controlling your software

AI agents that can operate software interfaces are becoming a real product category — think of tools that automatically fill forms, book appointments, or dig through enterprise software so you don't have to. The limiting factor today is that these agents tend to be brittle: train them on one app and they fall apart on another.

By pre-training a model on navigation paths rather than static screenshots, Microsoft's approach could produce agents that transfer more reliably across different software products. That's directly relevant to Microsoft's push to embed AI agents inside products like Windows, Office, and Azure — where the AI needs to operate unfamiliar interfaces without constant retraining.

Editorial take

This is foundational infrastructure work for AI agents, not a flashy consumer feature. The claims were canceled in publication, which usually signals that the application is being reworked, so treat this as a research direction rather than a shipping capability. Still, it is a clear sign that Microsoft is investing seriously in the training methodology behind autonomous software agents: the unglamorous work that makes those agents actually useful.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.