Microsoft · Filed Dec 21, 2024 · Published Jun 25, 2026 · verified — real USPTO data

Microsoft Patents AI Agent System to Direct and Control Robotic Devices

By Patentlyze Team · Updated Jun 26, 2026

Microsoft is seeking a patent on a system in which a human doesn't control a robot directly. Instead, the human describes what needs to happen, and an AI agent works out the actual instructions the robot should follow.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number: US 2026/0178027 A1
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Filing date: Dec 21, 2024
Publication date: Jun 25, 2026
Inventors: Daniel ROSENSTEIN, Mark Alan STEVENS, Newman CHENG, Timothy Hahndeut CHUNG, Christopher Scott GUAGLIANO, Richard Jason ORTEGA, Gordon Parry BROADBENT IV, Aashish GHIMIRE
CPC classification: 701/2
Grant likelihood: Medium
Examiner: SWEENEY, BRIAN P (Art Unit 3668)
Status: Non Final Action Mailed (Apr 20, 2026)
Document: 20 claims

AI/ML

What Microsoft's human-AI-robot collaboration session actually does

Imagine you're coordinating a search-and-rescue operation. Instead of manually piloting a drone or ground robot through every movement, you type or say something like "check the north corridor for survivors" into a shared workspace. An AI in that same workspace reads your request, figures out the specific commands the robot needs to execute, and sends them automatically.

That shared workspace is what Microsoft calls a "collaboration session." It's a kind of virtual room where you, one or more robots, and an AI agent are all present at once. You give the mission-level direction; the AI handles the translation into machine-level instructions.

The patent describes the session as tied to a mission in a specific geographic area, with defined goals and tasks. The human stays in the loop by providing input, but the AI does the heavy lifting of turning that input into something a robot can actually act on. Think of it less like a remote control and more like delegating to a very capable assistant who happens to speak robot.

How the AI agent translates human input into robot instructions

The system creates what the patent calls an "interaction environment," a shared digital space that three types of participants join simultaneously: a human participant, one or more robotic device participants, and an artificial intelligence agent participant.

The human interacts through a mechanism built into that environment, essentially an interface for describing tasks that are part of a larger mission. A mission, in this context, is a structured goal tied to a real-world geographic area, broken into discrete tasks (think: survey this zone, deliver this item, inspect this structure).
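One way to picture that mission structure is as a small data model. The sketch below is illustrative only: the class names, the fields, and the bounding-box representation of the geographic area are all assumptions made for this example, not structures taken from the application.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    PENDING = "pending"
    DONE = "done"


@dataclass
class GeoArea:
    """Hypothetical bounding box standing in for the mission's real-world area."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # True when the point lies inside the box (edges included).
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class Task:
    """One discrete unit of work within a mission."""
    description: str
    status: TaskStatus = TaskStatus.PENDING


@dataclass
class Mission:
    """A structured goal tied to a geographic area and split into tasks."""
    goal: str
    area: GeoArea
    tasks: list[Task] = field(default_factory=list)

    def add_task(self, description: str) -> Task:
        task = Task(description)
        self.tasks.append(task)
        return task

    def pending(self) -> list[Task]:
        return [t for t in self.tasks if t.status is TaskStatus.PENDING]


# Example values are made up for illustration.
mission = Mission(
    goal="Inspect flood damage",
    area=GeoArea(south=47.60, west=-122.35, north=47.62, east=-122.32),
)
mission.add_task("survey this zone")
mission.add_task("inspect this structure")
```

Keeping the area on the mission rather than on each task mirrors the article's description of a mission as the unit tied to geography, with tasks as its parts.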

When the human submits input describing a task, the AI agent receives it and generates a specific instruction for the robot. That instruction is then transmitted through the collaboration session directly to the robotic device, which carries it out. The human never has to know the low-level command syntax the robot requires.

The session manages all three participants simultaneously under a single framework.
The AI acts as an intermediary layer between human intent and machine execution.
The geographic/mission structure means the system is built for organized, goal-oriented deployments rather than ad-hoc control.
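The flow described above, from human input to AI-generated instruction to robot, can be sketched in a few classes. This is a minimal sketch under stated assumptions: every name here (CollaborationSession, AIAgent, Robot, the command strings) is hypothetical, and the keyword lookup merely stands in for whatever model a real AI agent would use. None of it is taken from the application's text.

```python
from dataclasses import dataclass, field


@dataclass
class Robot:
    """Robotic device participant; records the instructions it receives."""
    name: str
    received: list[str] = field(default_factory=list)

    def execute(self, instruction: str) -> None:
        self.received.append(instruction)


class AIAgent:
    """AI agent participant. A keyword table stands in for a real model."""

    # Hypothetical mapping from task words to low-level robot commands.
    COMMANDS = {
        "survey": "SCAN_AREA",
        "deliver": "CARRY_PAYLOAD",
        "inspect": "CAPTURE_IMAGES",
    }

    def generate_instruction(self, human_input: str) -> str:
        text = human_input.lower()
        for keyword, command in self.COMMANDS.items():
            if keyword in text:
                return command
        return "HOLD_POSITION"  # assumed default when nothing matches


@dataclass
class CollaborationSession:
    """Shared session joining the human's input, the agent, and the robots."""
    agent: AIAgent
    robots: dict[str, Robot] = field(default_factory=dict)
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def join(self, robot: Robot) -> None:
        self.robots[robot.name] = robot

    def submit(self, human_input: str, robot_name: str) -> str:
        # The human describes a task; the agent turns it into an instruction,
        # and the session delivers that instruction to the named robot.
        instruction = self.agent.generate_instruction(human_input)
        self.robots[robot_name].execute(instruction)
        self.log.append((human_input, robot_name, instruction))
        return instruction


session = CollaborationSession(agent=AIAgent())
drone = Robot("drone-1")
session.join(drone)
session.submit("Survey this zone", "drone-1")
print(drone.received)  # ['SCAN_AREA']
```

The operator only ever calls `submit` with plain language; the command vocabulary lives entirely inside the agent, which is the intermediary role the article describes.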

What this means for AI-controlled robots in real-world missions

The practical implication is that a single person could direct multiple robots across a mission without needing to understand how any individual robot works. The AI agent absorbs the complexity of translating goals into machine instructions, which could make it far easier to deploy robotic systems in high-stakes environments like disaster response, military operations, or large-scale industrial inspection.

For Microsoft, this fits alongside its broader push into enterprise AI tools and, separately, its long-standing defense and government contracts. The patent doesn't name a specific product, but the mission-and-geography framing strongly suggests applications beyond the office. If this makes it into a real product, you as an operator would be giving direction to a fleet of robots the way you'd assign tasks in a project management app.

Editorial take

This is a genuinely interesting structural patent application because it formalizes something that sounds simple but is actually hard: making an AI the real-time translator between a human's intent and a robot's action list, all inside a shared session framework. The defense and emergency-response implications are obvious, and the architecture here is specific enough that it likely reflects real engineering work, not just a placeholder filing.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
