Microsoft · Filed Dec 20, 2024 · Published Jun 25, 2026

Microsoft Patent: Describe What You Want, Train an AI to Recognize Images

By Patentlyze Team · Updated Jun 26, 2026

Building a custom AI that can analyze images normally requires a team of engineers. Microsoft is patenting a system that lets an ordinary user do it through a conversation, the same way you'd describe a task to a colleague.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0178625 A1

Applicant MICROSOFT TECHNOLOGY LICENSING, LLC

Filing date Dec 20, 2024

Publication date Jun 25, 2026

Inventors Julia GONG, Kuan LU, Houdong HU, Adina Magdalena TRUFINESCU, Günter NOGUEIRA LOCH, Tong BAI, Georgios GEORGIADIS, Nishant YADAV, Pei GUO, Jun PAN, Chongyang BAI, Cha ZHANG

USPC classification 704/9

Grant likelihood Medium

Examiner YEN, ERIC L (Art Unit 2658)

Status Docketed New Case - Ready for Examination (Feb 7, 2025)

Document 20 claims

AI/ML

What Microsoft's conversational model-builder actually does

Imagine you run a factory and you want an AI that flags defective parts coming off the assembly line. Right now, getting that built means hiring someone who knows machine learning, writing code, and training a model from scratch. That's expensive and slow.

Microsoft's patent describes a system where you just describe what you want, using plain language and example images. The system figures out which kind of AI task fits your description, proposes a definition back to you, and refines it through a back-and-forth conversation until it matches what you actually need.

Once the model is running, you can look at the results it produces and give feedback directly on the images it analyzed. The system uses that feedback to adjust and improve the model automatically. No coding, no retraining from scratch.

How the system turns your words and images into a working AI pipeline

The patent describes a multi-step pipeline built around what Microsoft calls a model customization agent, essentially an AI coordinator that manages the whole process.

When a user submits a request, they can provide both image data (example photos or screenshots) and language data (a text description of the task). The system runs natural language processing on the text and image processing on the pictures, then compares the combined result against an index of pre-defined task types to find the closest match.
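To make the matching step concrete, here is a toy Python sketch. It is not Microsoft's method: the patent, as summarized here, does not say how the comparison is scored. The sketch assumes a hypothetical keyword index (TASK_INDEX), treats "image processing" as a set of tags that some image labeler has already produced, and scores each task type by simple word overlap (Jaccard similarity). A real system might instead compare learned embeddings.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskType:
    name: str
    keywords: frozenset[str]


# Hypothetical index of pre-defined task types with hand-picked keywords.
TASK_INDEX = [
    TaskType("image_classification",
             frozenset({"classify", "sort", "category", "defective", "good", "bad"})),
    TaskType("object_detection",
             frozenset({"find", "locate", "detect", "count", "where", "box"})),
    TaskType("text_recognition",
             frozenset({"read", "text", "serial", "number", "digits", "label"})),
]


def text_features(description: str) -> set[str]:
    # Stand-in for natural language processing: lowercase word tokens.
    return set(re.findall(r"[a-z]+", description.lower()))


def match_task(description: str, image_tags: set[str]) -> tuple[str, float]:
    """Return (task name, Jaccard score) for the closest task type.

    image_tags is an assumed stand-in for the output of image processing,
    e.g. labels from an off-the-shelf tagger. Returns ("", 0.0) if no
    task type shares any feature with the request.
    """
    features = text_features(description) | {t.lower() for t in image_tags}
    best_name, best_score = "", 0.0
    for task in TASK_INDEX:
        overlap = len(features & task.keywords)
        score = overlap / len(features | task.keywords)
        if score > best_score:
            best_name, best_score = task.name, score
    return best_name, best_score


print(match_task("Find and count the screws in each photo", {"screw", "metal"}))
# ('object_detection', 0.14285714285714285)
```

The words "find" and "count" are what tip this request toward object detection. Word overlap is crude, but it shows the shape of the step: merge evidence from text and images, then rank the candidates in a fixed index.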

Next, the system enters an iterative negotiation loop. It presents a proposed task definition (a structured description of what the AI would do, including what inputs it expects and what outputs it would produce) and asks the user to confirm or refine it. This back-and-forth continues until the definition is locked in as an inference contract, a formal specification the AI model will follow.
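The negotiation loop can be pictured as a small propose-and-revise cycle. The Python sketch below is a hypothetical illustration, not the patent's implementation: TaskDefinition, InferenceContract, negotiate, and the reply convention (None means "confirm", a dict means "change these fields") are all invented for this example, and the user is simulated by a scripted function.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class TaskDefinition:
    task: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass(frozen=True)
class InferenceContract:
    definition: TaskDefinition
    rounds: int  # proposals shown before the user confirmed


# Hypothetical reply convention: None means "confirm"; a dict names fields to change.
Reply = Optional[dict]


def negotiate(proposal: TaskDefinition,
              ask_user: Callable[[TaskDefinition], Reply],
              max_rounds: int = 5) -> InferenceContract:
    """Propose, collect a reply, revise, and repeat until the user confirms."""
    for round_no in range(1, max_rounds + 1):
        reply = ask_user(proposal)
        if reply is None:
            # Confirmed: freeze the agreed definition as the contract.
            return InferenceContract(proposal, round_no)
        # Refinement: copy the proposal with the requested fields replaced.
        proposal = replace(proposal, **reply)
    raise RuntimeError(f"no agreement after {max_rounds} rounds")


# Scripted user: first asks for a confidence score, then confirms.
replies = iter([{"outputs": ("label", "confidence")}, None])
contract = negotiate(
    TaskDefinition("image_classification", ("image",), ("label",)),
    ask_user=lambda proposal: next(replies),
)
print(contract)
```

Making the definitions immutable means each round produces a new proposal rather than editing the old one, which keeps the agreed contract from changing after the user signs off.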

From that contract, the system automatically generates an execution processing flow, the actual technical pipeline that runs the model. Users can then review the model's outputs on real images and submit feedback, which the system uses to modify the pipeline and improve accuracy over time.
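A rough Python sketch of the last two steps follows, again under loud assumptions. The step templates (STEP_TEMPLATES), the single confidence threshold, and the rule of nudging it up or down based on counts of false positives and false negatives are all hypothetical stand-ins. The patent, as described here, says only that feedback is used to modify the pipeline, not how.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# Hypothetical step templates per task type; the patent does not list them.
STEP_TEMPLATES = {
    "image_classification": ("resize", "classify", "apply_threshold"),
    "object_detection": ("resize", "detect", "apply_threshold", "count"),
}


@dataclass(frozen=True)
class ExecutionFlow:
    task: str
    steps: tuple[str, ...]
    threshold: float = 0.5  # minimum confidence to report a positive


def build_flow(task: str) -> ExecutionFlow:
    """Generate a processing flow from the task named in the contract."""
    return ExecutionFlow(task, STEP_TEMPLATES[task])


def apply_feedback(flow: ExecutionFlow, feedback: list[str],
                   step: float = 0.05) -> ExecutionFlow:
    """Return a revised flow given per-image verdicts from the user.

    Each verdict is "correct", "false_positive", or "false_negative".
    """
    fp = feedback.count("false_positive")
    fn = feedback.count("false_negative")
    threshold = flow.threshold
    if fp > fn:
        # More false alarms than misses: require more confidence.
        threshold = min(0.95, threshold + step)
    elif fn > fp:
        # More misses than false alarms: accept lower-confidence results.
        threshold = max(0.05, threshold - step)
    return replace(flow, threshold=round(threshold, 4))


flow = build_flow("object_detection")
flow = apply_feedback(flow, ["false_positive", "false_positive", "correct"])
print(flow.steps, flow.threshold)
```

Tuning one threshold is the simplest possible "modify the pipeline" action; a fuller system could also swap, add, or retrain steps. The point is that user verdicts on specific images feed back into the flow without anyone writing code.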

What this means for people who need AI vision tools but can't code

The gap between "I need an AI that does X" and "I have an AI that does X" is currently filled by specialists. This patent describes a system designed to close that gap for business users, analysts, or domain experts who understand their problem well but have no machine learning background. If it works as described, the same person who notices a recurring defect in a product could build the tool to catch it automatically.

For Microsoft, this fits squarely into its broader push to bring AI capabilities into enterprise tools like Azure and Copilot Studio. A working version of this system could make custom computer-vision models as accessible as building a spreadsheet formula, which would be a meaningful shift in who can use AI at work.

Editorial take

This is a genuinely interesting patent because it targets a real and well-documented barrier: most organizations that could benefit from custom AI vision tools can't build them. The conversational refinement loop and feedback-driven improvement cycle are thoughtful design choices. The bigger question is whether the underlying model library is broad enough to cover the variety of tasks real users will describe, but that's an execution problem, not a concept problem.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
