Microsoft · Filed Feb 20, 2026 · Published Jun 25, 2026 · verified — real USPTO data

Microsoft Patent Targets Corrupted Data Detection Without Requiring Human Review

Bad data causes bad decisions, and finding it is tedious, manual work. Microsoft is seeking a patent on a system that hands that job to a generative AI.

[Figure: FIG. 1A from US 2026/0178551 A1, rendered from the official USPTO publication PDF.]
Publication number: US 2026/0178551 A1
Applicant: Microsoft Technology Licensing, LLC
Filing date: Feb 20, 2026
Publication date: Jun 25, 2026
Inventors: Victor Chukwuma DIBIA, Chenglong WANG, Bongshin LEE, Jeevana Priya INALA, John THOMPSON
CPC classification: 707/692
Grant likelihood: Low
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Jun 3, 2026)
Parent application: continuation of 18374991 (filed 2023-09-29)
Document: 21 claims

What Microsoft's automated data-health check actually does

Imagine you inherit a spreadsheet with tens of thousands of rows. Some cells are blank, some numbers are wildly out of range, dates are formatted three different ways, and no one is quite sure what half the columns mean. Catching all of that normally means hours of manual poking around.

Microsoft's patent describes a system where an AI reads a brief description of your dataset and then writes its own inspection plan: what to check, in what order, and how to flag problems. A second AI agent then runs that plan against the actual data and reports back what it found.

The two-agent setup, one for planning and one for executing, means the system can adapt its checks to whatever kind of data you hand it, rather than running a fixed checklist. You describe the data, the AI figures out what healthy looks like, and then it goes looking for anything that falls short.

How the two-agent planning and execution loop works

The patent describes a pipeline built around two specialized AI agents working in sequence.

The evaluation planning agent takes a prompt that includes context about the dataset (column names, data types, what the data is supposed to represent) and feeds it to a generative language model (an LLM, the same family of models behind tools like ChatGPT). The model responds with a structured data evaluation plan: a set of checks tailored to that specific dataset rather than a generic template.
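The planning step described above can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the names `Check`, `build_prompt`, `plan_checks`, and `stub_model` are hypothetical, the JSON plan format is an assumption, and `stub_model` returns a canned reply in place of a real language model.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """One step of a hypothetical evaluation plan."""
    column: str
    kind: str     # e.g. "not_null" or "range" (assumed check names)
    params: dict


def build_prompt(description: str, columns: dict[str, str]) -> str:
    """Fold dataset context (description, column names, types) into a prompt."""
    schema = "\n".join(f"- {name}: {dtype}" for name, dtype in columns.items())
    return (
        "You are planning data-health checks.\n"
        f"Dataset: {description}\n"
        f"Columns:\n{schema}\n"
        'Reply with a JSON list of objects with keys "column", "kind", "params".'
    )


def plan_checks(prompt: str, model: Callable[[str], str]) -> list[Check]:
    """Send the prompt to a model and parse its reply into structured checks."""
    raw = json.loads(model(prompt))
    return [Check(c["column"], c["kind"], c.get("params", {})) for c in raw]


def stub_model(prompt: str) -> str:
    """Stand-in for a generative model: always returns the same canned plan."""
    return json.dumps([
        {"column": "age", "kind": "range", "params": {"low": 0, "high": 120}},
        {"column": "visit_date", "kind": "not_null"},
    ])


prompt = build_prompt(
    "Patient intake records",
    {"age": "int", "visit_date": "date"},
)
plan = plan_checks(prompt, stub_model)
for check in plan:
    print(check)
```

Because the model is passed in as a plain function, the canned stub could be swapped for a real model call without touching the prompt-building or parsing code.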

The evaluation execution agent then runs those checks against the real data. It looks for issues like missing values, statistical outliers, inconsistent formatting, duplicate records, or values that don't match expected ranges. These are collectively called data health issues in the patent.
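To make those issue types concrete, here is a small sketch of what the execution side might compute on a toy table. The function names and sample rows are invented for illustration, and the outlier rule (distance from the median measured in median absolute deviations) is a common heuristic chosen here as an assumption; the application does not say which statistics it uses.

```python
from statistics import median

# Toy dataset (invented): one blank age, one repeated row, one implausible age.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},
    {"id": 4, "age": 410},
    {"id": 5, "age": 41},
]


def find_missing(rows, column):
    """Indexes of rows where the column is blank."""
    return [i for i, r in enumerate(rows) if r.get(column) is None]


def find_duplicates(rows):
    """Indexes of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


def find_out_of_range(rows, column, low, high):
    """Indexes of non-blank values outside the expected [low, high] range."""
    return [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]


def find_outliers(rows, column, k=5.0):
    """Indexes of values more than k median absolute deviations from the median."""
    values = [r[column] for r in rows if r.get(column) is not None]
    mid = median(values)
    mad = median(abs(v - mid) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and abs(r[column] - mid) > k * mad
    ]


report = {
    "missing": find_missing(rows, "age"),
    "duplicates": find_duplicates(rows),
    "out_of_range": find_out_of_range(rows, "age", 0, 120),
    "outliers": find_outliers(rows, "age"),
}
print(report)
```

On this toy table the report flags row index 1 as missing, index 3 as a duplicate, and index 4 as both out of range and a statistical outlier.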

The key architectural choice is separating planning from execution. This lets the system reason about what to check before it touches the data, which means the checks themselves can be more contextually appropriate. A dataset of patient ages needs different sanity checks than a dataset of stock prices.
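One way to picture that separation: the plan is plain data, and a single generic executor runs whichever plan it is given. The sketch below assumes a hypothetical plan format and a made-up `CHECKS` registry, and the two plans are hand-written stand-ins for what a planning model might return.

```python
from typing import Any, Callable

# Registry of check kinds the executor knows how to run (hypothetical names).
CHECKS: dict[str, Callable[[Any, dict], bool]] = {
    "between": lambda value, p: p["low"] <= value <= p["high"],
    "positive": lambda value, p: value > 0,
}


def execute(plan: list[dict], values: list) -> list[tuple[int, str]]:
    """Run every planned check over the values; return (index, failed check) pairs."""
    issues = []
    for step in plan:
        passes = CHECKS[step["kind"]]
        params = step.get("params", {})
        for i, value in enumerate(values):
            if not passes(value, params):
                issues.append((i, step["kind"]))
    return issues


# Different plans for different data, executed by the same code.
age_plan = [{"kind": "between", "params": {"low": 0, "high": 120}}]
price_plan = [{"kind": "positive"}]

print(execute(age_plan, [34, 150, 7]))          # [(1, 'between')]
print(execute(price_plan, [101.5, -3.0, 0.2]))  # [(1, 'positive')]
```

The executor never needs to know what the data represents; everything dataset-specific lives in the plan it receives.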

One important note: claims 1 through 20, including the first independent claim, are listed as canceled in this publication. Canceling claims is common during patent prosecution and does not necessarily reflect the final scope of protection sought.

What this means for analysts drowning in bad data

Data quality is one of the most unglamorous and most consequential problems in any organization that runs on data, which is most of them. Analysts routinely spend more time cleaning data than analyzing it. A system that automatically generates a tailored inspection plan and then executes it could meaningfully cut that overhead, particularly for teams dealing with unfamiliar or frequently changing datasets.

For Microsoft specifically, this fits neatly into the company's push to embed AI into its data and analytics products. Tools like Microsoft Fabric and Power BI are obvious homes for something like this. If the approach works in practice, you could see data-quality checks become something that happens automatically when a dataset is ingested, rather than something a data engineer has to manually build and maintain.

Editorial take

This is a practical, unglamorous patent application aimed squarely at a real pain point. The two-agent design is a reasonable engineering choice, not a flashy one. Whether it works well in practice depends entirely on how reliably the LLM generates useful evaluation plans, a question the filing doesn't fully answer. Worth watching as a signal of where Microsoft wants AI to sit in its data-platform stack.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.