Microsoft · Filed Feb 19, 2026 · Published Jun 25, 2026 · verified — real USPTO data

Microsoft Patent Splits AI Model Layers Across Chips by Their Strengths

By Patentlyze Team · Updated Jun 26, 2026

Running a massive AI model is expensive partly because today's systems don't always match workloads to the right hardware. Microsoft has filed a patent for a system that automatically routes different parts of an AI model to chips that are purpose-built for each job.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0178987 A1

Applicant Microsoft Technology Licensing, LLC

Filing date Feb 19, 2026

Publication date Jun 25, 2026

Inventors Devangkumar Rameshbhai PATEL, Wei ZUO, Yuan YU

CPC classification 706/12

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Mar 22, 2026)

Parent application is a Continuation of 17848679 (filed Jun 24, 2022)

Document 20 claims

AI/ML

What Microsoft's chip-splitting AI system actually does

Imagine your kitchen had two appliances: a big slow-cooker with plenty of space and a small high-powered blender. You wouldn't cook a roast in the blender. Microsoft's patent applies that same logic to AI chips.

Modern AI models contain two types of work. Some parts (called dense layers) apply all of their parameters to every input, so they are limited mainly by how fast a chip can calculate. Other parts (called sparse layers, common in a design known as Mixture-of-Experts) mostly need to hold large lookup tables in memory and read from them, so they are limited mainly by how much memory a chip has. These two jobs need very different hardware.
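To make the dense-versus-sparse difference concrete, here is a minimal Python sketch. It is not taken from the patent: the expert count, the top-2 selection rule, and every name in it are illustrative assumptions. It shows that a sparse layer has to store every expert but only computes with the few it selects.

```python
# Illustrative sketch only; names, sizes, and the top-k rule are assumptions,
# not details from the patent.

def dense_layer(x, weights):
    """Dense layer: every weight is used for every input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def top_k_experts(gate_scores, k):
    """Indices of the k highest-scoring experts (ties go to the lower index)."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: (-gate_scores[i], i))
    return sorted(ranked[:k])

def sparse_layer(x, experts, gate_scores, k):
    """Sparse layer: all experts are stored, only the top-k are run."""
    chosen = top_k_experts(gate_scores, k)
    total = sum(gate_scores[i] for i in chosen)  # assumes positive scores
    out = [0.0] * len(experts[0])
    for i in chosen:
        share = gate_scores[i] / total
        y = dense_layer(x, experts[i])
        out = [o + share * yi for o, yi in zip(out, y)]
    return out, chosen

def count_params(matrix):
    return sum(len(row) for row in matrix)

NUM_EXPERTS, K, DIM = 8, 2, 4  # hypothetical sizes
experts = [[[float(e + 1)] * DIM for _ in range(DIM)] for e in range(NUM_EXPERTS)]
gate_scores = [0.05, 0.30, 0.05, 0.10, 0.25, 0.05, 0.10, 0.10]
x = [1.0] * DIM

out, chosen = sparse_layer(x, experts, gate_scores, K)
stored_params = sum(count_params(e) for e in experts)
active_params = sum(count_params(experts[i]) for i in chosen)
print("experts used:", chosen)
print("parameters stored:", stored_params, "parameters used:", active_params)
```

In this toy setup the sparse layer keeps eight experts in memory but touches only two of them per input, which is why memory capacity matters more than raw speed for that part of the model.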

The patent describes a system that automatically assigns each type of work to the chip category that suits it best: high-memory chips for the lookup-heavy sparse parts, and high-speed processors for the calculation-heavy dense parts. The two sets of chips then pass data back and forth as the model runs.

How dense and sparse layers get routed to different hardware

A Mixture-of-Experts (MoE) model is an AI architecture that activates only a small subset of its internal "experts" for any given input, leaving most parameters idle. The patent covers a software-managed computing system that distributes the layers of such a model across two distinct classes of accelerator chips:

First set of accelerators (memory-heavy): These chips have large memory capacity but lower raw processing speed. They host the model's sparse layers, the parts of the MoE that store many specialist sub-networks and selectively load them.
Second set of accelerators (compute-heavy): These chips have high processing throughput but smaller memory. They run the model's dense layers, where every parameter is used on every input and throughput matters most.
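The split described above can be sketched as a simple placement routine. This is a hedged illustration, not the patent's method: the Layer and Accelerator types, the pool labels, and the round-robin choice within each pool are all assumptions made for the example.

```python
# Illustrative sketch only; types, pool names, and the round-robin policy
# are assumptions, not details from the patent.
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    kind: str  # "dense" or "sparse"

@dataclass(frozen=True)
class Accelerator:
    name: str
    pool: str  # "memory_heavy" or "compute_heavy"

# Sparse layers go to the memory-heavy set, dense layers to the compute-heavy set.
POOL_FOR_KIND = {"sparse": "memory_heavy", "dense": "compute_heavy"}

def place_layers(layers, accelerators):
    """Assign each layer, round-robin, to an accelerator in the matching pool."""
    pools = {}
    for acc in accelerators:
        pools.setdefault(acc.pool, []).append(acc)
    next_slot = {pool: 0 for pool in pools}
    placement = {}
    for layer in layers:
        pool = POOL_FOR_KIND[layer.kind]
        if pool not in pools:
            raise ValueError(f"no {pool} accelerator available for {layer.name}")
        candidates = pools[pool]
        placement[layer.name] = candidates[next_slot[pool] % len(candidates)].name
        next_slot[pool] += 1
    return placement

layers = [
    Layer("attn_0", "dense"),
    Layer("moe_0", "sparse"),
    Layer("attn_1", "dense"),
    Layer("moe_1", "sparse"),
]
accelerators = [
    Accelerator("mem0", "memory_heavy"),
    Accelerator("fast0", "compute_heavy"),
    Accelerator("fast1", "compute_heavy"),
]
placement = place_layers(layers, accelerators)
for layer_name, chip in placement.items():
    print(layer_name, "->", chip)
```

A real scheduler would also have to weigh memory limits and interconnect bandwidth; the sketch only captures the rule that layer type decides which set of chips a layer lands on.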

The key coordination step: the system is designed so that a sparse-layer chip can receive multiple simultaneous inputs from several dense-layer chips. That fan-in design keeps the fast chips from sitting idle while waiting for the memory-heavy chips to respond.
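The fan-in idea can be sketched as follows. This is a sequential simulation under stated assumptions, not the patent's implementation: the request format, the function names, and the toy experts are hypothetical, and real hardware would handle these inputs in parallel.

```python
# Illustrative sketch only; the request tuples and expert functions are
# assumptions, and the loop stands in for work done concurrently on real chips.
from collections import defaultdict

def fan_in(requests):
    """Group pending requests from many dense-layer chips by target expert."""
    batches = defaultdict(list)
    for source, expert_id, payload in requests:
        batches[expert_id].append((source, payload))
    return dict(batches)

def serve_sparse_accelerator(requests, experts):
    """One pass on a sparse-layer chip: each expert handles all waiting inputs."""
    replies = defaultdict(list)
    for expert_id, items in fan_in(requests).items():
        expert = experts[expert_id]
        for source, payload in items:
            replies[source].append(expert(payload))
    return dict(replies)

# Two toy experts hosted on one memory-heavy chip.
experts = {0: lambda v: v * 2, 1: lambda v: v + 100}

# Requests arriving together from three dense-layer chips: (source, expert, input).
requests = [
    ("dense_a", 0, 1),
    ("dense_b", 1, 2),
    ("dense_c", 0, 3),
    ("dense_a", 1, 4),
]
batches = fan_in(requests)
replies = serve_sparse_accelerator(requests, experts)
print(replies)
```

Grouping the waiting inputs by expert means each expert's parameters are read once per pass and reused for every sender, so no dense-layer chip has to wait for its own separate turn.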

The patent does not tie the system to a specific chip brand or product. It defines the architecture at the software instruction level, meaning it could theoretically run on any combination of accelerators that fits the memory-vs-compute tradeoff.

What this means for running large AI models cheaply

Running large MoE-style AI models (the architecture behind several frontier models) today often means buying a uniform cluster of expensive, high-memory chips even though the dense layers would run fine on cheaper, faster hardware with less memory. A system that lets operators mix chip types could cut infrastructure costs, or let the same budget support a larger model.

For Microsoft, which runs Azure's AI infrastructure and hosts OpenAI models, even modest efficiency gains at scale translate to real money. This patent suggests the company is building out the plumbing to run next-generation models on heterogeneous hardware fleets rather than uniform, premium-chip clusters.

Editorial take

This is unglamorous infrastructure work, but it's the kind that pays off at cloud scale. If Microsoft can run MoE models on mixed hardware without a performance penalty, that's a meaningful cost advantage over competitors locked into homogeneous GPU clusters. It won't make headlines the way a new AI feature does, but it's worth watching.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
