Google · Filed Oct 24, 2025 · Published May 28, 2026

Google Patents a Cascaded Diffusion Pipeline for Text-to-Image Generation

By Patentlyze Team · Updated May 29, 2026

Instead of generating a final image in one shot, Google's patent describes a relay race of diffusion models. The first sketches a low-resolution draft from the text prompt, and each one after it is handed the previous draft and told to make it sharper. It's the architecture behind Google's Imagen, and cascades like it underpin several other capable text-to-image systems.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0148449 A1

Applicant Google LLC

Filing date Oct 24, 2025

Publication date May 28, 2026

Inventors Chitwan Saharia, William Chan, Mohammad Norouzi, Saurabh Saxena, Yi Li, Jay Ha Whang, David James Fleet, Jonathan Ho

US classification (USPC) 345/428

Grant likelihood Medium

Examiner Not yet assigned (placeholder CENTRAL, DOCKET; Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Feb 18, 2026)

Parent application Continuation of U.S. Application No. 18/624,960 (filed Apr 2, 2024)

Document 20 claims

AI/ML

How Google's stacked image generators build up detail

Imagine asking someone to paint a portrait by first doing a rough charcoal sketch, then handing it to a second artist who adds color and detail, then a third who sharpens every edge. Google's patent describes exactly that kind of assembly-line process — but for AI-generated images.

You type a text prompt, and rather than one model doing all the work at once, a sequence of neural networks takes over in stages. The first network produces a small, rough image. Each network after it receives the previous stage's output and produces a higher-resolution version with more detail. A final network polishes the result into the full-size image you see.

This approach lets each model in the chain specialize — the early ones focus on getting the composition and content right at low cost, while the later ones concentrate on fine-grained detail and resolution. The result is a system that can produce high-quality images without any single model carrying the entire burden.

How each diffusion stage hands off to the next

The patent covers a method for cascaded image generation using a sequence of diffusion-based neural networks (models that learn to progressively remove noise from an image until something coherent emerges).
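To make "progressively remove noise" concrete, here is a minimal sketch of the DDPM-style forward process (blend a clean image with Gaussian noise) and its inversion given a noise prediction. It is an illustration under stated assumptions, not the patent's method: the linear noise schedule (1e-4 to 0.02 over 1,000 steps) follows the original DDPM paper, the 4-pixel "image" is a toy, and the trained network is replaced by the true noise so the arithmetic can be checked.

```python
import math
import random

# Assumption: DDPM's linear beta schedule (1e-4 to 0.02 over 1,000 steps),
# not a schedule taken from the patent.
NUM_STEPS = 1000
BETA_START, BETA_END = 1e-4, 0.02

def alpha_bar(t: int) -> float:
    """Product of (1 - beta_s) for s = 1..t: how much of the clean image survives at step t."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = BETA_START + (BETA_END - BETA_START) * (s - 1) / (NUM_STEPS - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, eps):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]

def estimate_x0(xt, t, eps_pred):
    """Solve the forward equation for x0, given a prediction of the noise.
    In a real model, eps_pred comes from the trained denoising network."""
    a = alpha_bar(t)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]

random.seed(0)
x0 = [0.5, -0.25, 1.0, 0.0]                  # toy 4-pixel "image"
eps = [random.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise
xt = add_noise(x0, 500, eps)                 # heavily noised at step 500

# Stand-in for the network: use the true noise, so x0 is recovered up to rounding.
x0_hat = estimate_x0(xt, 500, eps)
print(max(abs(a - b) for a, b in zip(x0, x0_hat)))  # near zero (floating-point error only)
```

Generation runs this in reverse: starting from pure noise, a trained network repeatedly predicts the noise and the sampler steps toward a clean image. The diffusion stages of a cascade each do this at their own resolution.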

Here's the pipeline step by step:

A text encoder converts your input prompt into contextual embeddings — dense numerical vectors that capture the meaning and relationships between words.
An initial diffusion network takes those embeddings and generates a low-resolution output image — essentially a small, rough draft that captures the scene's structure.
One or more subsequent diffusion networks each receive the previous network's output as input and produce a higher-resolution version, progressively upscaling while preserving semantic content.
A final neural network receives the last upscaled representation and produces the finished, full-resolution image.
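The four steps can be sketched as a plain Python pipeline. Every function here is a hypothetical stand-in: `encode_text` is a toy bag-of-words counter rather than a real text encoder, and each "network" just upsamples or clamps pixels instead of running a trained diffusion model. The 64, 256 and 1024 pixel resolutions are borrowed from the published Imagen paper as an assumption; the patent's claims may not fix specific sizes.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values (toy representation)

def encode_text(prompt: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for the text encoder: a bag-of-words count vector.
    A real system uses a large pretrained language model to get contextual embeddings."""
    vec = [0.0] * dim
    for word in prompt.lower().split():
        vec[sum(map(ord, word)) % dim] += 1.0
    return vec

def base_stage(embedding: List[float], size: int = 64) -> Image:
    """Stand-in for the initial diffusion network: returns a size x size draft.
    A real model would run many denoising steps conditioned on the embedding."""
    level = min(1.0, sum(embedding) / (10.0 * len(embedding)))
    return [[level] * size for _ in range(size)]

def upsample(img: Image, factor: int) -> Image:
    """Nearest-neighbour upsampling by an integer factor."""
    return [[px for px in row for _ in range(factor)] for row in img for _ in range(factor)]

def make_sr_stage(factor: int) -> Callable[[Image, List[float]], Image]:
    """Stand-in for a super-resolution diffusion network that enlarges by `factor`.
    A real stage would denoise at the new size, conditioned on the draft and the text."""
    def stage(low_res: Image, embedding: List[float]) -> Image:
        return upsample(low_res, factor)  # toy: ignores the embedding
    return stage

def final_stage(img: Image, embedding: List[float]) -> Image:
    """Stand-in for the final network: here it only clamps pixels to [0, 1]."""
    return [[min(1.0, max(0.0, px)) for px in row] for row in img]

def generate(prompt: str, sr_stages: List[Callable[[Image, List[float]], Image]]) -> Image:
    emb = encode_text(prompt)   # the same conditioning feeds every stage
    img = base_stage(emb)       # 64 x 64 draft
    for stage in sr_stages:
        img = stage(img, emb)   # each stage consumes the previous stage's output
    return final_stage(img, emb)

# Assumed cascade: 64 -> 256 -> 1024 via two 4x super-resolution stages.
image = generate("a corgi riding a bicycle", [make_sr_stage(4), make_sr_stage(4)])
print(len(image), len(image[0]))  # 1024 1024
```

The structure that matters is the loop: each stage consumes the previous stage's output plus the same text embedding, so adding or removing a super-resolution stage changes nothing else in the pipeline.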

The conditioning input (your text prompt) threads through the entire pipeline, keeping every stage anchored to what you asked for. The architecture is flexible — the patent covers conditioning inputs beyond text as well, suggesting the same cascade approach could work for image editing, inpainting, or other generative tasks.
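One common way for a super-resolution stage to take both the previous draft and the conditioning signal, used in SR3-style diffusion upsamplers, is to resize the low-resolution image to the target size and stack it with the noisy high-resolution image as an extra input channel, while the text embedding is passed alongside. The sketch below assumes that scheme; the summary above does not say which mechanism the patent claims, and `build_sr_input` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]

def upsample(img: Image, factor: int) -> Image:
    """Nearest-neighbour upsampling (a stand-in for bilinear or bicubic resizing)."""
    return [[px for px in row for _ in range(factor)] for row in img for _ in range(factor)]

def build_sr_input(noisy_hr: Image, low_res: Image, text_emb: List[float]):
    """Hypothetical input bundle for one super-resolution stage.

    Returns per-pixel (noisy value, upsampled draft value) pairs, i.e. two channels,
    plus the unchanged text embedding that conditions every stage.
    """
    factor = len(noisy_hr) // len(low_res)
    up = upsample(low_res, factor)
    if len(up) != len(noisy_hr) or len(up[0]) != len(noisy_hr[0]):
        raise ValueError("high-res size must be an integer multiple of the low-res size")
    stacked = [list(zip(n_row, u_row)) for n_row, u_row in zip(noisy_hr, up)]
    return stacked, text_emb

low = [[0.1, 0.9], [0.4, 0.6]]         # 2 x 2 draft from the previous stage
noisy = [[0.0] * 4 for _ in range(4)]  # 4 x 4 noisy image this stage will denoise
channels, emb = build_sr_input(noisy, low, text_emb=[0.2, 0.7])
print(channels[0])  # [(0.0, 0.1), (0.0, 0.1), (0.0, 0.9), (0.0, 0.9)]
```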

The inventors — including Chitwan Saharia and Jonathan Ho, two of the key researchers behind the Imagen and DDPM lines of work — were doing this research at Google Brain, which has since merged into Google DeepMind.

What this means for Google's text-to-image products

This patent describes the core architecture behind Google's Imagen text-to-image system, one of the most cited research projects in the generative AI space. The cascaded diffusion approach is a deliberate design choice: by splitting the problem across multiple specialized networks, you get better sample quality at high resolutions without asking any single model to work out both the overall composition and the fine detail of a full-resolution image at once. The content-defining work happens at low resolution, where there are few pixels to process, and later stages only refine.
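A quick pixel count shows why splitting the work by resolution is attractive. This sketch assumes the 64-pixel base and 1024-pixel output used in the Imagen paper (the patent summary gives no sizes) and only counts pixels; real compute also depends on model size and the number of denoising steps, which it ignores.

```python
def pixel_count(side: int) -> int:
    """Pixels in a square image; grows with the square of the side length."""
    return side * side

# Assumed resolutions (from the Imagen paper), not taken from the patent text.
base_side, final_side = 64, 1024

print(pixel_count(base_side))                             # 4096
print(pixel_count(final_side))                            # 1048576
print(pixel_count(final_side) // pixel_count(base_side))  # 256
```

So the stage that settles composition and content works on 256 times fewer pixels than the full-resolution output, while the later stages only have to add detail conditioned on an existing draft.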

For you as a user, the practical implication is that systems built this way tend to produce images that are both semantically accurate (the right content) and visually sharp (high detail). The approach has influenced many later image-generation tools, and Google's filing signals that it wants formal IP coverage over this pipeline as text-to-image products become a real commercial business.

Editorial take

This is a foundational patent on an architecture that already exists in the wild. Imagen has been publicly described in research papers since 2022, and the inventors are some of the most prominent names in diffusion model research. Continuing to pursue protection now, through a continuation of a 2024 application, is a defensive move as much as an offensive one: Google is establishing IP claims over a technique that competitors are also building on. It's worth watching, but don't expect this to be a courtroom sword anytime soon.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
