Adobe · Filed Nov 21, 2024 · Published May 21, 2026 · verified — real USPTO data

Adobe Patents a Condition-Map System for Controlling AI Image Structure

By Patentlyze Team · Updated Jul 10, 2026

Adobe is patenting a way to give AI image generators a spatial blueprint — a 'condition map' — so the model produces images that match a specific layout or structure, not just a text description. It's the difference between asking for 'a cityscape' and handing the model a rough sketch of exactly where the buildings should go.

Figure from the official USPTO publication.

Publication number US 2026/0141594 A1

Applicant ADOBE INC.

Filing date Nov 21, 2024

Publication date May 21, 2026

Inventors Tristan von Busch, Tobias Hinz

CPC classification 345/619

Grant likelihood Medium

Examiner WELCH, DAVID T (Art Unit 2613)

Status Docketed New Case - Ready for Examination (Dec 20, 2024)

Document 20 claims

AI image & video

What Adobe's condition-map image control actually does

Imagine you're designing a magazine spread and you need an AI-generated photo that fits a very specific layout — sky in the top third, a figure on the left, open space on the right. Just typing a text prompt rarely gets you there reliably. Adobe's patent describes a system that lets you feed the AI a condition map — a spatial diagram of where things should appear in the final image — and the model actually follows it.

Instead of hoping a text prompt nudges the AI in the right direction, you provide a structured guide that encodes your intended image layout. The AI generation model reads that guide, converts it into a sequence of tokens (think of tokens as the building blocks the model works with), and uses them to steer the output image toward your intended structure.

For creative professionals, this is about predictability and control. You're not rolling the dice on a prompt — you're giving the system a map and saying, 'make something that looks like this arrangement, but filled in with real imagery.'

How the condition encoder guides Adobe's transformer output

The patent describes a pipeline with three core stages working together inside a transformer-based image generation model.

Condition map input: The user or system provides a spatial representation of a target image structure — essentially a layout map that encodes where objects, regions, or compositional elements should appear. Think of it like a wireframe or a rough segmentation mask.
Condition encoder: A dedicated encoder module converts that condition map into a condition sequence of tokens — a representation the transformer can actually process alongside other inputs. This is the key architectural addition: rather than just appending a text description, the model gets a spatially-aware token sequence that captures layout information.
Transformer generation: The transformer (the main generative engine) takes both the condition sequence and a preliminary sequence of tokens drawn from a discrete codebook — a fixed vocabulary of visual building blocks — and generates an output token sequence. The discrete codebook approach is similar to what's used in VQ-VAE and VQGAN architectures, where images are represented as sequences of discrete codes rather than raw pixels.
Decoder output: A decoder converts the output token sequence back into a full synthetic image that reflects the spatial structure specified in the original condition map.

The result is a generative model that respects compositional intent, not just semantic content.

What this means for Firefly's generative layout tools

For anyone using Adobe Firefly or similar generative tools in a professional design workflow, structural control over AI outputs is one of the biggest unsolved pain points. Text prompts are powerful but imprecise — they can generate beautiful images that still land in completely wrong compositions. A condition-map approach would let designers, art directors, and content creators specify layout constraints up front, dramatically reducing iteration cycles.

This also positions Adobe to compete more directly with ControlNet-style conditioning (popularized in the open-source Stable Diffusion ecosystem), but applied to transformer-based architectures like the ones Firefly runs on. If this ships in a product, it could make generative fill and text-to-image tools significantly more useful for production-grade creative work rather than just exploration.

Editorial take

This is a genuinely useful patent that addresses a real workflow problem, not just a theoretical capability. Structural control over generative image output is something designers have been patching together with workarounds for years, and Adobe building it natively into a transformer architecture is a logical and practical move. The interesting signal here is that Adobe is investing in the infrastructure layer of creative control, not just surface-level prompt features.

Which company should we read for you?

We track 17 companies here. Pro is the same weekly breakdown for any company you choose, delivered privately. Type a name and we'll scope it and send you a quote.

Get one Big Tech patent every Sunday

Plain English, intelligent commentary, no hype. Free.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.