New Google Patents · Filed May 7, 2024 · Published Jun 18, 2026

Google Patents an AI That Converts Photos Between Visual Styles Without Losing the Subject

By Patentlyze Team · Updated Jun 19, 2026

Google has patented a way to take an image from one visual world — say, a real photograph — and regenerate it in a completely different style, like a sketch or a painting, while making sure the subject stays recognizably the same thing.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0170719 A1

Applicant Google LLC

Filing date May 7, 2024

Publication date Jun 18, 2026

Inventors Andrey Voynov, Kfir Aberman, Daniel Cohen-Or

USPC classification 345/619

Grant likelihood Medium

Examiner CHIN, MICHELLE (Art Unit 2614)

Status Non Final Action Mailed (Apr 28, 2026)

Parent application PCT/US2023/080932 (filed Nov 22, 2023); this application is its U.S. national stage entry

Document 19 claims

AI/ML

What Google's cross-domain image converter actually does

Imagine you have a photo of a dog and you want to see what it would look like as a watercolor painting, or as a medical illustration, or in the style of a video game. The challenge is that AI image tools are often trained on one type of image and don't translate well to another without losing key details about the subject.

Google's patent describes a system that tackles this with a built-in consistency check. While the AI builds the new image step by step, it keeps checking whether the spatial layout (where the dog's head is, where its legs are) still matches the original photo. If the layout drifts, the system nudges it back on course.

The result is an image that looks like it genuinely belongs in the new style, but where your original subject is still clearly what it is. You don't end up with a watercolor painting of some random dog — you get one of your dog.

How the spatial feature map guides each diffusion step

The patent describes a cross-domain image diffusion model — a system that takes an image from a "source domain" (for example, a real photograph) and generates a corresponding image in a "target domain" (for example, a line drawing, a medical scan rendering, or a synthetic 3D image).

The core mechanism relies on a component called a latent spatial feature predictor. The AI builds the output image through a series of incremental steps, each called a "reverse diffusion time step": generation starts from random noise and removes a little of it at every step, somewhat like sculpting an image out of static. Throughout this process, the predictor tracks where objects are positioned in the image being generated.
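
For readers who want to see what a reverse diffusion loop looks like, here is a minimal sketch in Python. It is not Google's code. The four-pixel "image", the linear noise schedule, the function names and toy_denoiser (which cheats by always returning a known clean image) are all hypothetical, and the update rule is the standard deterministic DDIM sampler, chosen as one common example rather than anything the patent specifies.

```python
import math
import random

# Hypothetical "clean image" that the toy denoiser always predicts (4 pixels).
CLEAN_IMAGE = [0.2, 0.8, 0.5, 0.1]


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.05):
    """Cumulative products of (1 - beta) for an assumed linear noise schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def toy_denoiser(x_t, t):
    """Hypothetical stand-in for a trained network: returns its estimate of the
    clean image x_0. This toy version simply 'knows' the answer."""
    return list(CLEAN_IMAGE)


def reverse_diffusion(num_steps=100, seed=0):
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    # Start from pure Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in CLEAN_IMAGE]
    for t in reversed(range(num_steps)):  # one reverse diffusion time step per pass
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x0_hat = toy_denoiser(x, t)
        # Noise component implied by the current sample and the x_0 estimate.
        eps_hat = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t)
                   for xi, x0i in zip(x, x0_hat)]
        # Deterministic DDIM update: keep more of x0_hat and less of the noise.
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x
```

The weight on the noise term, sqrt(1 - ab_prev), shrinks with every pass and reaches zero on the last one, so the sample lands on the denoiser's estimate. In a real system the denoiser is a trained network whose estimate changes from step to step.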

At each step, the system:

Produces a current spatial feature map — a behind-the-scenes diagram of where shapes and structures are located in the evolving image
Compares that map to a target spatial feature map extracted from the original input image
Measures how similar the two are and, if they have drifted apart, adjusts the next step accordingly
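
Those three steps map naturally onto a guidance function. The sketch below is one hedged interpretation, not the patent's claimed method: the "feature predictor" is a hypothetical stand-in (differences between neighboring values in a 1-D latent), similarity is measured as a sum of squared differences, and the adjustment is a small gradient step on the latent, borrowed from classifier-guided diffusion. A real system would compute that gradient through a trained network with automatic differentiation rather than by hand.

```python
def spatial_feature_map(latent):
    """Hypothetical stand-in for the latent spatial feature predictor: the
    differences between neighboring values, a crude 1-D 'edge map'."""
    return [latent[i + 1] - latent[i] for i in range(len(latent) - 1)]


def structure_loss(current_map, target_map):
    """Sum of squared differences: 0 means the two layouts match exactly."""
    return sum((c - g) ** 2 for c, g in zip(current_map, target_map))


def guidance_nudge(latent, target_map, scale=0.1):
    """One correction inside a reverse step: compare the current feature map
    with the target and move the latent down the gradient of the mismatch."""
    residual = [c - g for c, g in zip(spatial_feature_map(latent), target_map)]
    grad = [0.0] * len(latent)
    for i, r in enumerate(residual):
        # d/dx of (x[i+1] - x[i] - g[i])**2 is +2r for x[i+1] and -2r for x[i].
        grad[i + 1] += 2.0 * r
        grad[i] -= 2.0 * r
    return [x - scale * g for x, g in zip(latent, grad)]


# Usage: the target map is extracted once from the source image.
source_image = [0.2, 0.8, 0.5, 0.1]
target_map = spatial_feature_map(source_image)
latent = [0.0, 0.0, 0.0, 0.0]  # evolving image, partway through generation
before = structure_loss(spatial_feature_map(latent), target_map)
latent = guidance_nudge(latent, target_map)
after = structure_loss(spatial_feature_map(latent), target_map)  # smaller than before
```

Applying a nudge like this once inside every reverse step, before the next denoising update, is what would keep the layout from drifting. The scale parameter (an assumed value here) would presumably trade structural fidelity against freedom to adopt the new style.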

An optional text prompt lets users specify what kind of object they're working with, giving the model extra context. The key innovation is that this spatial guidance happens inside each diffusion step, not bolted on afterward, meaning the correction is woven into how the image is built rather than applied as a filter.
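
The article does not say how the optional text prompt enters the model. A widely used technique for prompt conditioning in diffusion models is classifier-free guidance, sketched below purely as an assumption: toy_denoiser, prompt_embedding and guidance_weight are hypothetical names, and when no prompt is supplied the sketch falls back to the unconditional prediction.

```python
def toy_denoiser(latent, t, prompt_embedding=None):
    """Hypothetical noise predictor; t is unused in this toy. A prompt
    embedding, if given, shifts the output."""
    base = [0.5 * v for v in latent]
    if prompt_embedding is None:
        return base
    return [b + p for b, p in zip(base, prompt_embedding)]


def predict_noise(latent, t, prompt_embedding=None, guidance_weight=3.0):
    """Classifier-free guidance: start from the unconditional prediction and
    push it toward the prompt-conditioned one. No prompt means no guidance."""
    uncond = toy_denoiser(latent, t)
    if prompt_embedding is None:
        return uncond
    cond = toy_denoiser(latent, t, prompt_embedding)
    return [u + guidance_weight * (c - u) for u, c in zip(uncond, cond)]
```

In a full pipeline, this guided noise estimate and the spatial correction would both be applied within the same reverse diffusion step, which is the sense in which the guidance is part of generation rather than a filter run on the finished image.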

What this means for AI image generation tools

Most AI image generators are good at creating images within one style but struggle when you ask them to faithfully translate an image across styles. The usual workarounds — fine-tuning models on paired examples, or using text prompts to try to describe the original — either require a lot of data or produce unreliable results. Google's approach builds the structural fidelity check directly into the generation loop, which could make cross-domain translation significantly more reliable.

For Google, this fits into a broader push to make its image AI tools more useful for creative and professional workflows, such as converting product photos into illustrated assets or adapting real-world images for use in games or training datasets. If this approach makes it into a consumer product like Google Photos, or into tools built on Google's Imagen models, it could give everyday users a much more controllable way to transform their images.

Editorial take

This is a genuinely interesting technical contribution — the idea of running a spatial consistency check at every diffusion step, rather than hoping the model figures it out from a prompt, is a sensible fix to a real problem. It's not flashy on the surface, but it addresses the gap between 'AI can generate cool images' and 'AI can reliably transform a specific image I care about.' Worth keeping an eye on as Google builds out its image generation stack.

Source: full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
