Filed Jun 2, 2025 · Published Jun 11, 2026

Google Patents a System That Generates Images Live During Your Video Calls

Imagine someone on your video call mentions the Eiffel Tower, and an AI-generated image of it appears on screen automatically: no screen-sharing, no copy-pasting a link. That's the core idea in Google's latest patent application.

FIG. 1A from US 2026/0162327 A1, rendered from the official USPTO publication PDF.
Publication number: US 2026/0162327 A1
Applicant: Google LLC
Filing date: Jun 2, 2025
Publication date: Jun 11, 2026
Inventors: Ryan FEDYK, Anton VOLKOV
US class/subclass: 715/716
Grant likelihood: Medium
Examiner: not yet assigned (listed as CENTRAL, DOCKET; Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Mar 20, 2026)
Parent application: PCT/US2023/030730 (filed Aug 21, 2023); this application is its US national stage entry
Claims: 20

What Google's live video-call image generator actually does

Picture this: you're on a Google Meet call and a colleague brings up a new product concept, a city, or a scientific term you've never heard of. Normally you'd tab out, Google it, and lose the thread of the conversation. This patent application describes a system that puts a relevant picture in front of you instead, right inside the call.

Here's how it works from your perspective: the app listens to what's being said, picks out the important people, places, or things being discussed, and uses AI to generate a relevant image. That image then pops up automatically inside the video call window — no one has to lift a finger.

Think of it as visual autocomplete for conversation. Rather than searching for an existing photo, the system generates a new image from scratch, using the same kind of AI that powers tools like OpenAI's DALL·E or Google's own image-generation technology, Imagen.

How speech becomes a visual in real time

The patent describes a two-stage AI pipeline triggered by live audio in a video call.

Stage 1 — Speech to text prompt: The app first transcribes the spoken audio into text (standard speech-to-text). That transcript is then fed into a text-generation model (think a large language model, or LLM) whose job is to identify a key entity — a specific person, place, object, or concept mentioned in the conversation — and craft a short image-generation prompt around it.
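Here is a minimal Python sketch of Stage 1. It assumes speech-to-text has already produced a transcript string. The instruction text, `transcript_to_prompt`, and `toy_text_model` are hypothetical names that do not come from the patent. The toy model is a crude capitalized-word heuristic standing in for a real LLM, so only the shape of the step (transcript in, image prompt out) reflects the description above.

```python
import re
from typing import Callable, Optional

# Hypothetical instruction for the text-generation model; the patent's actual
# prompt wording is not described here.
INSTRUCTION = (
    "Find the single most important person, place, object, or concept in "
    "this conversation excerpt and write a one-line image prompt for it.\n"
    "Transcript: {transcript}"
)


def toy_text_model(instruction: str) -> str:
    """Stand-in for an LLM: picks the longest run of capitalized words.

    A real system would send `instruction` to a large language model; this
    heuristic exists only so the sketch runs without one.
    """
    transcript = instruction.split("Transcript: ", 1)[1]
    candidates = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", transcript)
    if not candidates:
        return ""  # nothing entity-like found
    entity = max(candidates, key=len)
    return f"A clear, realistic image of {entity}"


def transcript_to_prompt(
    transcript: str, text_model: Callable[[str], str]
) -> Optional[str]:
    """Stage 1: turn an already-transcribed speech segment into an image prompt.

    Speech-to-text is assumed to have run upstream; `transcript` is its output.
    Returns None when the model finds nothing worth visualizing.
    """
    prompt = text_model(INSTRUCTION.format(transcript=transcript)).strip()
    return prompt or None


print(transcript_to_prompt("We should visit the Eiffel Tower next spring.", toy_text_model))
# prints: A clear, realistic image of Eiffel Tower
```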

Stage 2 — Text prompt to image: That prompt is handed off to a separate image-generation model, which produces a brand-new visual based on what was described. The resulting image is then surfaced directly inside the video communication session for participants to see.
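Stage 2 can be sketched the same way. In this hypothetical Python version, `toy_image_model` is a placeholder for a real image-generation model. `CallSession`, `Overlay`, and `prompt_to_overlay` are invented names. "Showing" an image just records it, because no rendering API is described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Overlay:
    """One generated image as it would appear in the call UI."""
    prompt: str
    image: bytes


@dataclass
class CallSession:
    """Hypothetical stand-in for a live video session; it only records overlays."""
    participants: List[str]
    overlays: List[Overlay] = field(default_factory=list)

    def show(self, overlay: Overlay) -> None:
        # A real client would render this in every participant's call window.
        self.overlays.append(overlay)


def toy_image_model(prompt: str) -> bytes:
    # Stand-in for a real image-generation model; returns placeholder bytes.
    return f"<image: {prompt}>".encode("utf-8")


def prompt_to_overlay(
    prompt: str, image_model: Callable[[str], bytes], session: CallSession
) -> Overlay:
    """Stage 2: generate an image from the prompt and surface it in the session."""
    overlay = Overlay(prompt=prompt, image=image_model(prompt))
    session.show(overlay)
    return overlay


session = CallSession(participants=["alice", "bob"])
prompt_to_overlay("A clear, realistic image of Eiffel Tower", toy_image_model, session)
print(len(session.overlays))  # prints: 1
```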

The claim is deliberately broad. It covers:

  • Any video communication session (calls, meetings, conferences)
  • Any text-generation model outputting the interim prompt
  • Any image-generation model producing the final visual
  • Automatic display of the result within the session UI
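The claim covers any text-generation model and any image-generation model. One way to picture that breadth is a pipeline that depends only on two interfaces. This hypothetical Python sketch uses `typing.Protocol` to express them. The class and method names are assumptions, and the toy models exist only to show that one implementation can be swapped for another without touching the pipeline.

```python
from typing import Callable, List, Optional, Protocol


class TextGenerationModel(Protocol):
    def generate(self, text: str) -> str: ...


class ImageGenerationModel(Protocol):
    def generate(self, prompt: str) -> bytes: ...


class LiveImagePipeline:
    """Hypothetical wiring that depends only on the two model interfaces."""

    def __init__(
        self,
        text_model: TextGenerationModel,
        image_model: ImageGenerationModel,
        display: Callable[[bytes], None],
    ) -> None:
        self.text_model = text_model
        self.image_model = image_model
        self.display = display

    def on_transcript(self, transcript: str) -> Optional[bytes]:
        prompt = self.text_model.generate(transcript).strip()
        if not prompt:
            return None  # nothing worth visualizing in this segment
        image = self.image_model.generate(prompt)
        self.display(image)  # automatic display, with no user action required
        return image


# Two interchangeable toy text models; any class with a matching `generate` works.
class EchoTextModel:
    def generate(self, text: str) -> str:
        return f"An illustration of {text}"


class UppercaseTextModel:
    def generate(self, text: str) -> str:
        return text.upper()


class BytesImageModel:
    def generate(self, prompt: str) -> bytes:
        return prompt.encode("utf-8")


shown: List[bytes] = []
for text_model in (EchoTextModel(), UppercaseTextModel()):
    LiveImagePipeline(text_model, BytesImageModel(), shown.append).on_transcript("the Eiffel Tower")
print(shown)  # prints: [b'An illustration of the Eiffel Tower', b'THE EIFFEL TOWER']
```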

Notably, the patent doesn't specify when the image appears (e.g., mid-sentence vs. end-of-turn), how participants control it, or whether it's opt-in — those design questions are left open.
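Those open questions map naturally onto a few configuration knobs. The sketch below is purely hypothetical. `GatingPolicy`, `Trigger`, and their fields are invented for illustration and say nothing about how Google would actually ship the feature.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    MID_SENTENCE = "mid_sentence"  # generate as soon as an entity is detected
    END_OF_TURN = "end_of_turn"    # wait until the speaker finishes


@dataclass
class GatingPolicy:
    """Hypothetical knobs for the design questions the patent leaves open."""
    opted_in: bool = False          # off unless a participant turns it on
    trigger: Trigger = Trigger.END_OF_TURN
    require_approval: bool = True   # a participant must confirm before display

    def allow(self, turn_ended: bool, approved: bool) -> bool:
        if not self.opted_in:
            return False
        if self.trigger is Trigger.END_OF_TURN and not turn_ended:
            return False
        if self.require_approval and not approved:
            return False
        return True


print(GatingPolicy().allow(turn_ended=True, approved=True))  # prints: False
print(GatingPolicy(opted_in=True).allow(turn_ended=True, approved=True))  # prints: True
```

In this sketch the defaults (opted out, end-of-turn, approval required) are one conservative choice. Flipping any field changes when images appear without changing the generation pipeline itself.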

What this means for the future of Google Meet

Google Meet is in a crowded market alongside Zoom and Microsoft Teams, and all three are racing to add AI features that feel genuinely useful rather than gimmicky. A system that passively enriches conversations with real-time visuals could make remote meetings feel more like an in-person whiteboard session — especially for education, sales demos, or cross-language calls where a picture really does replace a thousand words.

For you as a user, the upside is obvious: less context-switching, more visual clarity. The downside risk — also obvious — is a meeting UI cluttered with AI-generated images no one asked for. How Google gates this feature (automatic vs. manual, presenter-only vs. all participants) will matter enormously.

Editorial take

This is a genuinely practical idea, not just a tech demo dressed up as a patent. The two-model pipeline — LLM extracts the entity, image model visualizes it — is a clean design that maps onto infrastructure Google already operates. The real question is UX, not capability: auto-popping images in a business meeting could easily become annoying. But as an opt-in feature, it's the kind of thing that would actually get used.

Source: full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.