New Google Patents · Filed Dec 18, 2025 · Published Jun 18, 2026 · verified — real USPTO data

Google's New Patent Turns a Flat Photo Into a 3D Scene You Can Hear

Google is patenting a system that looks at an ordinary photo, rebuilds it as a 3D scene, and then figures out what sounds you'd hear — and from which direction — based on exactly where you're standing inside that scene.

Google Patent: Spatial Audio Generated from 2D Images — figure from US 2026/0169680 A1
FIG. 1A — rendered from the official USPTO publication PDF.
Publication number US 2026/0169680 A1
Applicant GOOGLE LLC
Filing date Dec 18, 2025
Publication date Jun 18, 2026
Inventors Kathleen Alexandra Bryan, Shiblee Hasan
US classification (USPC) 345/419
Grant likelihood Medium
Examiner Not yet assigned (Central Docket, Art Unit OPAP)
Status Docketed New Case - Ready for Examination (Jan 21, 2026)
Parent application Claims priority to provisional application 63/735,744 (filed Dec 18, 2024)
Document 20 claims

What Google's photo-to-3D audio system actually does

Imagine handing someone a postcard of a busy street corner and having them immediately know what it would sound like to stand there: the traffic to their left, a café behind them, birds above. That's roughly the idea behind this Google patent.

The system takes a regular flat photo, reconstructs it as a 3D environment, then works out what audio belongs in that scene and how it should be positioned relative to the viewer's vantage point. If you turn or move around the scene, the sounds shift with you: a passing car moves from your right to your left as you turn.

Spatial audio — sound that feels like it's coming from specific directions — is already common in headphones and VR headsets. What's new here is automatically generating that directional audio from a still image, rather than requiring a dedicated audio recording or a fully produced virtual reality scene.

How Google maps sound to a reconstructed 3D scene

The patent describes a four-step pipeline:

  • Scene reconstruction: A flat 2D image (a photo or frame) is converted into a 3D scene — a depth map or similar spatial model that gives objects a sense of distance and position.
  • Image generation: The system generates a view of that 3D scene from a chosen camera angle or perspective, similar to how photo apps already offer "cinematic" or parallax effects.
  • Audio identification: Using visual analysis of the original image, the system identifies what audio content belongs in the scene. If the photo shows a waterfall, street traffic, or a concert crowd, it infers appropriate sounds.
  • Spatialization: The identified audio is then positioned in 3D space (a process called spatialization — placing sounds at specific angles and distances relative to the listener) based on where objects appear in the reconstructed scene and the viewer's current vantage point.
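To make the spatialization step concrete, here is a minimal sketch of how one sound source could be turned into left and right channel gains. It is not taken from the patent: the function name `stereo_gains`, the constant-power panning law, and the inverse-distance attenuation are all assumptions chosen for illustration, and a production renderer would more likely use head-related transfer functions.

```python
import math

def stereo_gains(azimuth_deg, distance_m, ref_distance_m=1.0):
    """Return (left, right) gains for one sound source (illustrative only).

    azimuth_deg: 0 is straight ahead, +90 is hard right, -90 is hard left.
    distance_m: distance from the listener to the source, in metres.
    """
    # Clamp to the frontal arc; sources behind the listener are not modelled here.
    az = max(-90.0, min(90.0, azimuth_deg))
    # Map [-90, 90] degrees onto a pan angle in [0, pi/2].
    pan = (az + 90.0) / 180.0 * (math.pi / 2)
    # Inverse-distance attenuation, capped so very close sources do not exceed gain 1.
    attenuation = ref_distance_m / max(distance_m, ref_distance_m)
    # Constant-power panning: left^2 + right^2 stays equal to attenuation^2.
    return attenuation * math.cos(pan), attenuation * math.sin(pan)
```

Under these assumptions, a source 45 degrees to the right and 2 metres away, `stereo_gains(45, 2.0)`, comes out at roughly (0.19, 0.46): quieter overall because of the distance, and noticeably louder in the right channel.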

Critically, the audio perspective updates with the viewer's position. Viewed from the left side of the scene rather than straight on, the same sound sources arrive from different directions and distances. The patent doesn't specify whether audio identification uses a trained AI model or a content database, though AI-driven audio inference is the likely candidate given Google's broader toolset.
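The geometry behind that update is simple enough to sketch. The snippet below is an assumption-laden illustration, not the patent's method: it works on a flat ground plane, and the function name `relative_azimuth_deg` and the coordinate convention (+z forward, +x right, positive yaw meaning a turn to the right) are hypothetical choices.

```python
import math

def relative_azimuth_deg(listener_xz, listener_yaw_deg, source_xz):
    """Direction of a source relative to where the listener is facing.

    Coordinates are (x, z) on the ground plane: +z is forward at yaw 0, +x is right.
    Returns degrees in [-180, 180); positive means the source is to the right.
    """
    dx = source_xz[0] - listener_xz[0]
    dz = source_xz[1] - listener_xz[1]
    # Bearing of the source in scene coordinates: 0 along +z, 90 along +x.
    scene_bearing = math.degrees(math.atan2(dx, dz))
    # Subtract the listener's heading, then wrap into [-180, 180).
    return (scene_bearing - listener_yaw_deg + 180.0) % 360.0 - 180.0

car = (3.0, 3.0)  # hypothetical source position in the reconstructed scene
facing_forward = relative_azimuth_deg((0.0, 0.0), 0.0, car)   # about +45: on the right
turned_right = relative_azimuth_deg((0.0, 0.0), 90.0, car)    # about -45: now on the left
moved_across = relative_azimuth_deg((6.0, 0.0), 0.0, car)     # about -45: on the left
```

The same car ends up on the listener's left either by turning 90 degrees to the right or by walking to the other side of it, which is the behavior the patent describes when the vantage point changes.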

What this means for Google Photos and immersive media

This kind of technology has obvious applications for Google Photos, Google Street View, or any platform where static images are the primary content format. Rather than needing a film crew to capture spatial audio alongside video, the system could generate plausible directional sound automatically — a meaningful step toward making ordinary photos feel genuinely immersive without requiring specialized hardware to capture them.

For AR and VR specifically, automatically generating audio from images could lower the production barrier dramatically. Right now, building an immersive scene typically means recording audio separately and mixing it by hand. If Google can reliably infer and spatialize sound from a photo alone, that changes what a solo creator — or a consumer app — can produce. Whether the output audio is convincing enough to matter in practice is the real open question.

Editorial take

This is a genuinely interesting patent because it closes a gap that immersive media has always had: photos are everywhere, but good spatial audio is rare and expensive to produce. Automatically inferring directional sound from a still image is a problem worth solving, and Google is in a strong position to do it given its investments in both computational photography and audio AI. The idea is clear; the hard part is whether the inferred audio is good enough to hold up under scrutiny.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.