Apple · Filed Jan 7, 2026 · Published May 14, 2026 · verified — real USPTO data

Apple Patents a Smart Video Call System That Adds 3D Depth on Vision Pro

When you're on a video call, your iPhone and your Vision Pro are very different screens — so why should they show the exact same thing? Apple's latest patent tackles exactly that mismatch.

Apple Patent: 3D Depth in Video Calls for Vision Pro — figure from US 2026/0134620 A1
FIG. 1A — rendered from the official USPTO publication PDF.
Publication number US 2026/0134620 A1
Applicant Apple Inc.
Filing date Jan 7, 2026
Publication date May 14, 2026
Inventors Leanid Vouk
CPC classification 345/419
Grant likelihood Medium
Examiner CENTRAL, DOCKET (Art Unit OPAP)
Status Docketed New Case - Ready for Examination (Feb 4, 2026)
Parent application is a Continuation of 18369517 (filed 2023-09-18)
Document 24 claims

What Apple's 2D/3D video call switching actually does

Imagine you're on a group FaceTime call. Your friend is watching on their laptop, but you're watching on a spatial computing headset. Today, you both see the same flat video. This patent describes a system where the sender's device streams both regular 2D video and extra 3D depth data at the same time — and the receiving device decides what to do with it.

If you're on a plain phone or laptop, your device just plays the normal flat video and ignores the 3D data entirely. But if you're wearing a headset like Vision Pro, your device uses that depth data to make the other person look like they have real spatial presence — not just a flat rectangle floating in your room.

There's even a third mode where the person is visually separated from their background, like a live cutout — think of it as background blur taken to its logical extreme. One stream, three possible experiences depending on what screen you're using.

How the receiving device picks its presentation mode

The patent describes a communication architecture where the sending device captures a standard 2D video stream via its camera and simultaneously captures 3D sensor data — likely depth map information from a LiDAR scanner or structured light system — and transmits both during a multi-user session.

On the receiving side, the device evaluates a criterion (most likely whether it has stereo-display hardware) and picks one of three presentation modes:

  • First mode (flat 2D): The device just plays the video stream normally, ignoring the 3D data entirely. This is what a laptop or smartphone would do.
  • Second mode (depth-enhanced): The device uses the 3D data to reconstruct a sense of depth around the person or object in the video, making them appear more volumetric. This targets stereo-capable headsets.
  • Third mode (object separation): The system isolates the subject from the rest of the video frame — essentially real-time segmentation — and presents them separately from the background.

The key insight is that the 3D data travels alongside the 2D stream unconditionally; the intelligence about what to render lives entirely on the receiving device. This keeps the system backward-compatible — older devices just never touch the extra data.

What this means for FaceTime on Vision Pro and beyond

For Apple, this is infrastructure work for a future where Vision Pro and iPhones share the same FaceTime calls but deliver different experiences. Right now, spatial personas in FaceTime require the other person to also be on a spatial device. A system like this could let a Vision Pro user see a 3D-depth version of someone calling from a regular iPhone — no special hardware on the sender's end required.

The object separation mode is also worth watching. Isolating a person from their background in real time, as a first-class presentation mode rather than a software filter, hints at deeper integration with AR overlays and virtual environments — places where you'd want a person to feel like they're inside your space, not pasted onto a flat screen.

Editorial take

This is quiet but meaningful plumbing for Apple's spatial computing roadmap. It solves a real interoperability headache — how do you make video calls feel spatial on a headset when most callers are still on flat screens? — without requiring everyone to upgrade. The three-mode design is elegant: send everything, render what makes sense. Worth watching as Vision Pro software matures.

Get one Big Tech patent every Sunday

Plain English, intelligent commentary, no hype. Free.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.