Qualcomm · Filed Dec 17, 2024 · Published Jun 18, 2026 · verified — real USPTO data

Qualcomm Patents a Way to Fuse Camera and Depth Sensor Data for AI Scene Reading

By Patentlyze Team · Updated Jun 19, 2026

Your phone's camera sees color and light, but it doesn't naturally know how far away things are. Qualcomm's newly published patent application describes a way to combine that camera image with depth sensor data so an AI can perceive a scene with real spatial understanding, much as your two eyes do.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number: US 2026/0170655 A1
Applicant: QUALCOMM Incorporated
Filing date: Dec 17, 2024
Publication date: Jun 18, 2026
Inventors: Diptiben Navinchandra Patel, Madhumitha Sakthi, Amin Ansari, Sai Madhuraj Jadhav
CPC classification: 382/173
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Jan 31, 2025)
Document: 20 claims

AI/ML

How Qualcomm's camera-plus-depth AI actually sees a scene

Imagine you're trying to spot a pedestrian on a busy street. A regular camera gives you a flat picture — you can see colors and shapes, but the camera alone doesn't know whether that person is two feet away or twenty. A depth sensor, like the kind found in some phones or AR headsets, fills in that gap by measuring distance to everything in view.

Qualcomm's filing describes a chip-level system that takes both inputs, the color image and the depth data, and runs each through its own analysis pipeline before combining the results. The twist is that the depth information also tells the system where in the camera image to look more carefully. Those focused regions get a third round of analysis of their own before everything is merged.

The payoff is a richer, more accurate understanding of a scene than either sensor could produce on its own. That kind of perception is useful anywhere a device needs to recognize objects, track movement, or avoid obstacles, from a robot arm to an autonomous vehicle to a next-generation phone camera.

How the three feature streams get merged into one perception signal

The patent describes a processing pipeline that runs on a single chip and handles inputs from two different sensors simultaneously: a standard camera and a depth sensor (a device — like a LiDAR unit or a structured-light sensor — that measures how far away each point in a scene is, rather than just its color).

Here's how the pipeline works:

Camera features: The color image goes through a neural network called a feature extractor, which finds edges, textures, shapes, and objects.
Depth features: The depth data runs through its own separate feature extractor tuned to 3D distance information.
ROI (region-of-interest) features: The depth map is also used to divide the camera image into a grid of subregions, pointing the system toward the most spatially meaningful patches of the image. Those patches are then re-analyzed by the same camera feature extractor for a closer look.
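
As a rough illustration of the depth-guided region step, the sketch below splits a depth map into a fixed grid, scores each cell, and crops the matching patch from the camera image. The scoring rule (depth variance), the grid size, and the helper names `grid_cells`, `select_rois`, and `crop` are assumptions made for illustration; the article does not say how the application chooses or ranks subregions.

```python
from statistics import pvariance


def grid_cells(height, width, rows, cols):
    """Yield (top, left, bottom, right) boxes that tile a height x width image."""
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            yield (top, left, bottom, right)


def select_rois(depth, rows=2, cols=2, top_k=1):
    """Return the top_k grid cells ranked by depth variance.

    Assumption: variance stands in for "spatially meaningful"; a real
    system could use a learned or very different criterion.
    """
    height, width = len(depth), len(depth[0])
    scored = []
    for box in grid_cells(height, width, rows, cols):
        top, left, bottom, right = box
        values = [depth[y][x] for y in range(top, bottom) for x in range(left, right)]
        scored.append((pvariance(values), box))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in scored[:top_k]]


def crop(image, box):
    """Cut the region described by box out of a row-major image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x4 depth map in metres: flat background, near object at bottom-right.
depth = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 2.0, 9.0],
    [9.0, 9.0, 2.0, 2.0],
]
# Stand-in camera image: each "pixel" just records its own (row, column).
image = [[(y, x) for x in range(4)] for y in range(4)]

rois = select_rois(depth)
patches = [crop(image, box) for box in rois]
```

On this toy input the only cell whose depth varies is the bottom-right one, so that is the patch that would be handed back to the camera feature extractor for a second look.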

All three outputs are then merged into combined features that a downstream AI model uses to complete a perception task — which could be detecting objects, classifying them, tracking their movement, or estimating their 3D position.
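
The merge step can be pictured with a deliberately small sketch. It assumes fusion by simple concatenation and uses hand-written statistics in place of neural feature extractors; the function names `camera_features`, `depth_features`, and `fuse`, and the hard-coded ROI location, are hypothetical and not taken from the application, which may combine the streams in a different way.

```python
def camera_features(pixels):
    """Stand-in for the camera feature extractor: mean and max brightness."""
    flat = [value for row in pixels for value in row]
    return [sum(flat) / len(flat), max(flat)]


def depth_features(depth):
    """Stand-in for the depth feature extractor: nearest and mean distance."""
    flat = [value for row in depth for value in row]
    return [min(flat), sum(flat) / len(flat)]


def fuse(camera_feats, depth_feats, roi_feats):
    """Merge the three streams by concatenation (an assumed fusion operator)."""
    return camera_feats + depth_feats + roi_feats


# Toy 4x4 grayscale image with a bright object at the bottom-right.
gray = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 200, 200],
]
# Matching depth map in metres: the bright object is also the nearest one.
depth = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 2.0, 9.0],
    [9.0, 9.0, 2.0, 2.0],
]
# Assume the depth map flagged the bottom-right quadrant as the region of interest.
roi = [row[2:4] for row in gray[2:4]]

# The ROI patch reuses the camera extractor, as the article describes.
combined = fuse(camera_features(gray), depth_features(depth), camera_features(roi))
```

Here `combined` is a six-value vector: two values per stream. A downstream model for detection, classification, or tracking would consume a vector like this, in practice with far richer features from each stream.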

The depth data plays a dual role: it's both an independent data source and a guide that directs the camera analysis toward the right parts of the image. That's the core design choice the application seeks to protect.

What this means for phones, robots, and self-driving systems

Most consumer devices today either rely on a camera alone or use depth sensors in limited, single-purpose ways (Face ID is a famous example). Qualcomm's approach is aimed at edge devices, meaning hardware that processes data locally without sending it to the cloud, which is exactly the kind of hardware Qualcomm's chips power in phones, cars, and robotics platforms. Getting two sensors to work together this tightly, at low power, on a single chip, is a meaningful engineering target.

For you as a user, this could translate into phones that understand depth when scanning a room, AR glasses that place virtual objects more accurately, or driver-assistance systems that more reliably identify what's close versus far. The patent is written broadly enough to apply to any device that pairs a camera with a depth sensor — which is an increasingly common hardware combination across consumer and industrial products.

Editorial take

This is solid, focused engineering work rather than a headline-grabbing idea. Sensor fusion, the job of making multiple data streams work together, is one of the genuinely hard problems in on-device AI, and Qualcomm has a real business reason to patent this: its Snapdragon chips power much of the hardware where this would run. Whether or not this specific pipeline ships in a product, it signals that Qualcomm is investing seriously in the perception stack for its edge AI platforms.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
