Qualcomm · Filed Dec 13, 2024 · Published Jun 18, 2026 · verified — real USPTO data

Qualcomm Patents an Object Detector That Tracks How Scenes Change Over Time

By Patentlyze Team · Updated Jun 19, 2026

A camera that looks at only one frame at a time can easily miss things that are just starting to move. Qualcomm's new patent application describes a system that compares two moments in time and uses the difference between them (not just what's there, but what has changed) to find objects that a single snapshot could overlook.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0170845 A1

Applicant QUALCOMM Incorporated

Filing date Dec 13, 2024

Publication date Jun 18, 2026

Inventors Rahul AHUJA, Venkatraman NARAYANAN, Varun RAVI KUMAR, Senthil Kumar YOGAMANI

Classification 382/156

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Jan 23, 2025)

Document 20 claims

AI/ML

How Qualcomm's motion-aware object detection actually works

Imagine you're watching a parking lot through a security camera. A person standing perfectly still in a dark corner might not trigger an ordinary detector, because nothing looks unusual in any single frame. But the moment they take a step, something has changed, and a system comparing two frames can pick up that change even when neither frame looks unusual on its own.

That's the core idea behind this Qualcomm patent. The system takes two snapshots of a scene taken at slightly different moments and computes what's called a scene flow — essentially a map of what moved, where, and by how much. It then combines that motion map with its own guesses about where objects might be hiding, feeding everything into an AI model that produces a final list of detected objects.

The output is a bird's-eye view — a top-down map of the scene with a labeled box drawn around each object. That kind of overhead representation is especially useful in cars, drones, and robots, where knowing exactly where something is in 3D space matters as much as knowing it exists.
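To make the bird's-eye output concrete, here is a minimal Python sketch of one common way such a map is built: drop each 3D point's height, bin its ground-plane position into a grid cell, and describe each object as a rotated box by its center, length, width, and heading. The grid extents, the 0.5 m cell size, and the names `BEVBox`, `point_to_bev_cell`, and `box_corners` are all hypothetical; the patent's actual grid and box parameterization may differ.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class BEVBox:
    """Hypothetical top-down box: center (x, y) in metres, footprint size, heading, label."""
    x: float
    y: float
    length: float
    width: float
    yaw: float  # heading in radians, measured from the +x axis
    label: str

def point_to_bev_cell(x, y, z, x_range=(0.0, 50.0), y_range=(-25.0, 25.0), cell=0.5):
    """Map a 3D point to a (row, col) cell of a top-down grid.

    The height z is deliberately ignored: a bird's-eye view keeps only the
    ground-plane position. Returns None for points outside the grid.
    """
    if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
        return None
    row = int((x - x_range[0]) / cell)
    col = int((y - y_range[0]) / cell)
    return row, col

def box_corners(box):
    """Return the four ground-plane corners of a rotated BEV box (for drawing it)."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    half_l, half_w = box.length / 2, box.width / 2
    corners = []
    for dx, dy in ((half_l, half_w), (half_l, -half_w), (-half_l, -half_w), (-half_l, half_w)):
        # rotate the local offset by yaw, then translate to the box center
        corners.append((box.x + dx * c - dy * s, box.y + dx * s + dy * c))
    return corners

print(point_to_bev_cell(10.2, 0.3, 1.5))  # (20, 50)
print(point_to_bev_cell(60.0, 0.0, 0.0))  # None (beyond the 50 m range)
```

For example, an unrotated 4 m by 2 m box centered at the origin has corners at (±2, ±1); a nonzero `yaw` simply rotates those corners around the center.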

Inside the scene-flow encoder and transformer pipeline

The system ingests two sets of sensor data, one from an earlier moment and one from now, and processes them in three stages: the first two run side by side, and the third combines their outputs.

Stage 1 — Scene flow generation: A transformer-based feature flow encoder (a type of neural network that uses attention to weigh how every part of its input relates to every other part) compares the two snapshots and calculates a scene flow: a dense description of how features in the scene shifted between the two timeframes. Think of it like a wind map, but for objects.
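A rough sense of how attention can turn two snapshots into a flow field: each feature from the current snapshot attends over all features from the earlier one, weights them by similarity, and treats the weighted average of their positions as the place it most likely came from. The difference between where it is now and where it came from is its flow vector. This toy Python version uses raw dot-product similarity with a hypothetical `temperature` parameter in place of the learned query/key projections a real encoder would use, and `soft_flow` is an illustrative name, not something taken from the patent.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def soft_flow(prev_feats, prev_pos, curr_feats, curr_pos, temperature=0.1):
    """Estimate a flow vector for every current feature.

    Each current feature attends over all previous features (similarity -> softmax),
    the attention-weighted average of previous positions is taken as where it came
    from, and flow = current position - that source position. Works for 2D or 3D positions.
    """
    flows = []
    for feat, pos in zip(curr_feats, curr_pos):
        weights = softmax([dot(feat, g) / temperature for g in prev_feats])
        source = [sum(w * p[d] for w, p in zip(weights, prev_pos)) for d in range(len(pos))]
        flows.append(tuple(now - then for now, then in zip(pos, source)))
    return flows

# Two distinguishable features: the first moves +1 along x, the second stays put.
prev_feats, prev_pos = [(1.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (5.0, 5.0)]
curr_feats, curr_pos = [(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (5.0, 5.0)]
print(soft_flow(prev_feats, prev_pos, curr_feats, curr_pos))
```

Here the sketch returns flows very close to (1, 0) and (0, 0). Lowering `temperature` makes the matching sharper; a learned encoder instead tunes its projections so that the same physical point looks similar across the two timeframes.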

Stage 2 — Object proposals: A separate object encoder looks at both snapshots and generates candidate guesses — object proposals — about where things might be. These are rough bounding boxes over groups of features that look like they could be objects.
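As a stand-in for the learned object encoder, the sketch below groups neighbouring occupied cells of a top-down grid into clusters and wraps each cluster in an axis-aligned box, which is roughly what "bounding boxes over groups of features" means. This rule-based clustering is an assumption for illustration only; the patent describes a neural encoder, and `propose_boxes` and its `min_cells` cutoff are hypothetical.

```python
from collections import deque

def propose_boxes(occupancy, min_cells=2):
    """Group neighbouring occupied cells (4-connectivity) with a breadth-first search
    and return one axis-aligned box (row_min, col_min, row_max, col_max) per group.
    Groups smaller than min_cells are discarded as likely noise."""
    rows, cols = len(occupancy), len(occupancy[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not occupancy[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and occupancy[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(cells) >= min_cells:
                rs = [p[0] for p in cells]
                cs = [p[1] for p in cells]
                boxes.append((min(rs), min(cs), max(rs), max(cs)))
    return boxes

grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
print(propose_boxes(grid))  # [(0, 0, 1, 1), (3, 2, 3, 4)]
```

Note that the `min_cells` cutoff drops the isolated cell at (1, 4), which is exactly the kind of weak, single-frame evidence a motion signal can help recover.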

Stage 3 — Fusion and decoding: An encoder-decoder transformer (an architecture in which an encoder builds a shared representation of its inputs and a decoder reads from that representation to produce the answer) fuses the motion map and the object proposals into a single unified scene representation. Its decoder then identifies the final list of objects, including ones the proposal stage missed, and draws a precise bounding box around each one in a bird's-eye (top-down) view of the environment.
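The fusion step can be pictured with the core operation of a transformer decoder, cross-attention: each query vector takes a similarity-weighted average over a shared memory, using the standard softmax(QKᵀ/√d)V formula. In the sketch below the memory is simply the motion tokens and proposal tokens concatenated, and the queries stand in for DETR-style learned object queries. The query setup, the name `fuse_and_decode`, and the absence of learned projections and prediction heads are simplifying assumptions; a real decoder stacks several such layers and maps each output to a class label and a BEV box.

```python
import math

def scaled_dot_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, written out with plain Python lists."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        w = [x / total for x in w]
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out

def fuse_and_decode(flow_tokens, proposal_tokens, object_queries):
    """Concatenate the motion stream and the proposal stream into one memory
    (the 'unified scene representation'), then let each object query read from it."""
    memory = flow_tokens + proposal_tokens
    return scaled_dot_attention(object_queries, memory, memory)

flow_tokens = [[0.9, 0.1], [0.2, 0.8]]
proposal_tokens = [[1.0, 0.0]]
object_queries = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse_and_decode(flow_tokens, proposal_tokens, object_queries)
print(len(fused), len(fused[0]))  # 2 2
```

Because the queries are not tied one-to-one to proposals, a query can attend mostly to motion tokens, which is one plausible way a decoder could surface an object the proposal stage never boxed.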

The key architectural choice is that motion data and appearance data are kept as separate streams until the fusion step, which lets each signal do what it's best at before being combined.
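The benefit of keeping the streams separate can be shown with a deliberately simple late-fusion rule: compute a motion score and an appearance score for each grid cell independently, and combine them only at the end. In this Python sketch a cell survives if either signal is confident on its own, so a faint object that suddenly moves is kept even though its appearance score alone would have been discarded. The thresholds and function names are hypothetical, and the patent fuses the streams with a transformer rather than a fixed rule.

```python
import math

def motion_stream(flow):
    """Motion-only stream: per-cell motion strength as the length of its 2D flow vector."""
    return {cell: math.hypot(dx, dy) for cell, (dx, dy) in flow.items()}

def late_fuse(appearance, motion, app_thresh=0.5, motion_thresh=0.3):
    """Fuse the two independently computed streams at the very end.

    appearance: {cell: objectness score from a single frame}
    motion:     {cell: motion strength from the flow stream}
    A cell is kept if either stream is confident about it on its own.
    """
    cells = set(appearance) | set(motion)
    return sorted(
        c for c in cells
        if appearance.get(c, 0.0) >= app_thresh or motion.get(c, 0.0) >= motion_thresh
    )

appearance = {(0, 0): 0.9, (2, 3): 0.1}          # (2, 3) looks faint in a single frame
flow = {(0, 0): (0.0, 0.0), (2, 3): (0.3, 0.4)}  # ...but it moved between the snapshots
print(late_fuse(appearance, motion_stream(flow)))  # [(0, 0), (2, 3)]
print(late_fuse(appearance, {}))                   # [(0, 0)]
```

Cell (2, 3) has a weak appearance score of 0.1 but moves 0.5 units, so it is kept; with an empty motion map it is dropped.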

What this means for self-driving cars and on-device AI cameras

Detecting objects consistently as a scene changes over time is a known hard problem in autonomous driving and robotics. Cars using lidar and cameras need to detect pedestrians, cyclists, and other vehicles reliably, including ones that are partially occluded or just beginning to move. A system that explicitly models how the scene changed between two sensor readings is better positioned to catch these edge cases than one working from a single frame.

Qualcomm already supplies the chips that power many in-car AI systems through its Snapdragon Ride platform. A patent like this suggests the company is pushing its detection stack further up the perception pipeline — from raw sensor processing toward the kind of scene understanding that self-driving and driver-assistance systems actually need to make decisions.

Editorial take

This is a real engineering contribution to a problem that genuinely limits today's perception systems: single-frame detectors miss things, and that matters when the thing being missed is a child stepping off a curb. The transformer-fusion approach here is methodologically sound and fits squarely into Qualcomm's automotive AI strategy. It is not flashy patent filing for its own sake.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
