Amazon · Filed Jan 16, 2026 · Published May 28, 2026

Zoox Seeks a Patent on a Two-Stage Object Path Predictor for Autonomous Vehicles

Predicting where a pedestrian will walk is hard enough; predicting exactly when they'll reach each point is even harder. Zoox's new patent application splits those two problems apart and hands each one to its own specialized machine-learned model.

FIG. 1A of US 2026/0145710 A1, rendered from the official USPTO publication PDF.
Publication number: US 2026/0145710 A1
Applicant: Zoox, Inc.
Filing date: Jan 16, 2026
Publication date: May 28, 2026
Inventors: Gregory Michael Woelki, Xiaosi Zeng, Gowtham Garimella, Samir Parikh, Ethan Miller Pronovost
US classification (USPC): 701/26
Grant likelihood: Medium
Examiner: not yet assigned (central docket, Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Feb 19, 2026)
Parent application: continuation of US 18/516,618 (filed Nov 21, 2023)
Claims: 20

How Zoox's AV separates 'where' from 'when' in predictions

Imagine a self-driving car watching a cyclist approach an intersection. The car needs to know: will the cyclist turn left, go straight, or stop? And, crucially, how fast will they do it? Many prediction systems answer both questions at once with a single model, and that coupling is where rare behaviors tend to get lost.

Zoox's patent application describes a system that deliberately separates those two questions. A first machine-learned model figures out the possible spatial paths an object might take, essentially sketching a menu of routes without worrying about timing. A second model then estimates how quickly the object will travel along each of those paths.

By splitting the job this way, each model can specialize. The stated aim is better coverage of unusual behavior, such as the cyclist who suddenly veers off or stops mid-block, because rare movement patterns no longer have to compete with timing noise inside a single combined model.

How the two-model pipeline decouples paths from timing

The architecture has two main stages working in sequence.

The first machine-learned model takes in a top-down representation of the environment (think: a bird's-eye map built from sensor data) along with object detection data — position, velocity, heading, and so on. From this, it generates a set of time-invariant predicted paths. "Time-invariant" means each path is just a series of spatial positions the object might occupy, with no clock attached. The model produces multiple diverse paths simultaneously — not just the most likely route, but a spread of plausible alternatives.
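To make "time-invariant" concrete, here is a minimal Python sketch of what the first stage's output could look like. It is an illustration under stated assumptions, not Zoox's implementation: the names ObjectState and predict_paths are hypothetical, and sweeping a fixed set of curvatures stands in for the learned model, which per the patent consumes a top-down scene representation that this toy ignores. Each path is just a list of (x, y) waypoints, and the object's speed is never used.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Hypothetical detection summary: position (m), heading (rad), speed (m/s)."""
    x: float
    y: float
    heading: float
    speed: float  # carried along but deliberately unused: timing is the second stage's job


# A time-invariant path: ordered spatial waypoints with no timestamps attached.
Path = list[tuple[float, float]]


def predict_paths(obj: ObjectState,
                  curvatures: tuple[float, ...] = (-0.1, -0.03, 0.0, 0.03, 0.1),
                  length_m: float = 20.0,
                  step_m: float = 1.0) -> list[Path]:
    """Toy stand-in for the first model: one constant-curvature path per curvature
    value (1/m), giving a spread of right-turning, straight and left-turning options."""
    paths: list[Path] = []
    n_steps = int(round(length_m / step_m))
    for k in curvatures:
        x, y, h = obj.x, obj.y, obj.heading
        pts = [(x, y)]
        for _ in range(n_steps):
            # Advance by distance, not by time: no speed or clock is involved.
            x += step_m * math.cos(h)
            y += step_m * math.sin(h)
            h += k * step_m
            pts.append((x, y))
        paths.append(pts)
    return paths
```

For example, predict_paths(ObjectState(0.0, 0.0, 0.0, 1.5)) returns five 21-waypoint paths, and the zero-curvature one ends at (20.0, 0.0) regardless of the object's speed.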

The second model then takes those spatial paths and estimates progress in time along each one, essentially answering: given that the object follows path A, how far along will it be at each future moment? This is where speed profiles and timing uncertainty are modeled.
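A minimal sketch of how a progress estimate can be attached to a time-invariant path, assuming the second stage outputs distance travelled along the path at fixed timesteps (an assumption; the patent's exact output format isn't described here). All function names are hypothetical, and a constant-acceleration profile stands in for the learned timing model. The same path can be paired with several speed hypotheses, which is the point of the decoupling.

```python
import bisect
import math

Path = list[tuple[float, float]]


def cumulative_lengths(path: Path) -> list[float]:
    """Arc length (m) from the start of the path to each waypoint."""
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    return cum


def position_at_distance(path: Path, cum: list[float], d: float) -> tuple[float, float]:
    """Point lying d metres along the path, linearly interpolated and clamped to its ends."""
    d = min(max(d, 0.0), cum[-1])
    i = bisect.bisect_right(cum, d) - 1
    if i >= len(path) - 1:
        return path[-1]
    seg = cum[i + 1] - cum[i]
    frac = 0.0 if seg == 0.0 else (d - cum[i]) / seg
    (x0, y0), (x1, y1) = path[i], path[i + 1]
    return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))


def progress_profile(v0: float, accel: float, horizon_s: float, dt: float) -> list[float]:
    """Toy stand-in for the second model: distance travelled (m) at t = 0, dt, ..., horizon_s,
    integrated with simple Euler steps; speed is floored at zero so the object never reverses."""
    out, s, v = [], 0.0, max(v0, 0.0)
    for _ in range(int(round(horizon_s / dt)) + 1):
        out.append(s)
        s += v * dt
        v = max(v + accel * dt, 0.0)
    return out


def attach_timing(path: Path, progress: list[float], dt: float) -> list[tuple[float, float, float]]:
    """Combine a time-invariant path with a progress profile into (t, x, y) samples."""
    cum = cumulative_lengths(path)
    return [(k * dt, *position_at_distance(path, cum, s)) for k, s in enumerate(progress)]
```

On an L-shaped 20 m path such as [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)], progress_profile(2.0, 0.0, 3.0, 1.0) gives [0.0, 2.0, 4.0, 6.0] (steady walking) while progress_profile(2.0, -1.0, 3.0, 1.0) gives [0.0, 2.0, 3.0, 3.0] (slowing to a stop); attach_timing turns either into timed positions on the same spatial path.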

The system then uses those predictions to:

  • Generate a candidate trajectory for the autonomous vehicle itself
  • Modify that trajectory into a planned trajectory by checking it against the predicted object paths and their timing
  • Control the vehicle accordingly
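The planning steps above can be sketched as a simple loop. This is a hypothetical simplification, not the method claimed in the patent: the ego vehicle drives straight along +x, each prediction is a list of (x, y) positions aligned to the ego's timesteps, and "modifying" the candidate just means lowering its speed until every predicted object stays outside an assumed safety radius. ego_trajectory, min_separation, refine_speed, safety_radius_m and decrement are all illustrative names.

```python
import math

# A timed trajectory: (x, y) at t = 0, dt, 2*dt, ...
Timed = list[tuple[float, float]]


def ego_trajectory(speed: float, steps: int, dt: float) -> Timed:
    """Hypothetical candidate: straight-line ego motion along +x at constant speed."""
    return [(speed * k * dt, 0.0) for k in range(steps)]


def min_separation(ego: Timed, obj: Timed) -> float:
    """Smallest same-timestep distance between the ego and one predicted object."""
    return min((math.dist(a, b) for a, b in zip(ego, obj)), default=math.inf)


def refine_speed(candidate_speed: float, predictions: list[Timed], steps: int, dt: float,
                 safety_radius_m: float = 2.0, decrement: float = 0.5) -> float:
    """Lower the candidate speed until every predicted object trajectory stays at least
    safety_radius_m away; return 0.0 (stop) if no tested speed clears all of them."""
    speed = candidate_speed
    while speed > 0.0:
        ego = ego_trajectory(speed, steps, dt)
        if all(min_separation(ego, p) >= safety_radius_m for p in predictions):
            return speed
        speed -= decrement
    return 0.0
```

For a pedestrian crossing the ego lane at x = 10 m, starting 5 m to the side and walking at 1 m/s (ped = [(10.0, -5.0 + k) for k in range(11)]), a 2 m/s candidate would put both at (10, 0) after 5 s; refine_speed(2.0, [ped], 11, 1.0) backs off to 1.0 m/s, the first tested speed that keeps at least 2 m of separation at every timestep.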

A key design goal is capturing rare object behavior — the edge cases that trip up systems trained mostly on common, well-behaved movement patterns. Decoupling spatial diversity from temporal progression gives the first model more freedom to explore unlikely-but-possible paths.

Why splitting path and timing prediction helps edge cases

For autonomous vehicles, the hardest failures aren't caused by everyday pedestrians or well-signaled lane changes — they're caused by the weird stuff: the person who jaywalks at an odd angle, the car that stalls mid-intersection. Systems that predict position and timing jointly tend to smooth over those rare events because the training signal is dominated by normal behavior. Zoox's two-stage approach gives the spatial model room to output genuinely diverse paths without being penalized for timing inaccuracy.
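The general mechanism behind that smoothing is easiest to see in one dimension. The numbers below are invented for illustration, and the sketch models neither Zoox's architecture nor its training loss. It only shows that a single prediction trained with squared error settles on the average, which sits near common behavior and far from the rare case, while a set of hypotheses scored with a winner-takes-all (best-of-K) loss, a common technique in multimodal trajectory prediction, can cover both. best_of_k_loss is a hypothetical helper name.

```python
# Invented example: final lateral offset (m) of 100 objects crossing a junction.
# 95 continue straight (0 m); 5 swerve sharply (4 m).
samples = [0.0] * 95 + [4.0] * 5


def best_of_k_loss(hypotheses: list[float], data: list[float]) -> float:
    """Mean squared error where each sample is scored only against its closest hypothesis.
    With a single hypothesis this reduces to ordinary mean squared error."""
    return sum(min((h - y) ** 2 for h in hypotheses) for y in data) / len(data)


# The squared-error-optimal single prediction is the sample mean: 0.2 m,
# close to the common case and 3.8 m away from every swerving object.
single = sum(samples) / len(samples)

print(f"single prediction: {single:.2f} m, loss {best_of_k_loss([single], samples):.2f}")
print(f"two hypotheses [0, 4]: loss {best_of_k_loss([0.0, 4.0], samples):.2f}")
```

This prints a single prediction of 0.20 m with loss 0.76, while the two-hypothesis set reaches a loss of 0.00 and keeps the rare swerve as an explicit option.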

This is particularly relevant for robotaxi deployments in dense urban environments — exactly where Zoox operates. If this architecture makes it into production, it could mean fewer uncomfortable hard-braking moments and more graceful handling of unpredictable human behavior around the vehicle.

Editorial take

This is solid, focused AV engineering work — not flashy, but the kind of architectural decision that separates safe edge-case handling from dangerous overconfidence. The insight that spatial diversity and temporal accuracy are better solved separately is genuinely useful, and the patent is specific enough that it reflects real systems-level thinking. Worth watching for anyone tracking how AV companies are hardening their prediction stacks.

Source: full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.