Google · Filed Sep 22, 2025 · Published May 28, 2026 · verified — real USPTO data

Waymo Patents an Efficient Attention Neural Network for Predicting Agent Trajectories

By Patentlyze Team · Updated May 29, 2026

Predicting where every car, cyclist, and pedestrian around you will be in the next five seconds is one of the hardest problems in autonomous driving, and a newly published Waymo patent application describes a more computationally efficient way to do it.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0145711 A1

Applicant Waymo LLC

Filing date Sep 22, 2025

Publication date May 28, 2026

Inventors Rami Al-Rfou, Nigamaa Nayakanti, Kratarth Goel, Aurick Qikun Zhou, Benjamin Sapp, Khaled Refaat

CPC classification 701/23

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Feb 23, 2026)

Parent application Continuation of 18335915 (filed 2023-06-15)

AI/ML

What Waymo's trajectory prediction system actually does

Imagine a busy four-way intersection: cars inching forward, a cyclist cutting across, a pedestrian stepping off the curb. A self-driving car needs to predict what each of those agents will do next — not just one at a time, but all of them, together, in real time.

Waymo's patent describes a neural network system that takes in a rich snapshot of the current scene — including road geometry, the recent movements of nearby agents, and other contextual cues — and generates a predicted future path for each target agent. The key twist is that it uses an efficient attention mechanism, meaning the system can reason about how all the agents relate to each other without the computational cost blowing up as the scene gets more crowded.

The system uses something called learned seeds fed into a trajectory decoder, which helps it generate multiple plausible future paths rather than just one. That's important: in the real world, a car at a junction might turn left or go straight, and a good prediction system needs to account for both possibilities.

How the encoder and decoder process multi-modal scene data

The patent describes a system that takes scene context data — a structured snapshot of the environment at a given moment — and encodes it into a compact representation that a decoder network then uses to output trajectory predictions.

The scene context is multi-modal, meaning it pulls from several different types of input at once: the positions and velocities of nearby agents (other cars, pedestrians, cyclists), static map features like lane boundaries and crosswalks, and potentially traffic signal states. Each modality gets its own encoding pathway before being fused together.
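To make the "own encoding pathway, then fusion" idea concrete, here is a minimal sketch in plain Python. It assumes each modality is a list of feature vectors, that each pathway is a single linear projection to a shared width, and that fusion is simple concatenation of the resulting tokens. The feature layouts, the `SHARED_DIM` value, and all function names are illustrative assumptions, not details taken from the patent.

```python
import random

def make_projection(in_dim, out_dim, rng):
    """Hypothetical per-modality linear layer: an out_dim x in_dim weight matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(tokens, weights):
    """Apply the linear map to every token (each token is a feature vector)."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weights] for tok in tokens]

def encode_scene(modalities, projections):
    """Project each modality to the shared width, then fuse by concatenating tokens."""
    fused = []
    for name, tokens in modalities.items():
        fused.extend(project(tokens, projections[name]))
    return fused

rng = random.Random(0)
SHARED_DIM = 8  # assumed embedding width, chosen only for the example

# Toy scene. The feature layouts below are assumptions for illustration.
scene = {
    "agents": [[0.0, 1.0, 2.5, 0.1], [4.0, -1.0, 0.0, 1.2]],  # x, y, vx, vy
    "map": [[0.0, 0.0, 10.0, 0.0, 1.0]] * 3,                  # segment endpoints + type id
    "signals": [[1.0, 0.0, 0.0]],                              # one-hot light state
}
projections = {
    name: make_projection(len(tokens[0]), SHARED_DIM, rng)
    for name, tokens in scene.items()
}

fused = encode_scene(scene, projections)
print(len(fused), len(fused[0]))  # number of fused tokens, shared width
```

The point of the shared width is that once every modality lives in the same vector space, a single attention stack can treat agents, map segments, and signals as one sequence of tokens.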

The architectural centerpiece is an attention mechanism, the same class of computation used in Transformer models like GPT, which lets the network work out which parts of the scene are most relevant to a given prediction. The patent's efficiency angle is about making this attention step scale better when there are many agents in the scene, since the cost of naive self-attention grows quadratically with the number of inputs.
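The scaling argument is easy to see in code. This summary does not say which efficiency technique the patent claims, so the sketch below uses one well-known option as a stand-in: cross-attending from a small, fixed set of latent queries, which needs `NUM_LATENTS * NUM_TOKENS` score computations instead of `NUM_TOKENS * NUM_TOKENS`. The learned query, key, and value projections of a real Transformer layer are omitted for brevity, and all names and sizes are assumptions.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention; computes len(queries) * len(keys) scores."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out

rng = random.Random(0)
DIM, NUM_TOKENS, NUM_LATENTS = 8, 200, 16  # assumed sizes for the example

tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
latents = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LATENTS)]

# Compress 200 scene tokens into 16 latent vectors in a single attention pass.
compressed = cross_attention(latents, tokens, tokens)

full_scores = NUM_TOKENS * NUM_TOKENS      # naive self-attention
latent_scores = NUM_LATENTS * NUM_TOKENS   # latent-query cross-attention
print(len(compressed), full_scores, latent_scores)  # 16 40000 3200
```

Doubling the number of scene tokens doubles `latent_scores` but quadruples `full_scores`, which is the gap an efficient attention design is trying to close.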

A trajectory decoder takes the encoded scene representation along with learned seeds — trainable starting points that help the model generate a diverse set of plausible futures — and outputs a distribution of predicted trajectories. This means the system doesn't just say "the pedestrian will walk forward"; it says "here are the top-K likely paths, with associated probabilities."
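A rough sketch of how seeds can yield several futures at once: each seed attends over the encoded scene, and two small output heads turn the result into a waypoint sequence and a score, with a softmax over the scores giving per-path probabilities. This is a sketch under stated assumptions, not the patent's decoder. The seeds and head weights would be learned in training but are random here, and `NUM_MODES`, `HORIZON`, and the head shapes are hypothetical.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """One seed attends over the scene tokens and returns their weighted average."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * m for q, m in zip(query, tok)) for tok in memory])
    return [sum(w * tok[j] for w, tok in zip(weights, memory)) for j in range(len(query))]

def linear(vec, weights):
    """Matrix-vector product with a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
DIM, NUM_MODES, HORIZON = 8, 6, 5  # assumed sizes: 6 candidate paths, 5 waypoints each

scene_encoding = rand_matrix(32, DIM, rng)      # stand-in for the encoder output
seeds = rand_matrix(NUM_MODES, DIM, rng)        # learned in training; random here
traj_head = rand_matrix(HORIZON * 2, DIM, rng)  # maps a vector to (x, y) per step
score_head = rand_matrix(1, DIM, rng)           # maps a vector to one logit

def decode(seeds, memory):
    """Return one trajectory per seed plus a probability for each trajectory."""
    trajectories, logits = [], []
    for seed in seeds:
        hidden = attend(seed, memory)
        flat = linear(hidden, traj_head)
        trajectories.append([(flat[2 * t], flat[2 * t + 1]) for t in range(HORIZON)])
        logits.append(linear(hidden, score_head)[0])
    return trajectories, softmax(logits)

trajectories, probs = decode(seeds, scene_encoding)
most_likely = max(range(NUM_MODES), key=lambda k: probs[k])
print(len(trajectories), len(trajectories[0]), most_likely)
```

Because every seed is a different starting vector, each one can specialize during training toward a different kind of outcome, such as "turn left" versus "go straight", while the probabilities tell the planner how much weight to give each candidate.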

What this means for Waymo's real-time driving decisions

For a robotaxi service like Waymo's, trajectory prediction is foundational. Every downstream decision — whether to brake, yield, or proceed — depends on how confident the car is about what the agents around it are about to do. A prediction system that can handle dense, chaotic scenes (think a busy downtown crosswalk or a highway merge) without becoming computationally expensive is directly valuable to real-world deployment.

The efficiency angle is particularly worth noting. Waymo runs these models on onboard hardware with real latency constraints — not in a data center. A more efficient attention design means the same hardware can handle more complex scenes, or the same scenes at lower power draw. That's a quiet but meaningful engineering win for a company trying to scale a commercial fleet.

Editorial take

This is squarely in Waymo's wheelhouse — trajectory prediction is one of the core technical moats that separates leading autonomous vehicle companies from the rest. The efficiency framing suggests this is targeted at real deployment constraints, not a research demo. It's not a flashy consumer-facing patent, but it's the kind of infrastructure work that actually makes robotaxis viable at scale.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
