Waymo Patents an Efficient Attention Neural Network for Predicting Agent Trajectories
Predicting where every car, cyclist, and pedestrian around you will be in the next five seconds is one of the hardest problems in autonomous driving — and Waymo just filed a patent on a more computationally efficient way to do it.
What Waymo's trajectory prediction system actually does
Imagine a busy four-way intersection: cars inching forward, a cyclist cutting across, a pedestrian stepping off the curb. A self-driving car needs to predict what each of those agents will do next — not just one at a time, but all of them, together, in real time.
Waymo's patent describes a neural network system that takes in a rich snapshot of the current scene — including road geometry, the recent movements of nearby agents, and other contextual cues — and generates a predicted future path for each target agent. The key twist is that it uses an efficient attention mechanism, meaning the system can reason about how all the agents relate to each other without the computational cost blowing up as the scene gets more crowded.
The system feeds learned seeds, a set of trainable vectors, into a trajectory decoder, which helps it generate multiple plausible future paths rather than just one. That's important: in the real world, a car at a junction might turn left or go straight, and a good prediction system needs to account for both possibilities.
How the encoder and decoder process multi-modal scene data
The patent describes a system that takes scene context data — a structured snapshot of the environment at a given moment — and encodes it into a compact representation that a decoder network then uses to output trajectory predictions.
The scene context is multi-modal, meaning it pulls from several different types of input at once: the positions and velocities of nearby agents (other cars, pedestrians, cyclists), static map features like lane boundaries and crosswalks, and potentially traffic signal states. Each modality gets its own encoding pathway before being fused together.
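To make "its own encoding pathway before being fused" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the feature sizes, the names `encode_scene` and `D_MODEL`, the random weights standing in for learned ones, and the choice to fuse by concatenating tokens. The article does not say how the patent performs the fusion.

```python
import random

def make_projection(in_dim, out_dim, rng):
    """Random weights standing in for a learned per-modality projection."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]

def project(features, weights):
    """Map each raw feature vector (length in_dim) to a token (length out_dim)."""
    out_dim = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(vec)) for j in range(out_dim)]
        for vec in features
    ]

def encode_scene(scene, projections):
    """Project each modality with its own weights, then fuse by concatenating tokens."""
    tokens = []
    for modality, features in scene.items():
        tokens.extend(project(features, projections[modality]))
    return tokens

rng = random.Random(0)
D_MODEL = 8  # hypothetical shared token width

# Hypothetical raw inputs; the counts and feature sizes are illustrative only.
scene = {
    "agents": [[rng.random() for _ in range(4)] for _ in range(3)],   # x, y, vx, vy
    "map": [[rng.random() for _ in range(6)] for _ in range(5)],      # lane/crosswalk segments
    "signals": [[rng.random() for _ in range(2)] for _ in range(2)],  # signal state
}
projections = {
    name: make_projection(len(feats[0]), D_MODEL, rng) for name, feats in scene.items()
}

tokens = encode_scene(scene, projections)
print(len(tokens), len(tokens[0]))  # 10 tokens, each of width 8
```

Once every modality lives in the same token width, a single attention stack can treat agents, map segments, and signals uniformly.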
The architectural centerpiece is an attention mechanism, the same class of computation used in Transformer models like GPT, which lets the network weigh which parts of the scene are most relevant to a given prediction. The patent's efficiency angle is about making this attention step scale better when there are many agents in the scene, since the cost of naive self-attention grows quadratically with the number of input tokens: doubling the inputs roughly quadruples the work.
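The quadratic cost is easy to see by counting attention scores. The sketch below implements ordinary scaled dot-product attention and compares full self-attention against one well-known way to cut the cost: cross-attending from a small, fixed set of latent queries. That alternative is an assumption chosen for illustration; the article does not say which technique the patent actually claims, and `N_LATENTS`, `rand_tokens`, and the sizes are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; also returns how many scores were computed."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, n_scores = [], 0
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        n_scores += len(scores)
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, n_scores

rng = random.Random(0)

def rand_tokens(n, d):
    return [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

N_TOKENS, N_LATENTS, D = 64, 8, 16
scene_tokens = rand_tokens(N_TOKENS, D)
latents = rand_tokens(N_LATENTS, D)  # stand-in for learned latent queries

# Full self-attention: every token attends to every token (N * N scores).
_, full_cost = attention(scene_tokens, scene_tokens, scene_tokens)
# Latent cross-attention: a fixed number of queries attend to the scene (L * N scores).
summary, latent_cost = attention(latents, scene_tokens, scene_tokens)

print(full_cost, latent_cost)  # 4096 512
```

With the number of latents held fixed, the score count grows linearly with scene size instead of quadratically, which is the kind of scaling behavior the efficiency claim is about.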
A trajectory decoder takes the encoded scene representation along with learned seeds — trainable starting points that help the model generate a diverse set of plausible futures — and outputs a distribution of predicted trajectories. This means the system doesn't just say "the pedestrian will walk forward"; it says "here are the top-K likely paths, with associated probabilities."
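A toy version of that decoding step might look like the following. It is a sketch under stated assumptions, not the patent's design: the seeds and head weights are random stand-ins for trained parameters, the seed is combined with the scene by simple addition where a real decoder would likely use attention, and the names `decode`, `w_traj`, and `w_score` are hypothetical.

```python
import math
import random

rng = random.Random(0)
K, D, T = 3, 8, 5  # hypothetical: number of modes, feature width, future timesteps

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    """Row vector (length rows) times a rows x cols matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

seeds = rand_matrix(K, D)       # stand-in for trained seed vectors, one per mode
w_traj = rand_matrix(D, 2 * T)  # stand-in for a learned trajectory head
w_score = rand_matrix(D, 1)     # stand-in for a learned confidence head

def decode(scene_embedding):
    """Return K (trajectory, probability) pairs for one target agent."""
    trajectories, logits = [], []
    for seed in seeds:
        # Condition the seed on the scene (assumption: addition instead of attention).
        h = [math.tanh(s + c) for s, c in zip(seed, scene_embedding)]
        flat = matvec(h, w_traj)
        trajectories.append([(flat[2 * t], flat[2 * t + 1]) for t in range(T)])
        logits.append(matvec(h, w_score)[0])
    # Softmax turns the per-mode scores into probabilities that sum to 1.
    return list(zip(trajectories, softmax(logits)))

scene_embedding = [rng.uniform(-1, 1) for _ in range(D)]
modes = decode(scene_embedding)
for traj, prob in sorted(modes, key=lambda m: -m[1]):
    print(f"p={prob:.2f}  end=({traj[-1][0]:.2f}, {traj[-1][1]:.2f})")
```

Because each seed starts from a different vector, the same scene embedding yields K distinct candidate paths, and the softmax ranks them by likelihood.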
What this means for Waymo's real-time driving decisions
For a robotaxi service like Waymo's, trajectory prediction is foundational. Every downstream decision — whether to brake, yield, or proceed — depends on how confident the car is about what the agents around it are about to do. A prediction system that can handle dense, chaotic scenes (think a busy downtown crosswalk or a highway merge) without becoming computationally expensive is directly valuable to real-world deployment.
The efficiency angle is particularly worth noting. Waymo runs these models on onboard hardware with real latency constraints — not in a data center. A more efficient attention design means the same hardware can handle more complex scenes, or the same scenes at lower power draw. That's a quiet but meaningful engineering win for a company trying to scale a commercial fleet.
This is squarely in Waymo's wheelhouse — trajectory prediction is one of the core technical moats that separates leading autonomous vehicle companies from the rest. The efficiency framing suggests this is targeted at real deployment constraints, not a research demo. It's not a flashy consumer-facing patent, but it's the kind of infrastructure work that actually makes robotaxis viable at scale.
Editorial commentary on a publicly published patent application. Not legal advice.