New Google Patents · Filed Nov 18, 2025 · Published Jun 11, 2026 · verified — real USPTO data

Reading Body Language to Predict Pedestrian Movement

Before a pedestrian steps into the road, their body often gives it away — a turned shoulder, a weight shift, a glance. Waymo is now patenting a system that lets its self-driving cars read those physical cues in 3D, in real time.

Waymo Patent: Predicting Pedestrian Behavior Using 3D Body Poses — figure from US 2026/0159135 A1
FIG. 1A — rendered from the official USPTO publication PDF.
Publication number US 2026/0159135 A1
Applicant Waymo LLC
Filing date Nov 18, 2025
Publication date Jun 11, 2026
Inventors Xinwei Shi, Tian Lan, Jonathan Chandler Stroud, Zhishuai Zhang, Junhua Mao, Jeonhyung Kang, Khaled Refaat, Jiachen Li
CPC classification 701/23
Grant likelihood Medium
Examiner CENTRAL, DOCKET (Art Unit OPAP)
Status Docketed New Case - Ready for Examination (Mar 6, 2026)
Parent application is a Continuation of 17988701 (filed 2022-11-16)
Document 21 claims

How Waymo's car reads a pedestrian's body pose

Imagine a person waiting at a crosswalk, not yet moving but leaning forward and angling their feet toward the street. That body language tells you something is about to happen — even before their foot leaves the curb. Waymo's patent is about teaching a self-driving car to pick up on exactly that kind of physical signal.

The system tracks 3D body keypoints — essentially a skeleton map of joints like shoulders, hips, knees, and feet — for every nearby pedestrian. It then combines that pose information with a record of where the person has been moving over the past few seconds. Together, those two data streams give the car a richer picture of what the pedestrian is likely to do next.

The goal is to make the car's predictions more accurate earlier, so it can slow down, yield, or steer away before a situation becomes dangerous — not just react after someone steps into traffic.

How the neural networks fuse pose and movement history

The patent describes a machine-learning pipeline with three main stages working in parallel before merging into a final prediction.

  • Context encoder: A neural network processes the movement history of all nearby agents — pedestrians, cyclists, vehicles — up to the current moment, producing a summary of the scene's dynamics for the target pedestrian.
  • Keypoint encoder: A separate neural network processes the 3D keypoint data for that same pedestrian. Keypoints are a skeletal representation: a set of coordinates marking major joints. In 3D, that means the system knows not just where someone is in the horizontal plane but also how they're oriented in space — whether they're crouching, turning, leaning forward.
  • Combined decoder: The outputs of those two encoders are merged into a single combined embedding (think of it as a compressed numerical fingerprint of everything known about that person right now). A decoder network then reads that fingerprint and outputs a behavior prediction — essentially a probability map of where the pedestrian is likely to be at future time steps.

The key insight is that trajectory history alone misses short-term intent signals. Someone can be standing perfectly still while their body is already oriented to cross — the keypoint encoder catches that; the context encoder alone would not.

What better pedestrian prediction means for self-driving safety

Pedestrian prediction is one of the hardest problems in autonomous driving. People don't follow lane markings, they change their minds mid-step, and they interact with each other in ways that are hard to model from GPS traces alone. By adding body pose as an independent input channel, Waymo is essentially giving its vehicles a sense closer to what a human driver does intuitively — reading body language at a glance. That matters most in dense urban scenarios: school crossings, busy intersections, parking lots.

For you as a potential passenger or pedestrian near a Waymo vehicle, the practical upside is a car that yields earlier and brakes more smoothly — because it anticipated your move instead of reacting to it. It also reduces the chance of a false-positive emergency stop triggered by someone who merely glanced at the road but wasn't actually crossing.

Editorial take

This is genuinely substantive work, not a defensive filing. Adding a dedicated 3D-pose encoder as a parallel input stream — rather than just hoping trajectory history captures intent — is a principled architectural choice that addresses a real gap in current prediction models. Waymo has been publishing research in this direction for a few years, so this patent reflects an active engineering bet, not speculative IP hoarding.

Get one Big Tech patent every Sunday

Plain English, intelligent commentary, no hype. Free.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.