Amazon · Filed Jan 13, 2026 · Published May 21, 2026 · verified — real USPTO data

Amazon Patents a System That Fuses Object Maps With Audio to Find the Real Sound Source

Your smart speaker is often listening to a reflection of your voice, not the real thing. Amazon's new patent application aims to fix that by having devices cross-reference what they hear with what they know about the room.

[Figure: FIG. 1A of US 2026/0141917 A1, rendered from the official USPTO publication PDF.]
Publication number: US 2026/0141917 A1
Applicant: Amazon Technologies, Inc.
Filing date: Jan 13, 2026
Publication date: May 21, 2026
Inventors: Borham Lee, Wai Chung Chu
CPC classification: 704/200
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Feb 11, 2026)
Parent application: Continuation of 18614923 (filed 2024-03-25)
Document: 20 claims

How Amazon's microphone array stops chasing echoes

Imagine you say "Alexa" from across a kitchen, and your voice bounces off the tile wall before hitting the device's microphones. From the device's perspective, your voice arrives from two directions: the direct path from your mouth and a slightly delayed reflection off the wall. Current systems can mistake the reflection for the real source.

Amazon's patent describes a smarter approach: combine the audio directional data with a map of known objects in the room. If the system already knows there's a wall to the left, it can assign a low probability to any sound appearing to come from that wall, and a much higher probability to the open space where a person is likely standing.

The result is a fused "target likelihood estimate": a confidence score for each direction around the device. When the wake word is detected, the device uses this combined map to pick the right audio track (the direct sound from you) and discard the reflections. In effect, the device knows the room before it starts listening.

How SSL data and object maps combine into one likelihood score

The patent describes a three-input fusion system sitting inside a voice-enabled device like a smart speaker or Echo-style product.

Input 1: SSL data (Sound Source Localization — the microphone array's best guess at where a sound is coming from based on timing and phase differences between microphones). This gives a direction, but it can't tell a real speaker from a reflection on its own.
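As a rough illustration of how timing differences become a direction, the sketch below cross-correlates two microphone signals to find the delay between them, then converts that delay to an arrival angle under a far-field assumption. This is a textbook technique, not the method claimed in the patent; the function names, the 7 cm microphone spacing, and the 16 kHz sample rate are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for room-temperature air

def best_lag(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive lag means sig_b is a delayed copy of sig_a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, a in enumerate(sig_a):
            m = n + lag
            if 0 <= m < len(sig_b):
                score += a * sig_b[m]
        if score > best_score:
            best, best_score = lag, score
    return best

def arrival_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Far-field angle from broadside for a two-microphone pair."""
    tau = lag_samples / sample_rate_hz
    # Clamp to the valid asin domain in case noise pushes the ratio past 1.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND_M_S * tau / mic_spacing_m))
    return math.degrees(math.asin(ratio))

# A short click that reaches microphone B two samples after microphone A.
mic_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

lag = best_lag(mic_a, mic_b, max_lag=4)
angle = arrival_angle_deg(lag, sample_rate_hz=16000, mic_spacing_m=0.07)
print(lag, round(angle, 1))
```

A reflection would produce an equally valid delay and angle, which is why this input alone cannot separate a talker from a wall bounce.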

Input 2: Object information — the device builds or receives a map of the nearby environment. This can come from floorplan estimation, distance sensors, cameras, or prior knowledge. Objects like walls, furniture, or corners are tagged with locations.
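One way to picture the object map is as a list of tagged objects, each occupying a range of directions as seen from the device. The sketch below is hypothetical: the `MappedObject` type, the 10-degree bins, and the 0.1 and 1.0 scores are assumptions chosen for illustration, and the patent does not prescribe this representation.

```python
from dataclasses import dataclass

@dataclass
class MappedObject:
    label: str
    start_deg: int  # first azimuth the object occupies, measured from the device
    end_deg: int    # last azimuth, inclusive; a range may wrap past 359

def object_likelihood(objects, bin_width_deg=10, blocked=0.1, open_space=1.0):
    """Score each direction bin: low where a mapped object sits, high elsewhere.

    bin_width_deg is assumed to divide 360 evenly.
    """
    n_bins = 360 // bin_width_deg
    scores = [open_space] * n_bins
    for obj in objects:
        span = (obj.end_deg - obj.start_deg) % 360
        for offset in range(span + 1):
            azimuth = (obj.start_deg + offset) % 360
            scores[azimuth // bin_width_deg] = blocked
    return scores

# Illustrative room: a wall to one side and a fridge that straddles 0 degrees.
room = [
    MappedObject("tile wall", 250, 290),
    MappedObject("fridge", 350, 20),
]
prior = object_likelihood(room)
```

Here the bins covering the wall and the fridge receive the low score, and every other direction keeps the open-space score.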

Input 3: A detected acoustic event — typically a wake word, but the patent is broader. When an event fires within a time window, the system selects the relevant slice of SSL data.
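Selecting the relevant slice of SSL data can be thought of as a timestamp filter. The following is a minimal sketch under assumed names: `SslFrame`, `frames_for_event`, and the 200 ms lookback are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass

@dataclass
class SslFrame:
    t_ms: int         # frame timestamp in milliseconds
    azimuth_deg: int  # direction the array reported for this frame

def frames_for_event(frames, event_start_ms, event_end_ms, lookback_ms=200):
    """Keep SSL frames that fall inside the event window, plus a short lookback."""
    earliest = event_start_ms - lookback_ms
    return [f for f in frames if earliest <= f.t_ms <= event_end_ms]

# One frame every 100 ms; the reported direction flips halfway through.
frames = [SslFrame(t, 90 if t < 500 else 270) for t in range(0, 1000, 100)]

# Suppose an acoustic event was detected between 400 ms and 700 ms.
selected = frames_for_event(frames, event_start_ms=400, event_end_ms=700)
print([f.t_ms for f in selected])
```

The lookback is a design choice for the example: a detector typically reports an event slightly after the sound begins, so including a little earlier data avoids clipping the onset.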

The core logic:

  • Calculate a first likelihood estimate from the SSL track (directional audio probability map)
  • Calculate a second likelihood estimate from the object map — locations with known hard surfaces get low scores, open spaces get higher ones
  • Fuse both into a combined target likelihood estimate and use it to associate the acoustic event with the most plausible real-world direction
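The three steps above can be sketched as follows. This assumes both estimates are expressed over the same set of direction bins and that fusion is an element-wise product followed by normalisation; the patent summary does not state the fusion rule, so the product is an assumption (a weighted sum would be another plausible choice). All names and numbers are illustrative.

```python
def normalise(scores):
    """Scale scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        raise ValueError("all scores are zero; nothing to normalise")
    return [s / total for s in scores]

def fuse(ssl_likelihood, object_likelihood):
    """Combine two per-direction estimates into one target likelihood."""
    if len(ssl_likelihood) != len(object_likelihood):
        raise ValueError("both estimates must cover the same direction bins")
    product = [a * b for a, b in zip(ssl_likelihood, object_likelihood)]
    return normalise(product)

def argmax(scores):
    """Index of the highest-scoring bin."""
    return max(range(len(scores)), key=scores.__getitem__)

# Eight 45-degree bins. The audio alone shows two peaks: bin 2 and bin 6.
ssl = [0.05, 0.05, 0.30, 0.05, 0.05, 0.05, 0.40, 0.05]
# The object map says bin 6 is a wall; everything else is open space.
obj = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1, 1.0]

fused = fuse(ssl, obj)
print(argmax(ssl), argmax(fused))
```

In this toy case the audio alone favours bin 6, the stronger peak, while the fused estimate favours bin 2 because the object map marks bin 6 as a wall.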

The goal is accurate direct-sound selection — identifying the straight-line audio path from a human speaker while rejecting multipath reflections caused by the environment.

What this means for wake-word accuracy in noisy rooms

Echoes and reflections are among the main reasons voice assistants mishear commands or falsely trigger in reverberant rooms; kitchens, bathrooms, and open-plan offices are notorious for this. By building room-geometry awareness into the wake-word pipeline, Amazon could reduce false positives and improve beamforming accuracy without requiring the user to do anything differently.

For you as a user, this could mean fewer "I didn't catch that" responses when you're not standing directly in front of a device, and fewer phantom triggers when someone's voice bounces off a hard surface. It also suggests Amazon is investing in on-device spatial awareness — the kind of infrastructure that makes future multi-room or ambient computing scenarios much more reliable.

Editorial take

This is genuinely useful signal processing work, not a paper patent. The reflection problem is real and well-documented, and fusing object-map priors with SSL data is a principled engineering solution. That it targets wake-word detection, Amazon's most commercially sensitive audio pipeline, suggests this is heading toward a real product update rather than sitting on a shelf.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.