Google · Filed Oct 31, 2024 · Published Apr 30, 2026

Google Patents a Voice Assistant That Adapts to Background Noise Using Mixture-of-Experts AI

By Patentlyze Team · Updated May 4, 2026

Google's latest patent describes a voice assistant that doesn't just hear you — it measures how noisy your environment is and picks a specialized AI model trained for exactly that level of chaos. It's a fundamentally different approach to handling bad audio than simply trying harder to understand you.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number: US 2026/0120681 A1
Applicant: GOOGLE LLC
Filing date: Oct 31, 2024
Publication date: Apr 30, 2026
Inventors: Dongeek Shin
CPC classification: 704/200
Grant likelihood: Medium
Examiner: AZAD, ABUL K (Art Unit 2656)
Status: Docketed New Case - Ready for Examination (Dec 10, 2024)

AI/ML

How Google's noise-aware voice assistant actually works

Imagine you're asking your voice assistant something while standing next to a running dishwasher. The audio it receives is garbled, and today's assistants often stumble, either mishearing you or giving a weirdly confident wrong answer. The system in Google's application (still pending, not yet granted) takes a different approach: instead of one AI trying to handle every audio scenario, it uses a team of specialized AIs, each trained on a different level of background noise.

Think of it like a hospital emergency room with triage. When your voice command arrives, the system first figures out how noisy your audio is, then routes your request to the specialists best suited for that noise level. A sub-model trained on crystal-clear audio carries most of the weight for your quiet-room questions; a different sub-model, trained on chaotic, noisy recordings, takes the lead in the dishwasher scenario.

The system blends the outputs of the most relevant specialists using calculated weights, so you're not getting a single expert's guess but a weighted consensus. The goal is a response that's more accurate and contextually appropriate regardless of where you are.

How the gating network picks the right expert for noisy audio

The patent describes a Mixture-of-Experts (MoE) neural network architecture plugged into a voice assistant pipeline. MoE is a design pattern where instead of one large model doing everything, you have multiple smaller expert models — each fine-tuned on a specific type of input — plus a gating network that decides which experts to consult.
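To make the pattern concrete, here is a minimal sketch of a generic MoE layer in plain Python. It is not taken from the application: the toy experts, the `moe_layer` function, the `top_k` parameter, and the example gate scores are all illustrative assumptions, chosen only to show how a gate's scores turn into a weighted blend of a few experts' outputs.

```python
import math

def softmax(scores):
    """Convert raw gate scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, gate_scores, top_k=2):
    """Blend the outputs of the top_k highest-scoring experts.

    x: input feature vector (list of floats)
    experts: list of callables, each mapping x to an output vector
    gate_scores: one raw score per expert, as a gating network would produce
    """
    # Keep only the k experts the gate scored highest.
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize over the chosen experts so their weights sum to 1.
    weights = softmax([gate_scores[i] for i in chosen])
    outputs = [experts[i](x) for i in chosen]
    # Weighted sum, element by element.
    blended = [
        sum(w * out[d] for w, out in zip(weights, outputs))
        for d in range(len(outputs[0]))
    ]
    return blended, dict(zip(chosen, weights))

# Three toy "experts" (hypothetical): each just scales the input differently.
experts = [
    lambda x: [1.0 * v for v in x],
    lambda x: [2.0 * v for v in x],
    lambda x: [3.0 * v for v in x],
]

# Made-up gate scores: experts 1 and 2 tie for the top spot, expert 0 is dropped.
blended, weights = moe_layer([1.0, 2.0], experts, gate_scores=[0.1, 2.0, 2.0], top_k=2)
print(blended, weights)
```

In a real system the experts would be neural sub-networks and the gate scores would come from a trained gating network rather than being hard-coded.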

Here's the specific flow:

1. The system receives audio and runs it through an ASR (Automatic Speech Recognition) engine to produce a text transcript of what you said.
2. Simultaneously, it analyzes the raw audio as a spectrogram (a visual map of frequencies over time) to extract audible features. The most important is the signal-to-noise ratio (SNR), which measures how loud your voice is relative to background noise.
3. The gating subnetwork reads those audible features and selects a relevant subset of expert models. Each expert was trained on audio data from a specific SNR range: some on clean audio, some on moderately noisy audio, some on very noisy recordings.
4. The selected experts each process a tokenized version of your transcript and produce outputs. The gating network also assigns a weight to each expert's output, and those weighted outputs are combined into a final MoE layer response.
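The SNR-driven routing in the steps above can be sketched roughly as follows. This is a simplified stand-in, not the application's method: the application describes a learned gating subnetwork reading spectrogram features, whereas this sketch computes SNR from time-domain samples (assuming a separate noise-only stretch of audio is available) and scores experts by distance to the center of their SNR band. The band boundaries in `EXPERT_BANDS`, the `temperature` value, and the function names are all hypothetical.

```python
import math

# Hypothetical SNR bands in dB; the article does not give concrete ranges.
EXPERT_BANDS = {
    "clean": (20.0, 60.0),
    "moderate": (5.0, 20.0),
    "noisy": (-20.0, 5.0),
}

def snr_db(speech, noise):
    """SNR in decibels: 10 * log10(speech power / noise power)."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)

def gate(snr, bands=EXPERT_BANDS, top_k=2, temperature=10.0):
    """Pick the top_k experts whose band centers are closest to snr.

    Returns a dict mapping expert name to a weight; weights sum to 1.
    A stand-in for a trained gating network (assumption).
    """
    scores = {
        name: -abs(snr - (lo + hi) / 2.0) / temperature
        for name, (lo, hi) in bands.items()
    }
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Softmax over the chosen experts only.
    peak = max(scores[n] for n in chosen)
    exps = {n: math.exp(scores[n] - peak) for n in chosen}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}

# Toy samples: the voice has 10x the amplitude of the noise, so about 20 dB SNR.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.05, -0.05, 0.05, -0.05]
level = snr_db(speech, noise)
weights = gate(level)
print(round(level, 2), weights)
```

At roughly 20 dB, this toy gate leans mostly on the "moderate" expert with a smaller contribution from "clean", which mirrors the idea of blending neighboring specialists rather than committing to one.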

The key insight is that the text transcript alone doesn't tell the system how reliable that transcript is. The spectrogram fills that gap: it shows how rough the recording conditions were, so the system can be more cautious and flexible in interpreting the words when the audio is poor.

What this means for Google Assistant in real-world environments

Voice assistants are already pretty good in quiet rooms. Where they fall apart is everywhere else: busy kitchens, cars, open offices, concerts. Google's MoE approach treats noise not just as an obstacle to overcome in transcription, but as a signal for routing at the response-generation stage. That's a meaningful architectural shift: even if the transcript is imperfect, the right expert model can compensate with better priors for what noisy-environment queries tend to mean.

For you as a user, the promise is a voice assistant that stays useful in the real world rather than only performing well in demos. For Google, it's a way to squeeze better response quality out of existing ASR output rather than needing a perfect transcript first — which matters a lot as Assistant competes with increasingly capable on-device and cloud AI rivals.

Editorial take

This is genuinely clever engineering rather than marketing. Routing assistant responses through noise-aware experts is the kind of systems-level thinking that can meaningfully improve real-world reliability without requiring a complete model retraining. Whether Google ships this in Assistant, Gemini Live, or something else, the underlying idea — treat audio quality as a routing signal, not just a transcription problem — is worth paying attention to.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice. Patentlyze may earn a commission if you click an affiliate link and make a purchase. This doesn't affect what we cover or how we cover it.