New Google Patents · Filed Jan 20, 2026 · Published Jun 4, 2026 · verified — real USPTO data

Google Patents an AI That Tells Two Voices Apart and Answers Each One Personally

When two people talk to Google Assistant at the same time, the AI currently treats them as a crowd. This patent application describes a system that untangles those voices, matches each one to a registered account, and responds with personalized context for each speaker.

Google Patent: Multi-User AI Assistant Recognition — figure from US 2026/0155150 A1
FIG. 1A — rendered from the official USPTO publication PDF.
Publication number: US 2026/0155150 A1
Applicant: GOOGLE LLC
Filing date: Jan 20, 2026
Publication date: Jun 4, 2026
Inventors: Dongeek Shin, Anupam Pathak
CPC classification: 704/275
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit 2658)
Status: Docketed New Case - Ready for Examination (Feb 24, 2026)
Parent application: 18236249 (filed 2023-08-21), of which this is a division
Document: 20 claims

What Google's multi-speaker assistant actually does

Imagine two people in a kitchen asking a shared Google speaker different questions at the same time — one asking about their calendar, the other asking for a recipe. Right now, most voice assistants get confused and respond generically. Google's new patent describes a way to fix that.

The system listens to the audio, figures out that two different people are speaking, and then checks whether each voice matches a registered user account on the device. If your voice is recognized, the assistant pulls in your personal profile — things like your preferences, your calendar, your saved settings — and uses that context when crafting its reply.

If one speaker isn't registered, the system still handles the conversation gracefully, using whatever it knows about the registered person alongside the unrecognized speaker's words. Both utterances get factored into a single, coherent response.

How Google's system links voices to user accounts

The patent describes a pipeline that starts the moment a device captures overlapping or sequential audio from two different people. Here's how the pieces fit together:

  • Speaker diarization: The audio is split into separate transcriptions — one per speaker — so the system knows who said what.
  • Account matching: Each transcription is checked against the user accounts registered with the assistant. The goal is to tie a voice to an identity, not just to recognize the words.
  • Attribute injection: For recognized users, the system builds a natural language description — a structured text block that includes the user's account attributes (name, preferences, linked services) plus what they said. Think of it as giving the LLM a character sheet before it reads the dialogue.
  • Graceful guest handling: If one speaker isn't registered, the system doesn't crash — it still generates a response using the registered user's context and the guest's raw transcription together.
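
To make the account-matching and attribute-injection steps concrete, here is a minimal sketch in Python. It is not Google's implementation. The patent summary above does not say how voices are compared, so the cosine-similarity check on voice embeddings, the 0.8 threshold, and every name here (Account, Segment, match_account, describe) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Optional


@dataclass
class Account:
    name: str
    voiceprint: list[float]  # hypothetical enrolled voice embedding
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Segment:
    speaker_label: str       # label from diarization, e.g. "speaker_1"
    embedding: list[float]   # hypothetical voice embedding for this speaker
    transcription: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_account(segment: Segment, accounts: list[Account],
                  threshold: float = 0.8) -> Optional[Account]:
    """Return the best-scoring account at or above the threshold, else None."""
    best, best_score = None, threshold
    for account in accounts:
        score = cosine(segment.embedding, account.voiceprint)
        if score >= best_score:
            best, best_score = account, score
    return best


def describe(segment: Segment, account: Optional[Account]) -> str:
    """Build the natural language description for one speaker."""
    if account is None:
        # Guest path: no account attributes, only what was said.
        return f'An unrecognized speaker said: "{segment.transcription}"'
    attrs = "; ".join(f"{k}: {v}" for k, v in sorted(account.attributes.items()))
    who = f"{account.name} ({attrs})" if attrs else account.name
    return f'{who} said: "{segment.transcription}"'


# Example data (invented): one registered user and one guest.
accounts = [Account("Alex", [1.0, 0.0], {"diet": "vegetarian"})]
known = Segment("speaker_1", [0.9, 0.1], "What's on my calendar?")
guest = Segment("speaker_2", [0.0, 1.0], "How do I make lasagna?")
descriptions = [describe(s, match_account(s, accounts)) for s in (known, guest)]
```

The guest falls through to the unrecognized-speaker branch instead of raising an error, which mirrors the graceful handling described in the last bullet.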

A generative model (the LLM at the core of the assistant) then processes both natural language descriptions together and returns a single response that addresses both utterances. The claim specifically covers the asymmetric case: one known user, one unknown.
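
The final step, handing both descriptions to the generative model in a single request, might look roughly like the sketch below. The prompt wording, the build_prompt and respond helpers, and the stand-in model are all hypothetical. The real system would call an actual LLM, and the patent summary above does not specify a prompt format.

```python
from typing import Callable


def build_prompt(descriptions: list[str]) -> str:
    """Combine per-speaker descriptions into one prompt (wording is assumed)."""
    lines = ["Several people spoke to the assistant. Address each request in one reply."]
    lines += [f"- {d}" for d in descriptions]
    return "\n".join(lines)


def respond(descriptions: list[str], generate: Callable[[str], str]) -> str:
    """Send a single combined prompt to the model and return its reply."""
    return generate(build_prompt(descriptions))


# Asymmetric case: one registered user with attributes, one unrecognized guest.
example_descriptions = [
    'Alex (diet: vegetarian) said: "What\'s on my calendar?"',
    'An unrecognized speaker said: "How do I make lasagna?"',
]


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; it only reports how many speakers it saw."""
    speakers = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"Received requests from {speakers} speakers."


reply = respond(example_descriptions, fake_model)
```

Passing the model in as a callable keeps the sketch runnable without any particular LLM API. The point is only that both speakers reach the model in one prompt, so it can produce one coherent reply.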

What this means for shared Google Home devices

Shared voice assistants — Google Nest speakers, smart displays, devices in hotels or shared offices — have always struggled with the multi-user problem. Personalization breaks down the moment a second person enters the conversation. This patent directly addresses that gap, which is increasingly relevant as Google Assistant and Gemini-based assistants become hubs for household AI.

For you as a user, this could mean your Google speaker finally recognizes when your partner asks a question mid-conversation and doesn't accidentally answer them using your personal calendar data. It's a meaningful privacy and usability fix wrapped in infrastructure that most people will never see but will definitely feel.

Editorial take

This is genuinely useful, unglamorous work. Multi-user voice recognition has been a known weak spot in smart speakers for years, and the specific innovation here, packaging per-user account context as a natural language description before feeding it to an LLM, is a clean solution to a real problem. The asymmetric guest-handling clause is especially practical: many households have at least one person whose voice isn't registered.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.