Google Patents an AI That Listens to Your Phone Calls and Suggests What to Say Next
Google has filed a patent application for a system in which an AI chatbot listens to the other person on a phone call and starts generating possible replies for you before that person has finished their sentence.
What Google's mid-call suggestion chips actually do
Imagine you're on a phone call, but instead of you doing the talking, an AI chatbot handles the conversation on your behalf. As the other person speaks, your screen shows a row of suggested reply buttons. Tap one and the AI says it out loud to them in synthesized speech.
The twist in this patent is the timing. The AI doesn't wait for the other person to finish their sentence before it starts thinking. It begins generating reply suggestions the moment they start talking, so the chips can appear on your screen almost instantly. If the rest of the sentence changes the meaning, the AI will swap out those early suggestions for better ones.
This reads like an upgrade to Google's existing Duplex-style AI calling features, the kind that can book a restaurant reservation for you. The new wrinkle is real-time, mid-sentence prediction to make the back-and-forth feel faster and less robotic.
How the system reads partial speech to pre-generate replies
The system works in two parallel tracks running against a live phone call audio stream.
Track 1 (early prediction): The moment the other caller starts speaking, the system feeds that initial audio fragment into the chatbot model. The model generates one or more suggestion chips (think of them like quick-reply buttons in a messaging app), and each chip carries a pre-generated response that seems plausible given the partial utterance.
Track 2 — confirmation or correction: As the rest of the sentence arrives, the system processes the subsequent audio. It then makes a go/no-go call: do the early chips still make sense for the full utterance? If yes, it renders them on the user's screen immediately. If no, it discards them and generates a fresh set based on the complete sentence.
- If the user taps a chip, the chatbot speaks the corresponding suggestion aloud to the other caller via text-to-speech synthesis.
- The entire loop — listen, predict, confirm, display, speak — is designed to happen fast enough to feel conversational rather than lagged.
- The user's client device (likely a phone or tablet) shows the chips; the remote caller's device plays back the synthesized audio response.
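The two tracks above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's method: `guess_intent`, `generate_chips`, `chips_for_utterance`, the keyword table, and the canned replies are all hypothetical stand-ins, and plain text transcripts replace the live audio stream. The sketch only shows the control flow: speculate on the fragment, then reuse or replace the chips once the full utterance is known.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the chatbot model. A real system would run a
# language model on audio; here a keyword lookup on text plays that role.
INTENT_KEYWORDS = {
    "schedule": ("when", "time", "available"),
    "confirm": ("is that", "correct", "confirm"),
}
REPLIES = {
    "schedule": ["Tomorrow at 7 pm works.", "Any time after 5 pm."],
    "confirm": ["Yes, that's right.", "No, let me correct that."],
    "unknown": ["Could you repeat that?"],
}


def guess_intent(transcript: str) -> str:
    """Return the first intent whose keywords appear in the transcript."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return intent
    return "unknown"


@dataclass
class Chips:
    intent: str
    replies: list[str]


def generate_chips(transcript: str) -> Chips:
    """Build a set of suggestion chips for a (partial or full) transcript."""
    intent = guess_intent(transcript)
    return Chips(intent, REPLIES[intent])


def chips_for_utterance(partial: str, full: str) -> tuple[Chips, bool]:
    """Return the chips to render and whether the early chips were reused."""
    early = generate_chips(partial)        # Track 1: speculate on the fragment
    if guess_intent(full) == early.intent:  # Track 2: do they still fit?
        return early, True                  # yes: render the early chips
    return generate_chips(full), False      # no: discard and regenerate


if __name__ == "__main__":
    print(chips_for_utterance("What time", "What time would you like to come in?"))
    print(chips_for_utterance("So the name", "So the name is Alex, is that correct?"))
```

In the first call the fragment already points at a scheduling question, so the early chips are kept. In the second, the fragment gives no usable signal, so the early chips are discarded and a new set is built from the full sentence.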
The key technical bet here is that the beginning of a spoken sentence is usually enough to predict the general category of reply needed — even before you know how the sentence ends.
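One way to make that bet concrete is to ask how many leading words of a sentence are needed before a classifier's guess matches its verdict on the whole sentence. The sketch below is purely illustrative: `classify` is a hypothetical keyword classifier and `earliest_stable_prefix` is a hypothetical helper, neither taken from the patent.

```python
def classify(text: str) -> str:
    """Toy, hypothetical category classifier based on keywords."""
    text = text.lower()
    if "reservation" in text or "table" in text:
        return "booking"
    if "what time" in text or "when" in text:
        return "scheduling"
    return "unknown"


def earliest_stable_prefix(utterance: str) -> int:
    """Smallest number of leading words whose category equals the full
    utterance's category and stays equal as each later word arrives."""
    words = utterance.split()
    final = classify(utterance)
    stable_from = len(words)
    # Walk backwards from the full sentence until the category changes.
    for n in range(len(words), 0, -1):
        if classify(" ".join(words[:n])) != final:
            break
        stable_from = n
    return stable_from


if __name__ == "__main__":
    for sentence in (
        "What time works for you tomorrow evening",
        "Do you still want the table by the window",
    ):
        print(sentence, "->", earliest_stable_prefix(sentence))
```

With this toy classifier, the first sentence is settled after 2 of its 7 words, while the second needs 6 of 9. Early prediction pays off for the first kind of sentence and buys little for the second, which is why the confirmation step matters.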
What this means for AI-assisted phone calls
For anyone who uses, or would want to use, an AI proxy to handle routine phone calls (think: scheduling, customer service, quick info requests), this could noticeably reduce the awkward pause problem. Current AI calling systems often feel slow because they process the full utterance before responding. Pre-generating suggestions from partial audio is a plausible latency fix.
It could also expand the scope of who this technology helps. People with speech or communication disabilities who rely on AI-assisted calling stand to benefit from suggestion chips that appear sooner. And for everyone else, it nudges AI phone assistants closer to feeling like a real, responsive conversation rather than a voice-menu maze.
This is a clever engineering approach to a real problem. Google is betting on predictive pre-computation to hide the lag, and the fallback for wrong early predictions (swapping in new chips) suggests a robust system design rather than a simple optimization hack.
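A back-of-the-envelope model shows why the approach can help and when it cannot. All numbers and names here are assumptions for illustration, not figures from the patent: `hit_rate` is the share of utterances whose early chips survive confirmation, `confirm_ms` is the cost of the confirmation check, and `generate_ms` is the cost of generating chips from scratch.

```python
def expected_wait_ms(hit_rate: float, confirm_ms: float, generate_ms: float) -> float:
    """Expected delay between the end of speech and chips appearing.

    The confirmation check always runs. On a miss, the chips must also be
    regenerated from the full utterance.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return confirm_ms + (1.0 - hit_rate) * generate_ms


if __name__ == "__main__":
    # Hypothetical figures: 400 ms to generate, 50 ms to confirm.
    for rate in (0.0, 0.7, 1.0):
        print(rate, round(expected_wait_ms(rate, 50.0, 400.0), 1))
```

Under these made-up figures, a system that waits for the full utterance always pays the 400 ms generation cost. With a 70% hit rate the expected wait drops to about 170 ms, and with a 0% hit rate it rises to 450 ms, slightly worse than not speculating at all. The gain depends entirely on how often the early guess holds up.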
Editorial commentary on a publicly published patent application. Not legal advice.