Samsung · Filed Feb 11, 2026 · Published Jun 25, 2026 · verified — real USPTO data

Real-Time Voice Translation Patent Preserves Each Speaker's Tone

Samsung is working on a translator that doesn't just convert words from one language to another. It also tries to carry over the way you sound.

Samsung Patent: Real-Time Multi-Language Voice Translation — figure from US 2026/0178855 A1
FIG. 1A — rendered from the official USPTO publication PDF.
Publication number: US 2026/0178855 A1
Applicant: Samsung Electronics Co., Ltd.
Filing date: Feb 11, 2026
Publication date: Jun 25, 2026
Inventors: Sandeep Singh SPALL, Choice CHOUDHARY
CPC classification: 704/3
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Apr 2, 2026)
Parent application: Continuation of PCTKR2024008226 (filed Jun 14, 2024)
Document: 15 claims

What Samsung's multi-speaker voice translator actually does

Imagine you're in a meeting with colleagues who speak three different languages, all talking at once. Today's translation tools tend to stumble: they flatten everyone into the same robotic voice, lose track of who said what, and sometimes give up when two people speak simultaneously.

Samsung's patent describes a system built to handle that chaos. It listens to everyone at once, figures out who is speaking and in which language, then translates each person's words into a shared target language. The translated audio isn't just technically accurate; it also tries to recreate each speaker's style, so the output sounds like you, just in a different language.

The key piece is what Samsung calls a "conversation manager," a coordinating layer that keeps track of multiple speakers and their languages at the same time, rather than processing one speaker's full sentence before moving on. That's what makes the real-time part plausible.

How the system separates speakers, languages, and vocal tone

The system centers on a conversation manager module that orchestrates several steps at once rather than in a simple queue.
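To make "at once rather than in a simple queue" concrete, here is a minimal Python sketch of a coordinator that keeps every speaker's pipeline in flight together. It is an illustration only: the patent does not disclose an implementation, and the names `Utterance`, `process_utterance`, and `ConversationManager` are hypothetical, with plain text standing in for audio.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker_id: str
    text: str  # stand-in for audio; a real system would carry samples


async def process_utterance(utt: Utterance) -> str:
    """Hypothetical per-speaker pipeline (placeholder for the real stages)."""
    await asyncio.sleep(0.01)  # simulates recognition/translation latency
    return f"[{utt.speaker_id}] {utt.text.upper()}"


@dataclass
class ConversationManager:
    """Illustrative coordinator: one task per speaker, all running together."""
    results: dict[str, str] = field(default_factory=dict)

    async def handle(self, utterances: list[Utterance]) -> dict[str, str]:
        # gather() runs every speaker's pipeline concurrently and returns
        # the outputs in the same order as the inputs.
        outputs = await asyncio.gather(
            *(process_utterance(u) for u in utterances)
        )
        for utt, out in zip(utterances, outputs):
            self.results[utt.speaker_id] = out
        return self.results


def run_demo() -> dict[str, str]:
    manager = ConversationManager()
    batch = [
        Utterance("alice", "hello"),
        Utterance("bruno", "bonjour"),
        Utterance("chen", "ni hao"),
    ]
    return asyncio.run(manager.handle(batch))
```

Because the three placeholder pipelines overlap, the batch takes roughly as long as one utterance rather than three in a row, which is the property the article attributes to the conversation manager.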

First, it takes in audio from multiple users simultaneously and runs speech-to-text conversion on each speaker's utterance independently. It then performs language recognition using both the transcribed text and the raw acoustic features of the audio (things like pitch, rhythm, and phoneme patterns) to identify which language each speaker is using, even mid-conversation.
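One way to picture combining two kinds of evidence is a weighted blend of per-language scores. The sketch below assumes each source already produces a probability per language; the weighting scheme, the function names, and the `text_weight` parameter are assumptions for illustration, not details from the patent.

```python
def fuse_language_scores(text_scores, acoustic_scores, text_weight=0.6):
    """Blend two per-language probability dicts into one normalized dict.

    text_weight is a hypothetical tuning knob: 1.0 trusts only the
    transcript, 0.0 trusts only the acoustic evidence.
    """
    languages = set(text_scores) | set(acoustic_scores)
    fused = {
        lang: text_weight * text_scores.get(lang, 0.0)
        + (1.0 - text_weight) * acoustic_scores.get(lang, 0.0)
        for lang in languages
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("no language evidence supplied")
    return {lang: score / total for lang, score in fused.items()}


def identify_language(text_scores, acoustic_scores, text_weight=0.6):
    """Return the language with the highest fused score."""
    fused = fuse_language_scores(text_scores, acoustic_scores, text_weight)
    return max(fused, key=fused.get)
```

The point of blending is that a short transcript can be ambiguous between closely related languages, while the sound of the speech may still favor one of them.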

Next, it segments the text into translation-ready chunks based partly on language boundaries. This matters because sentence structure differs so much across languages that translating word-by-word produces nonsense; segmenting intelligently produces cleaner output. A language processing model (a large translation model) then converts each segment into the target language.
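A simplified picture of segmenting on language boundaries: group consecutive words that share a language tag, so each chunk can go to the translator whole. This sketch assumes each word already carries a language tag and handles only the language-boundary part; the patent's actual segmentation criteria are not spelled out here, and `segment_by_language` is a hypothetical name.

```python
from itertools import groupby


def segment_by_language(tagged_words):
    """Group consecutive (word, language) pairs into (language, text) chunks."""
    segments = []
    # groupby() starts a new group each time the language tag changes.
    for lang, run in groupby(tagged_words, key=lambda pair: pair[1]):
        text = " ".join(word for word, _ in run)
        segments.append((lang, text))
    return segments


tagged = [
    ("I", "en"), ("need", "en"), ("the", "en"),
    ("informe", "es"), ("mensual", "es"),
    ("today", "en"),
]
segments = segment_by_language(tagged)
# segments == [("en", "I need the"), ("es", "informe mensual"), ("en", "today")]
```

Each chunk keeps its own word order intact, which gives the translation model a phrase to work with instead of isolated words.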

The part that sets this apart from basic translation is the tone style embedding step. The system retrieves a stored vocal profile that matches the original speaker's style, then uses it when generating the final audio output. The goal is that the translated speech doesn't just carry the meaning of what you said; it also reflects how you said it.
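Retrieving "a stored vocal profile that matches" can be pictured as a nearest-neighbor lookup over style vectors. The sketch below uses cosine similarity, which is a common choice but an assumption here; the patent summary does not say how matching is done, and the function names and three-number embeddings are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_tone_profile(query_embedding, stored_profiles):
    """Return the (name, embedding) pair closest to the query embedding."""
    if not stored_profiles:
        raise ValueError("no stored tone profiles")
    return max(
        stored_profiles.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
    )


profiles = {
    "alice": [0.9, 0.1, 0.0],
    "bruno": [0.1, 0.8, 0.3],
}
name, embedding = retrieve_tone_profile([0.2, 0.7, 0.2], profiles)
# name == "bruno"
```

In a full system the retrieved embedding would condition the speech synthesizer; that stage is outside the scope of this sketch.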

What this means for Galaxy devices and live translation

Samsung already ships a Live Translate feature on Galaxy phones as part of Galaxy AI. This patent points toward a meaningful upgrade to that capability: handling group conversations, not just two-person exchanges, and preserving individual vocal identity in the output.

For users, the practical difference is the jump from "this tool translates my words" to "this tool translates my voice." In a business call or a travel scenario, hearing a translation that still sounds like the original speaker talking to you is considerably less disorienting than a uniform synthesized voice reading everyone's lines.

Editorial take

This is a genuinely interesting engineering challenge, and Samsung's approach of combining speaker separation, multilingual detection, and tone preservation in one coordinated pipeline is worth watching. Whether the tone-style embedding actually sounds convincing in practice is the hard question the patent doesn't answer, but the problem it's solving is real and the framing is specific enough to take seriously.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.