Samsung · Filed Jan 14, 2026 · Published May 21, 2026 · verified — real USPTO data

Samsung Patents a Self-Selecting Draft Model System for Faster AI Decoding

By Patentlyze Team · Updated Jul 10, 2026

Running a large language model quickly on a phone often relies on a shortcut called speculative decoding, and Samsung has filed a patent application for a system that automatically picks which draft model that shortcut should use.

Figure from the official USPTO publication.

Publication number US 2026/0141176 A1

Applicant Samsung Electronics Co., Ltd.

Filing date Jan 14, 2026

Publication date May 21, 2026

Inventors Junhyuk LEE, Seungjin YANG

CPC classification 704/9

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Feb 13, 2026)

Parent application Continuation of PCT/KR2025/008429 (filed Jun 18, 2025)

Document 20 claims

AI/ML

What Samsung's draft model picker actually does

Imagine your phone is trying to autocomplete a long response using a powerful AI model. Running that model word-by-word is slow, so engineers use a trick: a smaller, faster "draft" model guesses several words ahead, and the big model checks those guesses all at once. If the guesses are good, you get the result much faster.

The problem is that no single draft model is perfect for every situation. Samsung's patent describes a system that runs multiple draft models simultaneously on a prompt, then checks which one's predictions most closely match what the big "target" model would have said. The closest match wins and gets used for the actual generation task.

This means your device isn't locked into one drafting strategy — it dynamically selects the best draft model for each situation, potentially squeezing better speed out of on-device AI without sacrificing quality.

How Samsung scores draft models against the target LLM

Speculative decoding is a well-established technique for accelerating autoregressive language models (models that generate one token at a time). A lightweight "draft" model proposes a sequence of candidate tokens, and the larger "target" model verifies them in a single parallel pass, keeping the guesses up to the first one it rejects and discarding everything after it.
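
To make the propose-and-verify loop concrete, here is a minimal sketch of one round of greedy speculative decoding. Everything in it is hypothetical and for illustration only: the toy "models" are plain functions that return a single next token, and names such as speculative_step, draft_model and target_model do not come from the patent. A production system would verify all positions in one batched forward pass and would usually work with sampled probabilities rather than exact token matches.

```python
from typing import Callable, List, Sequence

# Hypothetical toy "model": maps a token prefix to its single most likely next token.
NextToken = Callable[[Sequence[str]], str]


def speculative_step(draft: NextToken, target: NextToken,
                     prefix: List[str], k: int = 4) -> List[str]:
    """Run one round of greedy speculative decoding and return the tokens to append."""
    # 1. The draft model proposes k tokens, one after another.
    proposed: List[str] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each position. A real system scores all k positions
    #    in one parallel pass; the loop here is only for readability.
    accepted: List[str] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every guess was accepted, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


SENTENCE = "the quick brown fox jumps over the lazy dog".split()


def target_model(prefix: Sequence[str]) -> str:
    """Toy target: always continues the fixed sentence."""
    return SENTENCE[len(prefix)] if len(prefix) < len(SENTENCE) else "<eos>"


def draft_model(prefix: Sequence[str]) -> str:
    """Toy draft: agrees with the target except that it says 'cat' instead of 'fox'."""
    guess = target_model(prefix)
    return "cat" if guess == "fox" else guess


new_tokens = speculative_step(draft_model, target_model, ["the"], k=4)
print(new_tokens)  # ['quick', 'brown', 'fox']
```

In this toy run the draft gets two tokens right and one wrong, so a single target pass yields three tokens instead of one. The better the draft matches the target, the more tokens each pass yields.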

Samsung's patent adds a model-selection layer on top of this. The device maintains a pool of candidate draft models (called "first models" in the claim). When a prompt arrives:

1. Each draft model in the pool generates its own set of candidate tokens.
2. Those candidate tokens are fed into the target model, which produces its own probability distribution over what it thinks should come next.
3. The system computes a similarity score between each draft model's probability distribution and the target model's distribution, essentially measuring how well each draft model thinks like the big model.
4. The draft model with the highest similarity is selected as the active draft model for that decoding session.
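
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the claimed implementation: the article does not say which similarity measure the patent uses, so the sketch assumes a simple overlap score (the probability mass two distributions share), and the pool layout, model names and numbers are all invented.

```python
from typing import Dict, List

Dist = Dict[str, float]  # token -> probability at one position


def overlap(p: Dist, q: Dist) -> float:
    """Shared probability mass: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


def score_pool(pool: Dict[str, Dict[str, List[Dist]]]) -> Dict[str, float]:
    """Average the per-position overlap between each draft and the target."""
    scores = {}
    for name, entry in pool.items():
        pairs = list(zip(entry["draft"], entry["target"]))
        scores[name] = sum(overlap(d, t) for d, t in pairs) / len(pairs)
    return scores


def select_draft(pool: Dict[str, Dict[str, List[Dist]]]) -> str:
    """Return the name of the draft model with the highest similarity score."""
    scores = score_pool(pool)
    return max(scores, key=scores.get)


# Hypothetical pool: for each draft model, its own distributions ("draft") and the
# target's distributions after reading that draft's candidate tokens ("target").
pool = {
    "chat": {
        "draft": [{"hello": 0.7, "hi": 0.3}, {"world": 0.6, "there": 0.4}],
        "target": [{"hello": 0.6, "hi": 0.4}, {"world": 0.5, "there": 0.5}],
    },
    "code": {
        "draft": [{"def": 0.9, "hello": 0.1}, {"(": 0.8, "world": 0.2}],
        "target": [{"hello": 0.7, "def": 0.3}, {"world": 0.9, "(": 0.1}],
    },
}

scores = score_pool(pool)
best = select_draft(pool)
print(best)  # chat
```

One reason to pick overlap for the sketch: in the standard sampling-based form of speculative decoding, this shared mass equals the probability that a drafted token is accepted, so a higher score should translate fairly directly into more accepted tokens.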

The similarity metric is the key mechanism here. By comparing probability distributions (the probability each model assigns to every possible next token, not just its top guess), the system gets a richer signal about how closely a draft model tracks the target than it would from checking only whether the top tokens match.
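
A small example shows why the full distribution carries more information than the top token. The sketch below assumes KL divergence as the comparison, which is one common choice and not necessarily the one in the patent; the distributions and the names draft_a and draft_b are made up.

```python
import math
from typing import Dict

Dist = Dict[str, float]  # token -> probability


def top1_match(p: Dist, q: Dist) -> bool:
    """True when both distributions rank the same token first."""
    return max(p, key=p.get) == max(q, key=q.get)


def kl_divergence(target: Dist, draft: Dist, eps: float = 1e-9) -> float:
    """KL(target || draft) in nats; 0 means identical, larger means further apart."""
    return sum(
        pt * math.log(pt / max(draft.get(tok, 0.0), eps))
        for tok, pt in target.items()
        if pt > 0
    )


target = {"cat": 0.5, "dog": 0.4, "eel": 0.1}
draft_a = {"cat": 0.55, "dog": 0.35, "eel": 0.1}   # close to the target
draft_b = {"cat": 0.9, "dog": 0.05, "eel": 0.05}   # same top token, overconfident

# Both drafts pass the top-token check...
print(top1_match(target, draft_a), top1_match(target, draft_b))

# ...but the distribution-level score separates them.
kl_a = kl_divergence(target, draft_a)
kl_b = kl_divergence(target, draft_b)
print(kl_a < kl_b)
```

Both drafts agree with the target on the most likely token, so a top-token check cannot tell them apart. The divergence is far smaller for draft_a, which marks it as the closer match.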

What this means for on-device AI inference speed

Speculative decoding is already used in production LLM inference, but most implementations use a fixed draft model. Samsung's approach is notable because it treats draft model selection as a dynamic, per-prompt decision — which could meaningfully improve acceptance rates when the pool includes models specialized for different domains (code, conversation, language-specific text).

For Samsung, this is most likely aimed at Galaxy AI and on-device inference on Exynos and Snapdragon-powered devices, where squeezing latency out of limited compute is critical. If the selection overhead is low enough, this could translate to noticeably faster AI responses without requiring a bigger model or more memory.

Editorial take

This is a sensible engineering patent, not a moonshot. Speculative decoding is a proven technique and Samsung is adding one specific optimization — adaptive draft model selection — on top of it. The real question is whether the overhead of running multiple draft models to pick a winner actually nets out positive on power-constrained mobile hardware. That's an empirical question the patent doesn't answer, but the underlying idea is sound.
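
The overhead question can at least be framed with back-of-envelope arithmetic. The sketch below is a rough cost model, not a measurement: it assumes every drafted token is accepted independently with probability alpha (in which case a round of k drafted tokens yields (1 - alpha^(k+1)) / (1 - alpha) tokens on average), that all draft models cost the same to run, and that scoring each candidate costs one ordinary draft-and-verify round. All numbers are invented, costs are in arbitrary time units, and power draw is not modeled.

```python
import math


def expected_tokens_per_round(alpha: float, k: int) -> float:
    """Average tokens produced per round with k drafted tokens and acceptance rate alpha."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def time_per_token(alpha: float, k: int, draft_cost: float, target_cost: float) -> float:
    """Cost of one round (k draft steps plus one target pass) spread over its tokens."""
    return (k * draft_cost + target_cost) / expected_tokens_per_round(alpha, k)


def break_even_tokens(alpha_fixed: float, alpha_selected: float, n_models: int,
                      k: int, draft_cost: float, target_cost: float) -> float:
    """Generated tokens needed before selection pays back its one-time overhead."""
    saving = (time_per_token(alpha_fixed, k, draft_cost, target_cost)
              - time_per_token(alpha_selected, k, draft_cost, target_cost))
    if saving <= 0:
        return math.inf  # selection never pays off if it does not raise acceptance
    overhead = n_models * (k * draft_cost + target_cost)  # score every candidate once
    return overhead / saving


# Hypothetical scenario: 3 candidate drafts, selection lifts acceptance from 0.6 to 0.8.
n = break_even_tokens(alpha_fixed=0.6, alpha_selected=0.8, n_models=3,
                      k=4, draft_cost=1.0, target_cost=10.0)
print(round(n, 1))
```

Under these made-up numbers the selection step pays for itself after a couple of dozen generated tokens, so it would help on long responses and hurt on very short ones. Real break-even points depend on measured costs and acceptance rates, which the patent does not provide.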

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
