Nvidia · Filed Nov 22, 2024 · Published May 28, 2026 · verified — real USPTO data

Nvidia Patents a Method for Mining Richer Embeddings from Decoder Hidden Layers

By Patentlyze Team · Updated May 29, 2026

Most AI systems treat the encoder's output as the only useful representation of data. Nvidia's newly published patent application argues that the decoder's internal hidden states are a goldmine worth tapping too.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number: US 2026/0148056 A1
Applicant: NVIDIA Corporation
Filing date: Nov 22, 2024
Publication date: May 28, 2026
Inventors: Micha LIVNE, Michelle Lynn GILL
CPC classification: 706/25
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Dec 20, 2024)
Document: 20 claims

AI/ML

How Nvidia pulls more signal from AI decoder internals

Imagine a translator who not only converts your sentence into another language but also scribbles useful notes in the margins along the way. Most AI systems throw those margin notes away. This patent is about keeping them.

Typically, when an AI model processes data — text, images, audio — it compresses it into a compact summary called an embedding. That embedding gets used to drive decisions downstream. Nvidia's approach says: don't just use what the encoder produces up front. Dig into the decoder's own internal working notes — its hidden layers — to build a richer, more informative embedding.

The result is an embedding that captures more nuance, and that richer representation can then be used to drive a specific task output — classification, retrieval, generation, you name it. It's a way of squeezing more useful signal out of a model you've already trained, without retraining from scratch.

How the decoder's hidden layers generate the final embedding

The patent describes a four-step pipeline built on top of a standard encoder-decoder architecture (think models like T5, BART, or similar sequence-to-sequence transformers).

1. Encoder pass: The encoder processes the input data sample and compresses it into a latent representation, a dense numerical summary that captures the data's meaning.
2. Decoder hidden layer extraction: Instead of just reading the decoder's final output token by token, the method intercepts the hidden outputs from one or more internal decoder layers. These are the intermediate activations the decoder generates while converting the latent representation into a result.
3. Embedding construction: Those hidden outputs (or a selected portion of them) are aggregated into a unified embedding of the original data sample.
4. Task-based output: That embedding is then used to produce a downstream task result, such as a classification label, a similarity score, or a retrieval ranking.
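The four steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the method as claimed: ToyEncoderDecoder, build_embedding, and classify are hypothetical names, the weights are random rather than trained, and element-wise averaging is just one assumed way to aggregate the selected layers.

```python
import math
import random


def dense(weights, vec):
    """Fully connected layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]


def random_matrix(rng, rows, cols):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyEncoderDecoder:
    """Hypothetical stand-in for a trained encoder-decoder model."""

    def __init__(self, in_dim=8, hidden_dim=4, num_decoder_layers=3, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(rng, hidden_dim, in_dim)
        self.decoder = [
            random_matrix(rng, hidden_dim, hidden_dim)
            for _ in range(num_decoder_layers)
        ]

    def encode(self, sample):
        # Step 1: compress the input into a latent representation.
        return dense(self.encoder, sample)

    def decode_with_hidden(self, latent):
        # Step 2: run the decoder stack and keep every layer's hidden output.
        hidden_outputs = []
        state = latent
        for layer in self.decoder:
            state = dense(layer, state)
            hidden_outputs.append(state)
        return state, hidden_outputs


def build_embedding(hidden_outputs, layer_indices):
    # Step 3: average the selected layers element-wise into one vector.
    # Averaging is an assumed aggregation; concatenation would also work.
    selected = [hidden_outputs[i] for i in layer_indices]
    return [sum(values) / len(selected) for values in zip(*selected)]


def classify(embedding, head):
    # Step 4: a linear head turns the embedding into a class index.
    scores = [sum(w * e for w, e in zip(row, embedding)) for row in head]
    return scores.index(max(scores))


model = ToyEncoderDecoder()
sample = [0.1 * i for i in range(8)]
latent = model.encode(sample)
_, hidden_outputs = model.decode_with_hidden(latent)
embedding = build_embedding(hidden_outputs, layer_indices=[1, 2])
head = random_matrix(random.Random(1), 3, 4)  # 3 classes, untrained
label = classify(embedding, head)
print(len(embedding), label)
```

Passing layer_indices=[1, 2] uses only the last two decoder layers, which mirrors the "selected portion" wording in step 3. In a real model the choice of layers and pooling would be tuned for the downstream task.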

The key premise is that decoder hidden states encode context that the encoder alone doesn't capture. By the time an input has passed through the decoder's layers, those states reflect both the input context and the generative objective, which is the argument for treating them as a richer source for representation tasks than raw encoder outputs.

What this means for AI models doing multiple tasks at once

Encoder-decoder models are workhorses across NLP, code generation, and multimodal AI, but they're typically used either for generation or for embedding, rarely for both at once. This patent describes a way to extract high-quality embeddings from the decoder path, meaning a single model run could yield both a generated output and a rich semantic embedding, with no second model call.
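To make the "one run, two outputs" idea concrete, here is a minimal sketch of a greedy decoding loop that keeps its hidden state at every step and pools those states into an embedding once generation ends. Everything here is assumed for illustration: generate_and_embed is a hypothetical helper, the single recurrent decoder layer and random weights stand in for a real trained decoder, and mean pooling over steps is one possible aggregation among several.

```python
import math
import random


def dense(weights, vec):
    """Fully connected layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]


def random_matrix(rng, rows, cols):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def generate_and_embed(latent, step_weights, token_vectors, vocab_weights,
                       max_steps=5):
    """Greedy-decode once; return the tokens and a mean-pooled embedding."""
    state = latent
    prev_token = 0  # assumed start-of-sequence token id
    tokens, states = [], []
    for _ in range(max_steps):
        # Mix the previous token into the state, then apply the decoder layer.
        mixed = [s + t for s, t in zip(state, token_vectors[prev_token])]
        state = dense(step_weights, mixed)
        states.append(state)  # keep the hidden state instead of discarding it
        logits = [sum(w * s for w, s in zip(row, state)) for row in vocab_weights]
        prev_token = logits.index(max(logits))  # greedy token choice
        tokens.append(prev_token)
    # Mean-pool the per-step hidden states into a single embedding vector.
    embedding = [sum(values) / len(states) for values in zip(*states)]
    return tokens, embedding


rng = random.Random(0)
hidden_dim, vocab_size = 4, 6
latent = [0.2, -0.1, 0.4, 0.0]  # stands in for an encoder output
step_weights = random_matrix(rng, hidden_dim, hidden_dim)
token_vectors = random_matrix(rng, vocab_size, hidden_dim)
vocab_weights = random_matrix(rng, vocab_size, hidden_dim)

tokens, embedding = generate_and_embed(
    latent, step_weights, token_vectors, vocab_weights
)
print(tokens, len(embedding))
```

The point of the design is that the embedding is a by-product of work the decoder was already doing, so a caller that needs both the generated tokens and a vector for search makes one call instead of two.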

For Nvidia, whose GPUs run a large share of large-scale AI inference workloads, fewer redundant model calls mean fewer GPU cycles spent per request. The approach also has implications for retrieval-augmented generation (RAG) systems and multi-task pipelines, where both generation quality and semantic search accuracy matter.

Editorial take

This is a solid, pragmatic research patent — not a flashy new architecture, but a smart reuse strategy for models you've already trained. The real value is in inference efficiency: if you can extract a useful embedding from the decoder's hidden layers without a separate embedding model pass, that's meaningful compute savings at scale. Worth watching if you're building multi-task or RAG pipelines on Nvidia hardware.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
