AMD · Filed Dec 23, 2024 · Published Jun 25, 2026

AMD Patents a Hardware Shortcut for the Number-Conversion Work AI Chips Do Constantly

By Patentlyze Team · Updated Jun 26, 2026

AI chips spend a surprising amount of time just converting numbers back and forth between formats. AMD's new patent describes a hardware circuit that turns that repetitive conversion into a simple table lookup, potentially cutting the time and power those operations consume.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0178070 A1

Applicant Advanced Micro Devices, Inc.

Filing date Dec 23, 2024

Publication date Jun 25, 2026

Inventors Shubra Marwaha, Bin He, Subramaniam Maiyuran

CPC classification 708/200

Grant likelihood Medium

Examiner Not yet assigned (listed as CENTRAL, DOCKET, Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Feb 11, 2025)

Document 20 claims

Hardware

What AMD's lookup table shortcut actually does for AI chips

Imagine your GPS has to convert street addresses into coordinates thousands of times a second. Instead of doing the math fresh each time, it could keep a printed cheat sheet and just look up the answer. That's essentially what this AMD patent proposes for AI chips.

AI systems work with two kinds of numbers: large, precise "floating-point" numbers and smaller, compact "integers" that take fewer bits to store. Chips constantly convert between the two, a process called quantization (shrinking floats into integers) and dequantization (expanding integers back into floats). The conversion is repetitive, and doing it with arithmetic costs time and energy.
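For a sense of what "doing it with arithmetic" means, here is a minimal Python sketch of one widely used scheme, affine quantization: each value is divided by a scale, shifted by a zero point, rounded, and clamped to the integer range. The scheme, the function names, and the example scale are illustrative assumptions, not details from AMD's patent.

```python
# Minimal sketch of arithmetic (affine) quantization, assuming a single
# per-tensor scale and zero point. Real AI frameworks vary in the details.

def quantize(x: float, scale: float, zero_point: int, bits: int = 8) -> int:
    """Map a float to a signed integer code: divide, round, shift, clamp."""
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover an approximate float: one subtraction and one multiplication."""
    return (q - zero_point) * scale

scale, zp = 0.05, 0  # hypothetical quantization parameters
for x in [0.0, 0.12, -1.0, 9.9]:
    q = quantize(x, scale, zp)
    print(x, "->", q, "->", dequantize(q, scale, zp))
```

Every dequantized value here costs a subtraction and a multiplication; repeated across millions of model weights, that is the kind of repetitive work the patent aims to shortcut.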

AMD's approach stores a pre-built table of the full-size floating-point numbers inside a fast piece of chip memory called a cache. When the chip needs to convert a compressed integer back into its full-size version, it just looks up the answer in that table instead of calculating it. A simple selector circuit picks the right entry in a single step.
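The table idea can be sketched in a few lines of Python. This minimal example assumes a 4-bit integer code, so only 16 distinct values can ever appear, and precomputes all of them once; the scale, zero point, and names are hypothetical, chosen so every result is exact.

```python
# Sketch: replace per-value dequantization arithmetic with a precomputed table.
# Assumes a 4-bit code (16 possible integers); SCALE and ZERO_POINT are
# hypothetical values chosen for illustration.

BITS = 4
SCALE = 0.25
ZERO_POINT = 8  # codes 0..15 map to -2.0 .. 1.75

def dequantize_arith(code: int) -> float:
    """The arithmetic path: computed fresh on every call."""
    return (code - ZERO_POINT) * SCALE

# Built once, ahead of time: one float entry per possible code.
LUT = [dequantize_arith(code) for code in range(1 << BITS)]

def dequantize_lut(code: int) -> float:
    """The lookup path: conversion is just an indexed read, no multiply."""
    return LUT[code]

print([dequantize_lut(c) for c in [0, 3, 8, 15]])  # [-2.0, -1.25, 0.0, 1.75]
```

Because the table is filled in ahead of time, the run-time cost of a conversion no longer depends on how complicated the original formula was.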

How the cache lines and multiplexers swap integers for floats

The patent describes a hardware unit built around two components working together:

A lookup table (LUT) stored in cache: A small, fast memory block holds a pre-computed list of floating-point numbers (the big, precise format AI math relies on). The table fits inside one or more "cache lines," which are the standard chunks a processor fetches from memory in a single read.
A multiplexer as the selector: A multiplexer (think of it as a hardware switch with many inputs and one output) receives a compact integer index and uses it to pick the correct floating-point value from the table. No arithmetic required; it's a direct wire selection.
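To see why a table like this fits in so little space, here is a hedged Python model. It assumes a 64-byte cache line (a common size, but not one the article specifies) and 32-bit floats, packs the table into line-sized chunks, and models the multiplexer as a pure selection by index. All function names are hypothetical.

```python
import struct

CACHE_LINE_BYTES = 64  # assumed line size; real chips vary
FP32_BYTES = 4

def lines_needed(index_bits: int, float_bytes: int) -> int:
    """Cache lines needed to hold one float entry per possible index value."""
    total_bytes = (1 << index_bits) * float_bytes
    return -(-total_bytes // CACHE_LINE_BYTES)  # ceiling division

def pack_table(values: list[float]) -> list[bytes]:
    """Pack fp32 table entries (little-endian) into cache-line-sized chunks."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return [raw[i:i + CACHE_LINE_BYTES] for i in range(0, len(raw), CACHE_LINE_BYTES)]

def mux_select(lines: list[bytes], index: int) -> float:
    """Software stand-in for the multiplexer: the index only chooses which
    stored entry comes out; the value itself is never computed on.
    In hardware, splitting the index is just wiring: with 16 fp32 slots per
    line, the high bits pick the line and the low 4 bits pick the slot."""
    slots_per_line = CACHE_LINE_BYTES // FP32_BYTES
    line, slot = divmod(index, slots_per_line)
    (value,) = struct.unpack_from("<f", lines[line], slot * FP32_BYTES)
    return value

print(lines_needed(4, 4))  # 16 fp32 entries = 64 bytes -> 1
print(lines_needed(8, 2))  # 256 fp16 entries = 512 bytes -> 8

table = [i * 0.5 for i in range(16)]  # hypothetical 4-bit table
lines = pack_table(table)
print(len(lines), mux_select(lines, 5))  # 1 2.5
```

Under these assumptions, a 4-bit index with 32-bit floats gives a table of exactly 16 × 4 = 64 bytes, a single cache line, while an 8-bit index with 16-bit floats needs 256 × 2 = 512 bytes, or eight lines.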

The integer index is a quantized representation of the original floating-point number. Quantization is the process of compressing a high-precision number into a smaller integer format to save memory and bandwidth, a technique common in AI inference workloads. Dequantization goes the other way: recovering the full-precision value when you need it for computation.
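Quantization and dequantization are only approximate inverses: the round trip lands on the nearest value the table can represent. This minimal sketch assumes a hypothetical 3-bit table (eight entries) and nearest-value quantization, and shows the error that remains.

```python
# Hypothetical 3-bit table: 8 float entries, deliberately not evenly spaced.
TABLE = [-1.0, -0.5, -0.25, -0.1, 0.0, 0.1, 0.25, 0.5]

def quantize(x: float) -> int:
    """Return the index of the table entry closest to x (first one wins on ties)."""
    return min(range(len(TABLE)), key=lambda i: abs(TABLE[i] - x))

def dequantize(index: int) -> float:
    """Read the stored float back out; no arithmetic needed."""
    return TABLE[index]

for x in [0.3, -0.7, 2.0]:
    i = quantize(x)
    print(f"{x:+.2f} -> index {i} -> {dequantize(i):+.2f} (error {x - dequantize(i):+.2f})")
```

Values outside the table's range, like 2.0, saturate at the last entry. The entries in this made-up table are not evenly spaced, and the lookup works the same either way, which a single scale-and-offset formula cannot reproduce.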

The patent also describes storing an entire matrix of these integer indices in memory, a pattern that maps directly onto how neural network weights and activations are stored during AI inference. The LUT hardware can then serve dequantization requests for any element in that matrix quickly and in parallel.
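Here is a minimal Python sketch of that matrix case. It assumes 4-bit indices packed two per byte (low nibble first, row-major) and a hypothetical 16-entry table; the packing layout and names are assumptions, since the article does not describe AMD's memory format.

```python
# Sketch: dequantizing a small matrix of packed 4-bit indices by table lookup.
# Assumptions (not from the patent): two indices per byte, low nibble first,
# row-major order, and a hypothetical evenly spaced 16-entry table.

def pack_nibbles(indices: list[int]) -> bytes:
    """Store pairs of 4-bit indices in single bytes (low nibble first)."""
    if len(indices) % 2 or not all(0 <= i < 16 for i in indices):
        raise ValueError("need an even number of indices in the range 0..15")
    return bytes(lo | (hi << 4) for lo, hi in zip(indices[0::2], indices[1::2]))

def dequantize_matrix(packed: bytes, rows: int, cols: int,
                      table: list[float]) -> list[list[float]]:
    """Unpack each byte into two indices and replace each index with its table entry."""
    flat = []
    for byte in packed:
        flat.append(table[byte & 0x0F])  # low nibble: first index of the pair
        flat.append(table[byte >> 4])    # high nibble: second index of the pair
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

table = [(i - 8) * 0.25 for i in range(16)]  # hypothetical table: -2.0 .. 1.75
indices = [0, 8, 15, 4,
           9, 10, 7, 12]                     # a 2 x 4 matrix, row-major
packed = pack_nibbles(indices)
print(len(packed))                           # 4 (8 indices -> 4 bytes)
print(dequantize_matrix(packed, 2, 4, table))
# [[-2.0, 0.0, 1.75, -1.0], [0.25, 0.5, -0.25, 1.0]]
```

Each element's lookup is independent of every other element's, which is what makes this work easy to spread across many lookups running side by side.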

What this means for AI chip efficiency and AMD's GPU roadmap

Quantization has become a standard technique for running large AI models on consumer-grade hardware, including AMD's Radeon GPUs. The bottleneck isn't always the multiplication itself; often it's the overhead of converting compressed model weights back into usable floating-point numbers before each calculation. Replacing that conversion arithmetic with a cache lookup could reduce latency and power draw in exactly the workloads (local AI inference, LLM serving) where AMD is competing hardest against Nvidia.

For users, this kind of low-level hardware improvement is the sort of thing that eventually shows up as "faster AI" in a new GPU generation and the driver updates that put it to use, without any visible feature change. It's plumbing, but important plumbing.

Editorial take

This is a focused, practical hardware optimization rather than a conceptual leap, and AMD clearly filed it with AI inference workloads in mind. The LUT-plus-multiplexer approach is an old trick in chip design, but applying it specifically to quantization lookup in a cache-line-sized structure for neural network matrices is worth watching. If it ships in a future RDNA or CDNA GPU, it's the kind of quiet efficiency gain that adds up across millions of dequantization operations per second.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
