Nvidia · Filed Apr 14, 2025 · Published May 28, 2026 · verified — real USPTO data

Nvidia Patents a Scaling System to Fix Float16 Overflow in Neural Network Training

By Patentlyze Team · Updated May 29, 2026

Training a neural network in reduced precision is a constant balancing act — go too low and your numbers either explode or vanish. Nvidia's new patent describes a systematic way to catch and correct that before it happens.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0148070 A1

Applicant NVIDIA Corporation

Filing date Apr 14, 2025

Publication date May 28, 2026

Inventors Boris GINSBURG, Sergei NIKOLAEV, Ahmad KISWANI, Hao WU, Amir GHOLAMINEJAD, Slawomir KIERAT, Michael HOUSTON, Alex FIT-FLOREA

CPC classification 706/16

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Feb 18, 2026)

Parent application Continuation of 15624577 (filed Jun 15, 2017)

Document 20 claims

AI/ML

How Nvidia stops float16 from blowing up AI training

Imagine you're trying to fit a massive spreadsheet onto a notepad that only holds numbers up to about 65,000 and starts losing digits below roughly 0.00006. Numbers too small for the notepad get rounded to zero (underflow), and numbers too large turn into infinity (overflow). Either way, your calculations go wrong. That's the daily reality when training AI models using float16, a compact number format that's fast and memory-efficient but has a narrow range: its largest finite value is 65,504, and anything smaller than about 0.00000006 rounds to zero.
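To make those limits concrete, here is a minimal sketch that rounds ordinary Python floats to IEEE 754 half precision using the standard library's `struct` module. The helper name `to_float16` is ours, not the patent's, and mapping out-of-range values to infinity follows IEEE 754 default rounding rather than anything stated in the filing.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest float16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values that round past the largest finite float16
        return float("inf") if x > 0 else float("-inf")

FLOAT16_MAX = 65504.0                # largest finite float16
FLOAT16_MIN_SUBNORMAL = 2.0 ** -24   # smallest positive float16, about 5.96e-8

print(to_float16(FLOAT16_MAX))   # still representable
print(to_float16(70000.0))       # too large: overflows
print(to_float16(1e-8))          # too small: underflows to zero
```

A single overflowed or zeroed value is enough to poison a gradient update, which is why the range check has to happen before the math rather than after.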

Nvidia's patent describes a fix: before doing any matrix math — the core operation in neural network training — the system checks whether the numbers involved would cause an overflow or underflow. If they would, it applies a scaling factor to bring everything into a safe range first, then does the math, then factors the scale back out.

The clever part is how it stores that information. Instead of just saving the raw numbers, each matrix is stored as a pair: a single floating-point scale value plus the scaled-down data in float16. Every element's true value is just scale × stored value. It's a lightweight bookkeeping trick that keeps precision intact without switching to a heavier number format.

Inside Nvidia's per-matrix scale-factor tuple format

The patent centers on a custom data representation for tensors (the multi-dimensional arrays that hold weights, activations, and gradients in a neural network). Instead of storing raw float16 values directly, each matrix is encoded as a tuple (a, v[]) — where a is a full-precision floating-point scale factor and v[] is an array of float16 values. The true value of any element is simply a × v[i].
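A rough sketch of the (a, v[]) idea, in plain Python. The names `ScaledMatrix`, `encode`, and `decode` are hypothetical, and the policy of choosing a so that the largest magnitude maps to 1.0 is our assumption for illustration; the article does not say how the patent picks its scale.

```python
import struct
from typing import List, NamedTuple

def to_float16(x: float) -> float:
    """Round x to the nearest float16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

class ScaledMatrix(NamedTuple):
    a: float         # full-precision scale factor
    v: List[float]   # elements, each rounded to float16

def encode(values: List[float]) -> ScaledMatrix:
    """Assumed policy: pick a so the largest magnitude is stored as 1.0."""
    largest = max(abs(x) for x in values)
    a = largest if largest > 0 else 1.0
    return ScaledMatrix(a, [to_float16(x / a) for x in values])

def decode(m: ScaledMatrix) -> List[float]:
    """Recover true values as a * v[i]."""
    return [m.a * x for x in m.v]

tiny_gradients = [3.0e-9, -1.5e-9, 6.0e-10]

print([to_float16(x) for x in tiny_gradients])  # raw float16: every value is lost
packed = encode(tiny_gradients)
print(packed.v)                                 # stored values sit comfortably near 1.0
print(decode(packed))                           # close to the original values
```

The stored array costs the same 16 bits per element as raw float16; the only overhead is one extra scalar per matrix.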

Before a matrix operation (like the matrix multiplications that dominate transformer and CNN training), the system:

1. Inspects the value ranges of both input matrices
2. Determines whether the planned operation would produce overflow (a number too large for float16) or underflow (a number too small, rounding to zero)
3. Computes per-matrix scale factors that bring the values into float16's safe range
4. Performs the matrix operation on the rescaled data
5. Uses the scale factors to correctly interpret the output before feeding it to the next layer or optimizer step
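The five steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the patented method: it always rescales instead of first deciding whether rescaling is needed, it maps each matrix's peak magnitude to 1.0, and it accumulates each dot product in full precision before rounding the result to float16. All function names are hypothetical.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest float16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

def peak(m):
    """Largest magnitude in a matrix given as a list of rows."""
    return max(abs(x) for row in m for x in row)

def matmul_f16(A, B):
    """Multiply two matrices, rounding every output element to float16."""
    inner, cols = len(B), len(B[0])
    return [[to_float16(sum(row[t] * B[t][j] for t in range(inner)))
             for j in range(cols)] for row in A]

def scaled_matmul(A, B):
    # Steps 1-3: inspect each input's range and pick one scale per matrix.
    # Assumed policy: map each matrix's peak magnitude to 1.0.
    sa, sb = peak(A) or 1.0, peak(B) or 1.0
    A16 = [[to_float16(x / sa) for x in row] for row in A]
    B16 = [[to_float16(x / sb) for x in row] for row in B]
    # Step 4: run the operation on the rescaled data.
    C16 = matmul_f16(A16, B16)
    # Step 5: the output's scale is the product of the input scales.
    return sa * sb, C16

A = [[300.0, 400.0]]
B = [[300.0], [400.0]]

naive = matmul_f16(A, B)          # 300*300 + 400*400 = 250000, above 65504
scale, C16 = scaled_matmul(A, B)  # stored result stays near 1.0

print(naive[0][0])                # overflowed
print(scale * C16[0][0])          # true value recovered from (scale, C16)
```

Because the output comes back as its own (scale, data) pair, it can be passed straight to the next layer without ever materializing the out-of-range value in float16.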

The key insight is that separate scale factors are assigned to each matrix rather than a single global scale for the whole model. This gives finer-grained control and avoids the cascading precision loss that can happen when one outlier tensor forces a bad global rescaling. The neural network parameters are then updated using the corrected output, keeping training numerically stable without needing to fall back to float32.
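A small sketch of why the granularity matters, using two made-up tensors with very different magnitudes. The tensor values, the helper names, and the peak-maps-to-1.0 scaling policy are all assumptions chosen for illustration.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest float16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

def peak(values):
    """Largest magnitude in a flat list of values."""
    return max(abs(x) for x in values)

def quantize(values, a):
    """Store values / a in float16."""
    return [to_float16(x / a) for x in values]

tensors = {
    "activations": [40000.0, 12000.0],
    "gradients": [2.0e-7, 5.0e-8],
}

# One global scale, dictated by the largest tensor in the model.
global_a = max(peak(v) for v in tensors.values())
global_stored = {name: quantize(v, global_a) for name, v in tensors.items()}

# One scale per matrix, each chosen from that matrix's own range.
local_stored = {name: (peak(v), quantize(v, peak(v))) for name, v in tensors.items()}

print(global_stored["gradients"])   # flushed to zero by the activation-sized scale
a, v = local_stored["gradients"]
print([a * x for x in v])           # recovered, because the scale fits this tensor
```

With the shared scale, the large activations drag the small gradients below float16's floor; with separate scales, both tensors are stored near 1.0 and neither loses its values.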

What this means for low-precision AI training hardware

Training in float16 (and its cousin bfloat16) has become the default for large models because it can roughly double throughput and halve memory traffic versus float32, but numerical instability has always been its Achilles heel. Techniques like loss scaling already exist, but they typically apply a single scale factor across the whole model. A per-matrix scaling scheme like this could let engineers push more of the training pipeline into low precision without babysitting gradient explosions.

For Nvidia, this also has an obvious hardware angle. Its Tensor Cores are built for low-precision matrix multiply-accumulate, and a standardized tuple format could make it easier to pipeline scale-factor computation alongside the matrix operations themselves, potentially on-chip and with minimal overhead. If this approach gets baked into future CUDA libraries or the Transformer Engine, you might benefit from it automatically the next time you fine-tune a model on an H-series GPU.

Editorial take

This is genuinely useful, unsexy infrastructure work — the kind of numerical stability plumbing that makes the difference between a training run that converges and one that silently diverges at 3am. It's not a flashy AI capability, but per-matrix dynamic scaling is a real improvement over coarse loss scaling, and the tuple format is an elegant representation choice. Worth watching for integration into CUDA or cuDNN.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
