Nvidia · Filed Feb 17, 2025 · Published May 28, 2026 · verified — real USPTO data

Nvidia Patents a Self-Curating Synthetic Data Loop for Stereo Vision Models

By Patentlyze Team · Updated May 29, 2026

Training a depth-sensing AI normally requires mountains of carefully labeled real-world images — a slow, expensive process. Nvidia's new patent describes a system that generates its own training data, then uses the model it trained to throw out the bad examples automatically.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0148404 A1

Applicant NVIDIA Corporation

Filing date Feb 17, 2025

Publication date May 28, 2026

Inventors Matthew Trepte, Bowen Wen, Jack Zhang, Gordon Grigor, Stanley Thomas Birchfield

CPC classification 348/43

Grant likelihood Medium

Examiner KIR, ALBERT (Art Unit 2485)

Status Non Final Action Mailed (Apr 29, 2026)

Parent application Claims priority from a provisional application 63726916 (filed 2024-12-02)

Document 21 claims

AI/ML

What Nvidia's stereo training loop actually does

Imagine teaching a robot to judge distances by showing it thousands of photo pairs — one image from the left eye, one from the right, just like your own binocular vision. The tricky part is that gathering and labeling all those real photos takes enormous time and money.
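The "judge distances" part rests on standard stereo geometry, not on anything specific to this patent. For a rectified camera pair, depth is inversely proportional to how far a point shifts between the left and right images (its disparity). Here is a minimal sketch. The focal length and baseline values are chosen purely for illustration.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth (metres) of a point seen in a rectified stereo pair: Z = f * B / d.

    disparity_px: horizontal shift of the point between left and right images, in pixels
    focal_px:     camera focal length, in pixels
    baseline_m:   distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        # Zero disparity means the point is infinitely far away (or was not matched).
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only: a 700 px focal length and a 12 cm baseline.
for d in (84.0, 42.0, 8.4):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d, 700.0, 0.12):.2f} m")
```

Halving the disparity doubles the depth. As a result, a one-pixel matching error costs far more accuracy on distant objects than on nearby ones, which is why dense, accurately labeled training pairs matter.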

Nvidia's approach is to synthesize the training images in a virtual environment using a tool called a "replicator composer." It renders artificial scenes, trains a depth-sensing model on them, and then uses that freshly trained model to grade the next batch of synthetic images — automatically discarding the ones that are too easy, too weird, or not useful. The result is a tighter, higher-quality dataset without a human having to review every frame.
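The article does not say exactly how the trained model grades each image. One plausible reading of "too easy, too weird" is a two-sided band on per-sample error. That kind of filter is possible because synthetic renders come with exact ground-truth disparity. The sketch below makes that assumption. The class and function names, the error metric, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StereoSample:
    sample_id: str
    gt_disparity: list[float]    # exact ground truth comes free with synthetic renders
    pred_disparity: list[float]  # prediction from the model trained on the previous batch


def mean_abs_disparity_error(s: StereoSample) -> float:
    """Average per-pixel disparity error in pixels (end-point error for 1-D disparity)."""
    if len(s.gt_disparity) != len(s.pred_disparity) or not s.gt_disparity:
        raise ValueError(f"{s.sample_id}: disparity maps must be non-empty and equal length")
    return sum(abs(g - p) for g, p in zip(s.gt_disparity, s.pred_disparity)) / len(s.gt_disparity)


def curate(samples: list[StereoSample], too_easy: float = 0.5, too_hard: float = 20.0):
    """Split samples into (kept, dropped) using a two-sided error band.

    Hypothetical thresholds: below `too_easy` the model already solves the pair,
    so it teaches little; above `too_hard` the render is likely broken or degenerate.
    """
    kept, dropped = [], []
    for s in samples:
        err = mean_abs_disparity_error(s)
        if too_easy <= err <= too_hard:
            kept.append(s)
        else:
            dropped.append(s)
    return kept, dropped


batch = [
    StereoSample("easy", [10.0, 12.0], [10.1, 12.1]),    # error ~0.1 px
    StereoSample("useful", [10.0, 12.0], [13.0, 15.0]),  # error 3.0 px
    StereoSample("broken", [10.0, 12.0], [60.0, 70.0]),  # error 54.0 px
]
kept, dropped = curate(batch)
print([s.sample_id for s in kept], [s.sample_id for s in dropped])
```

Only the middle sample survives. A real system would likely use richer signals, such as confidence maps or disagreement between models, but the shape of the decision is the same.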

The system also mixes two flavors of synthetic data. Realistic-style scenes look close to the real world, while chaotic-style scenes are intentionally strange. Training on both is meant to make the final model more robust.

How the replicator composer builds and filters scenes

The patent describes an iterative, bootstrapped pipeline for building stereo-vision training datasets entirely from synthetic imagery.

Here's the core loop:

A replicator composer — a procedural scene generator — produces a first batch of synthetic stereo image pairs (two slightly offset camera views that encode depth information).
A stereo model (the patent specifically mentions a "Foundational Stereo" architecture) is trained on that first dataset.
The composer then generates a second, larger batch of scenes.
The already-trained model is applied to this second batch to score each image pair and filter out low-quality or uninformative examples before they're used for the next training round.
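The four steps above can be sketched as one generic round, with the scene generator, trainer, and filter passed in as functions. This is a minimal sketch, not Nvidia's implementation. `bootstrap_round` is a hypothetical name, and the toy stand-ins at the bottom exist only to make the code runnable.

```python
import random
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # a synthetic stereo sample
M = TypeVar("M")  # a trained stereo model


def bootstrap_round(
    generate: Callable[[int], List[S]],
    train: Callable[[List[S]], M],
    keep: Callable[[M, S], bool],
    first_batch: int,
    second_batch: int,
) -> Tuple[M, List[S]]:
    """Render, train, render a larger batch, then filter it with the trained model."""
    seed_data = generate(first_batch)                    # 1. composer renders the first dataset
    model = train(seed_data)                             # 2. train the stereo model on it
    candidates = generate(second_batch)                  # 3. composer renders a larger batch
    curated = [s for s in candidates if keep(model, s)]  # 4. model-based filtering
    return model, curated                                # curated set feeds the next round


# Toy stand-ins (hypothetical): a "sample" is a difficulty score and the "model"
# is just the average difficulty it was trained on.
rng = random.Random(0)
model, curated = bootstrap_round(
    generate=lambda n: [rng.uniform(0.0, 10.0) for _ in range(n)],
    train=lambda data: sum(data) / len(data),
    keep=lambda skill, s: 0.5 * skill <= s <= 1.5 * skill,
    first_batch=100,
    second_batch=1_000,
)
print(f"model skill {model:.2f}; kept {len(curated)} of 1000 candidates")
```

Passing the components in as functions keeps the loop independent of any particular renderer or network. Calling it again with the curated set as training data gives the compounding effect described in the editorial take below.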

The composer also automates camera placement. It calculates the center of mass of all objects placed in a virtual scene and points the virtual camera toward that center, so the camera is never aimed at empty space.
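The article does not describe how the camera is parameterized, so this sketch makes three assumptions: a z-up world, optional per-object weights that default to equal, and a camera described by yaw and pitch angles. `SceneObject` and `aim_camera` are hypothetical names.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class SceneObject:
    position: Vec3
    mass: float = 1.0  # assumption: equal weights unless a physical mass is supplied


def center_of_mass(objects: list[SceneObject]) -> Vec3:
    total = sum(o.mass for o in objects)
    if total <= 0:
        raise ValueError("need at least one object with positive mass")
    x, y, z = (sum(o.mass * o.position[i] for o in objects) / total for i in range(3))
    return (x, y, z)


def aim_camera(camera: Vec3, target: Vec3) -> tuple[float, float]:
    """Yaw and pitch (degrees) that point a camera at `target`, assuming a z-up world."""
    dx, dy, dz = (t - c for t, c in zip(target, camera))
    yaw = math.degrees(math.atan2(dy, dx))                    # turn about the vertical axis
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # tilt up (+) or down (-)
    return yaw, pitch


objects = [SceneObject((2.0, 0.0, 0.0)), SceneObject((4.0, 2.0, 0.0)), SceneObject((3.0, -2.0, 0.0))]
target = center_of_mass(objects)
yaw, pitch = aim_camera(camera=(0.0, 0.0, 1.5), target=target)
print(target, f"yaw={yaw:.1f} pitch={pitch:.1f}")
```

For a stereo rig, both virtual cameras would share this orientation, offset from each other by the baseline.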

Data diversity is a first-class concern. The pipeline explicitly generates multiple "realism categories" (photo-realistic vs. deliberately chaotic) and multiple "use case categories" covering navigation, driving, and robotic manipulation — so a single pipeline can feed models destined for very different downstream tasks.
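One simple way to cover both category axes is to split each render budget across every (realism, use case) combination. The article names the categories but not their ratios, so the even split below is an assumption. The label strings and `plan_batch` are hypothetical.

```python
import itertools

# Hypothetical labels mirroring the categories named above.
REALISM = ("photorealistic", "chaotic")
USE_CASES = ("navigation", "driving", "manipulation")


def plan_batch(total: int) -> dict[tuple[str, str], int]:
    """Split a render budget evenly across every (realism, use case) bucket."""
    buckets = list(itertools.product(REALISM, USE_CASES))
    base, extra = divmod(total, len(buckets))
    # The first `extra` buckets absorb the remainder so the counts sum to `total`.
    return {b: base + (1 if i < extra else 0) for i, b in enumerate(buckets)}


for (realism, use_case), n in plan_batch(1_000).items():
    print(f"{realism:>14} / {use_case:<12} {n}")
```

Crossing the two axes guarantees that every use case sees both realistic and chaotic scenes. Switching to a weighted split would only change how `base` and `extra` are computed.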

What this means for robot vision and autonomous systems

Stereo depth estimation is foundational to autonomous robots, self-driving vehicles, and any AR/VR system that needs to understand the 3D structure of a scene. The bottleneck has always been training data: real stereo datasets are expensive to capture and hard to label with ground-truth depth. A pipeline that generates and self-curates its own training data at scale could dramatically lower that barrier.

For Nvidia specifically, this fits squarely into its robotics and autonomous vehicle platforms (Isaac, Drive). If the quality of synthetic training data can be automatically policed by the model itself, Nvidia can iterate faster and offer pre-trained stereo models that perform well out of the box — a meaningful competitive advantage when selling to developers building robots and autonomous systems.

Editorial take

This is genuinely useful infrastructure work, not a flashy AI demo. The self-curation loop — using a trained model to filter its own next round of training data — is the kind of practical engineering that separates companies that ship reliable perception systems from those that are still manually wrangling datasets. It's worth watching because it compounds: better data means a better model means better data filtering, and so on.

Get one Big Tech patent every Sunday

Plain English, intelligent commentary, no hype. Free.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
