Nvidia · Filed Sep 25, 2025 · Published May 28, 2026

Nvidia Patents a Foundation Model for Generating Full-Body 3D Digital Humans

Nvidia is working on a single AI model that can generate a complete, photorealistic 3D human — face, hands, full body — trained almost entirely on ordinary 2D photos scraped from the internet, no 3D studio required.

FIG. 1A — rendered from the official USPTO publication PDF.
Publication number: US 2026/0148473 A1
Applicant: NVIDIA Corporation
Filing date: Sep 25, 2025
Publication date: May 28, 2026
Inventors: Koki Nagano, Jingxiang Sun, Shalini De Mello, Umar Iqbal, Ye Yuan, Tianye Li, Jan Kautz, David Luebke, Simon Yuen, Xueting Li, Omer Shapira
US classification (USPC): 345/419
Grant likelihood: Medium
Examiner: not yet assigned (USPTO placeholder "CENTRAL, DOCKET", Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Oct 17, 2025)
Priority: claims priority to U.S. provisional application 63/726,101 (filed Nov 27, 2024)
Claims: 20

What Nvidia's 3D digital human generator actually does

Imagine trying to build a lifelike digital double of a person using only regular photos — the kind anyone might post online. No depth cameras, no motion-capture suits, no 3D scanning booth. That's the core challenge Nvidia is tackling here.

This patent describes a foundation model, a general-purpose engine of sorts. It learns what humans look like across a wide range of poses and appearances from flat 2D images. It then generates a coherent, full-body 3D representation complete with a detailed face and hands. Many existing systems handle either the face or the body well, but rarely both at once.

The system is trained as a generative adversarial network (GAN). One part of the model generates synthetic humans while another part critiques them, and the two improve against each other until the results look convincingly real. The payoff is a reusable foundation you could fine-tune for games, virtual production, video calls, or digital health applications without starting from scratch.

How Gaussian maps and a GAN build a full-body human

The system centers on a GAN (Generative Adversarial Network) generator — a neural network that produces synthetic humans while a separate discriminator network scores how realistic they look and pushes the generator to improve.
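The generator/discriminator tug-of-war can be sketched with the common non-saturating logistic GAN loss. The text summarized here does not say which adversarial loss Nvidia uses. The loss choice and the function names below are assumptions for illustration only.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    # Split on sign so exp() never receives a large positive argument.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


# Assumption: non-saturating logistic loss; the patent's exact loss is not specified here.
def discriminator_loss(real_logits, fake_logits):
    """Mean of -log D(real) - log(1 - D(fake)), with D(x) = sigmoid(logit)."""
    real_term = sum(softplus(-r) for r in real_logits) / len(real_logits)  # -log sigmoid(r)
    fake_term = sum(softplus(f) for f in fake_logits) / len(fake_logits)   # -log(1 - sigmoid(f))
    return real_term + fake_term


def generator_loss(fake_logits):
    """Mean of -log D(fake): the generator is rewarded when fakes score as real."""
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)
```

A discriminator that confidently separates real from fake (high real logits, low fake logits) drives its own loss down and the generator's loss up. That pressure is what pushes the generator toward more realistic output.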

The clever part is how 3D geometry is represented. Instead of modeling every detail with a dense polygon mesh, Nvidia's approach uses texel-aligned Gaussian maps. Think of these as a grid of tiny, fuzzy blobs whose color, opacity, and shape properties are stored in a texture map pinned to a coarse body mesh template. This 3D Gaussian Splatting technique represents 3D scenes as clouds of overlapping, semi-transparent ellipsoids rather than triangles. It renders fast and can represent soft, organic surfaces like skin and hair more naturally than polygons alone.
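To make "texel-aligned Gaussian map" concrete, here is a minimal decoding sketch. It assumes a hypothetical 14-channel layout per texel: a 3D offset, a log-scale, a rotation quaternion, an opacity logit, and RGB color. It also assumes each texel has a base point on the template surface. The layout, the activations (exp, sigmoid), and the function names are assumptions borrowed from common 3D Gaussian Splatting practice, not details from the patent. The covariance uses the standard 3DGS form Σ = R S Sᵀ Rᵀ.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


@dataclass
class Gaussian:
    center: Vec3
    covariance: Mat3
    opacity: float
    color: Vec3


def quat_to_matrix(w: float, x: float, y: float, z: float) -> Mat3:
    """Convert a quaternion (normalized here) to a 3x3 rotation matrix."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# Hypothetical 14-channel layout; not taken from the patent.
def decode_texel(base: Vec3, texel: list[float]) -> Gaussian:
    """Turn one texel's raw channels into Gaussian parameters."""
    offset, log_scale = texel[0:3], texel[3:6]
    quat, opacity_logit, rgb = texel[6:10], texel[10], texel[11:14]
    center = tuple(b + o for b, o in zip(base, offset))   # displace from template surface
    s2 = [math.exp(2.0 * ls) for ls in log_scale]          # squared axis scales
    r = quat_to_matrix(*quat)
    # Sigma = R S S^T R^T, so Sigma[i][j] = sum_k R[i][k] * s_k^2 * R[j][k]
    cov = [[sum(r[i][k] * s2[k] * r[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    opacity = 1.0 / (1.0 + math.exp(-opacity_logit))       # squash to (0, 1)
    return Gaussian(center, cov, opacity, tuple(rgb))


def decode_map(base_positions, gaussian_map) -> list[Gaussian]:
    """base_positions[v][u] is the template-surface point for texel (u, v)."""
    return [decode_texel(base_positions[v][u], gaussian_map[v][u])
            for v in range(len(gaussian_map))
            for u in range(len(gaussian_map[0]))]
```

Storing Gaussians in a UV-space texture lets an ordinary 2D convolutional generator output them as an image-shaped tensor. That is one plausible reason for the texel-aligned design.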

Training inputs are pose samples drawn from a dataset of 2D images. The model is never handed 3D ground truth; it infers 3D structure from 2D supervision. The rendering pipeline includes:

  • A Linear Blend Skinning / Deformation Block that warps the Gaussian map to match a given pose
  • A Rendering Block that composites the Gaussians into a 2D image
  • Multi-part discriminators that evaluate the full body, face, and hands separately
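The deformation and rendering steps above can be sketched with two textbook building blocks. The first is linear blend skinning (LBS) of a Gaussian center, v′ = Σₖ wₖ (Rₖ v + tₖ). The second is front-to-back alpha compositing at a single pixel, C = Σᵢ cᵢ αᵢ Πⱼ<ᵢ (1 − αⱼ). This is a simplified sketch under stated assumptions. It omits covariance deformation, 2D projection, and tile-based sorting. The function names are hypothetical, and Nvidia's actual blocks may differ.

```python
def lbs_point(point, weights, joint_transforms):
    """Textbook linear blend skinning of one rest-pose point.

    weights: per-joint skinning weights (expected to sum to 1).
    joint_transforms: list of (R, t), a 3x3 rotation and a 3-vector
    mapping each joint from rest pose to the target pose.
    """
    # Blend the per-joint affine transforms: A = sum w_k R_k, b = sum w_k t_k.
    a = [[sum(w * rot[i][j] for w, (rot, _) in zip(weights, joint_transforms))
          for j in range(3)] for i in range(3)]
    b = [sum(w * t[i] for w, (_, t) in zip(weights, joint_transforms)) for i in range(3)]
    # Apply the blended transform: v' = A v + b.
    return tuple(sum(a[i][j] * point[j] for j in range(3)) + b[i] for i in range(3))


def composite_pixel(splats, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing of splats covering one pixel.

    splats: iterable of (depth, alpha, (r, g, b)), where alpha is the
    Gaussian's opacity already weighted by its 2D footprint at this pixel.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _, alpha, rgb in sorted(splats, key=lambda s: s[0]):  # nearest first
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # remaining splats are effectively hidden
            break
    return tuple(color[c] + transmittance * background[c] for c in range(3))
```

For example, a half-transparent red splat in front of an opaque blue one yields an even red/blue mix. The background is fully occluded.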

Loss functions are computed from both the rendered image quality (discriminator feedback) and the Gaussian map itself, ensuring internal 3D consistency even though the training signal is mostly 2D.
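One way to picture a combined objective is a weighted adversarial term per body part plus a penalty computed directly on the Gaussian map. The sketch below makes several assumptions not confirmed by the patent text. It assumes face and hand crop boxes are available, for example from projected keypoints. It uses a non-saturating loss for each discriminator. It also uses a mean squared texel offset as a stand-in Gaussian-map regularizer. All names and weights are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def crop(image, box):
    """Cut a region from image[row][col]; box = (top, left, height, width)."""
    top, left, h, w = box
    return [row[left:left + w] for row in image[top:top + h]]


def generator_objective(image, boxes, discriminators, weights, offsets, reg_weight):
    """Hypothetical multi-part generator loss plus a Gaussian-map penalty.

    boxes: crop boxes for non-body parts, e.g. {"face": ..., "hands": ...}.
    discriminators: {part: callable(region) -> realism logit}; "body" sees the full image.
    weights: {part: float} balancing the per-part adversarial terms.
    offsets: per-texel 3D offsets taken from the Gaussian map.
    """
    adversarial = 0.0
    for part, disc in discriminators.items():
        region = image if part == "body" else crop(image, boxes[part])
        adversarial += weights[part] * softplus(-disc(region))  # -log sigmoid(logit)
    # Assumed regularizer: keep Gaussians close to the template surface.
    reg = sum(x * x + y * y + z * z for x, y, z in offsets) / max(len(offsets), 1)
    return adversarial + reg_weight * reg
```

Separate face and hand discriminators give those small but perceptually important regions their own training signal. A single full-body critic might otherwise under-weight them.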

What this means for real-time avatars and digital humans

For game developers, VFX studios, and virtual production teams, a foundation model like this could dramatically cut the cost and time of creating believable digital humans. Today, a photorealistic digital double can take weeks of work and a dedicated capture stage. A pre-trained foundation model that could be fine-tuned on a handful of reference images would change that math significantly.

There's also a deeper strategic signal here: Nvidia appears to be positioning AI-generated digital humans as a platform capability, not just a research curiosity. Paired with its Omniverse and Avatar Cloud Engine (ACE) infrastructure, a robust digital human foundation model (DHFM) could become a core API for real-time avatars. Uses could range from customer service bots to virtual collaborators in enterprise software. In those settings, you might interact with a convincingly human AI face without knowing it was generated rather than captured.

Editorial take

This is a genuinely interesting research filing, not routine infrastructure work. Combining 3D Gaussian Splatting with a full-body GAN trained largely on in-the-wild 2D images is a hard technical problem. Nvidia's team, which includes several prominent names in neural rendering, is one of the few groups with the compute and expertise to pull it off at high quality. Whether or not this application is granted with broad claims, it signals that Nvidia is serious about owning the digital human stack end-to-end.


Source: full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.