Nvidia Patents a Text-to-3D Pipeline for Simulation-Ready Virtual Characters
Nvidia is patenting a pipeline that takes a plain-English description (say, 'a hooded medieval tunic') and spits out fully simulation-ready 3D object geometry, no manual modeling required. The clever part is that it's built specifically for physics simulation, not just visual rendering.
How Nvidia turns a text description into a 3D character
Imagine you're building a virtual world and you need a character wearing a specific outfit. Normally, you'd either hire a 3D artist or spend hours in modeling software. Nvidia's patent describes a system where you just describe what you want in plain English and the software generates the 3D geometry automatically.
The system is designed so the resulting objects aren't just pretty to look at — they're ready to drop straight into a physics simulation. That means the virtual clothing can drape, collide, and move like real fabric would, without extra work from a developer or artist.
This is particularly interesting for fields like robotics training, game development, and digital twins, where you need lots of varied virtual objects fast — and you need them to behave realistically in a simulated environment, not just sit there looking nice.
How the diffusion model converts language into object geometry
The core pipeline works in three stages, each handled by a different trained neural network component; a minimal code sketch of the flow follows the list.
- Language embedding: Your natural language description (e.g., "a loose linen shirt with rolled-up sleeves") is first converted into a language embedding — essentially a dense numerical representation that captures the semantic meaning of the text.
- Diffusion model to geometry embedding: That language embedding is fed into a trained diffusion model (the same class of AI behind image generators like Stable Diffusion, but here operating in 3D geometry space rather than pixel space) to produce a geometry embedding — a compact representation of the shape of the object.
- Decoder to surface representation: A trained decoder takes that geometry embedding and outputs an object surface representation — a mathematical description of the object's surface, likely in a format like a mesh or implicit surface.
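Here is one way those three stages could fit together in code. Everything in it is a hypothetical stand-in: the patent does not disclose architectures, dimensions, or the exact diffusion formulation, so the toy denoising loop and the SDF-grid decoder below are illustrative assumptions, not Nvidia's implementation.

```python
# Toy sketch of the three-stage text-to-geometry flow. All names,
# dimensions, and architectures are hypothetical stand-ins; the patent
# does not disclose them.
import torch
import torch.nn as nn


class TextToGeometrySketch(nn.Module):
    def __init__(self, text_dim=512, geo_dim=256, grid=32, steps=50):
        super().__init__()
        self.geo_dim, self.grid, self.steps = geo_dim, grid, steps
        # Stage 2 stand-in: a denoiser operating in geometry-embedding
        # space, conditioned on the language embedding.
        self.denoiser = nn.Sequential(
            nn.Linear(geo_dim + text_dim, 512), nn.SiLU(),
            nn.Linear(512, geo_dim),
        )
        # Stage 3 stand-in: a decoder mapping the geometry embedding to a
        # surface representation, here a signed-distance grid.
        self.decoder = nn.Sequential(
            nn.Linear(geo_dim, 1024), nn.SiLU(),
            nn.Linear(1024, grid ** 3),
        )

    def forward(self, lang_emb):
        # Stage 1 happens upstream: a pretrained text encoder turns the
        # prompt into `lang_emb` of shape (batch, text_dim).
        z = torch.randn(lang_emb.shape[0], self.geo_dim)
        for _ in range(self.steps):
            # Toy iterative refinement standing in for a real DDPM/DDIM
            # denoising schedule, conditioned on the text embedding.
            z = z - self.denoiser(torch.cat([z, lang_emb], dim=-1)) / self.steps
        # Decode the geometry embedding into an SDF grid whose zero level
        # set is the object's surface.
        return self.decoder(z).view(-1, self.grid, self.grid, self.grid)


# Usage with a random stand-in for the language embedding of one prompt:
sdf = TextToGeometrySketch()(torch.randn(1, 512))
print(sdf.shape)  # torch.Size([1, 32, 32, 32])
```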
Finally, the surface representation is converted into usable object geometry (the "first object geometry" in the patent's claim language): the actual 3D structure of the virtual object. The patent specifically mentions garments as a key use case, with a dedicated garment geometry branch in the training architecture, suggesting the system is tuned to handle the complex, deformable surfaces that clothing requires in physics simulations.
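The patent leaves the exact conversion step open (mesh or implicit surface), so here is one concrete possibility rather than Nvidia's method: if the decoder emits a signed-distance grid like the sketch above, a standard way to turn it into simulation-ready triangle geometry is marching cubes. The sphere SDF below is a synthetic placeholder.

```python
# Hypothetical final step: extract a triangle mesh from a signed-distance
# grid with marching cubes. The sphere SDF is a synthetic placeholder.
import numpy as np
from skimage import measure

# Stand-in implicit surface: a sphere of radius 10 voxels on a 32^3 grid.
coords = np.mgrid[0:32, 0:32, 0:32] - 15.5
sdf_grid = np.sqrt((coords ** 2).sum(axis=0)) - 10.0

# The zero level set becomes vertices and triangle faces, i.e. the kind
# of geometry a cloth or rigid-body solver can actually collide against.
verts, faces, normals, _ = measure.marching_cubes(sdf_grid, level=0.0)
print(verts.shape, faces.shape)  # (N, 3) vertices, (M, 3) triangle indices
```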
What this means for game dev, robotics, and virtual worlds
For anyone building simulated environments at scale — think robotics companies training manipulation policies, game studios populating open worlds, or VFX pipelines — the bottleneck has always been content creation speed. Generating physically accurate 3D assets by hand is slow and expensive. A text-driven pipeline that outputs simulation-ready geometry could dramatically cut that cost.
Nvidia's positioning here is strategic. The company already owns the dominant physics simulation platform (Omniverse) and the dominant AI training hardware. A tool that auto-generates simulation-ready content from text fits neatly into that ecosystem — and could tighten the lock-in for developers building on Nvidia's stack.
This is a genuinely interesting patent because it targets a specific, painful gap: the difference between 'looks good in a render' and 'actually works in a physics sim.' Most text-to-3D research stops at visual fidelity. Nvidia is explicitly building for simulation correctness, which is exactly what their robotics and autonomous vehicle customers need. Worth watching.
Editorial commentary on a publicly published patent application. Not legal advice.