Sony · Filed Jun 9, 2025 · Published Jun 18, 2026

Sony Patents an AI That Builds Music by Filling In Blanks, Over and Over

By Patentlyze Team · Updated Jun 19, 2026

Sony is patenting an AI audio engine that works like a digital game of Mad Libs: it deliberately blanks out parts of a sound file, fills them in, then repeats the process until finished music emerges. The result, the company claims, is audio generation fast enough to happen in real time.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0171067 A1

Applicant SONY GROUP CORPORATION

Filing date Jun 9, 2025

Publication date Jun 18, 2026

Inventors ZHI ZHONG, AKIRA TAKAHASHI, MARCO COMUNITA, SHIQI YANG, MENGJIE ZHAO, KOICHI SAITO, YUKARA IKEMIYA, TAKASHI SHIBUYA, SHUSUKE TAKAHASHI, YUKI MITSUFUJI

US classification (USPC) 700/94

Grant likelihood Medium

Examiner Not yet assigned (central docket, Art Unit 2691)

Status Docketed New Case - Ready for Examination (Jul 8, 2025)

Parent application Claims priority to U.S. provisional application 63/733,628 (filed Dec 13, 2024)

Document 19 claims

AI/ML

How Sony's iterative audio-patching approach generates music

Imagine writing a story by first placing placeholder blanks where words should go, then filling each one in — and doing that over and over until the whole thing reads naturally. Sony's patent applies that same idea to music and audio.

Here's the basic idea: the system starts with audio that has intentional 'gaps' — chunks of sound that are missing or masked. An AI model fills those gaps in, producing a repaired version. Then the system looks at that repaired audio, decides which parts still need work, masks them again, and fills them in once more. That cycle repeats a fixed number of times until the final audio is ready.

The goal is to make AI-generated audio fast enough that it can happen as you listen — or close to it. That matters for any tool where waiting around for a clip to render would break the creative flow, whether that's a game, a music app, or a production studio.

How the mask-and-repair loop assembles final audio output

The system Sony describes is built around a technique called iterative masked synthesis — a process borrowed from image-generation AI but applied here to audio spectrograms (visual maps of sound frequencies over time).
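To make "masked audio data" concrete, here is a minimal sketch, assuming the masked representation is a magnitude spectrogram and that whole time frames are blanked out. The patent's actual representation and masking unit may differ (many masked audio models work on learned tokens instead). The helper names `spectrogram` and `mask_frames` are hypothetical, and the naive DFT is used only to stay dependency-free.

```python
import cmath
import math
from typing import List, Optional, Set

Spectrogram = List[List[float]]  # rows are time frames, columns are frequency bins


def spectrogram(signal: List[float], frame: int, hop: int) -> Spectrogram:
    """Magnitude spectrogram from a naive per-frame DFT (no window), for illustration only."""
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame) for n, x in enumerate(chunk)))
            for k in range(frame // 2 + 1)  # non-negative frequency bins only
        ])
    return frames


def mask_frames(spec: Spectrogram, masked: Set[int]) -> List[Optional[List[float]]]:
    """Blank out whole time frames; None marks a gap the model would have to fill in."""
    return [None if t in masked else row for t, row in enumerate(spec)]


# A pure tone with 2 cycles per 16-sample frame, so its energy lands in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
spec = spectrogram(tone, frame=16, hop=16)
gapped = mask_frames(spec, masked={1, 2})  # frames 1 and 2 become the "blanks"
```

The repair step's job is then to predict plausible values for the `None` rows from the frames that survived.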

In the process the patent describes, a CPU runs the following loop:

Repairs masked audio data — the AI model fills in the blanked-out sections of a sound representation, making a best guess at what should be there.
Extracts new mask positions — after the repair, the system identifies which parts of the audio are still uncertain or low-quality and marks those as the next targets.
Repeats — this repair-and-remask cycle runs a set number of times, progressively refining the audio with each pass.
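The repair, re-mask, repeat loop can be sketched as follows. This is a minimal illustration, not Sony's method: it assumes a MaskGIT-style scheme in which the model returns a guess and a confidence score for every position, the least confident fresh guesses are re-masked, and a cosine schedule (an assumption) decides how many positions stay masked after each pass. `toy_model` is a hypothetical stand-in for the neural network, and audio is reduced to a sequence of integer tokens.

```python
import math
from typing import Callable, List, Optional, Tuple

MASK = None  # marker for a blanked-out position
Tokens = List[Optional[int]]
# Hypothetical model interface: for every position, return (predicted token, confidence).
Model = Callable[[Tokens], List[Tuple[int, float]]]


def mask_count(length: int, step: int, num_passes: int) -> int:
    """Assumed cosine schedule: positions left masked after `step` passes (0 after the last)."""
    return math.floor(length * math.cos(math.pi / 2 * step / num_passes))


def iterative_masked_synthesis(model: Model, length: int, num_passes: int) -> Tokens:
    tokens: Tokens = [MASK] * length  # start with every position blanked out
    # Repeat the repair/re-mask cycle a fixed number of times, one model call per pass.
    for step in range(1, num_passes + 1):
        preds = model(tokens)
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        # Repair: fill each masked position with the model's best guess.
        for i in masked:
            tokens[i] = preds[i][0]
        # Extract new mask positions: re-blank the least confident of the
        # positions just filled. Positions kept on earlier passes stay fixed.
        k = min(mask_count(length, step, num_passes), len(masked))
        for i in sorted(masked, key=lambda j: (preds[j][1], j))[:k]:
            tokens[i] = MASK
    return tokens


def toy_model(tokens: Tokens) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for the network: always 'predicts' a rising scale of
    MIDI-style note numbers, and is more confident where neighbours are filled in."""
    n = len(tokens)
    out = []
    for i in range(n):
        nbrs = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < n]
        known = sum(t is not MASK for t in nbrs)
        out.append((60 + i % 12, (known + 1) / (len(nbrs) + 1)))
    return out


print(iterative_masked_synthesis(toy_model, length=16, num_passes=8))
```

Because the toy model always guesses the same scale, the printed result is simply that scale with no masks left. With a real model, the confidence ranking is what steers later passes toward the parts of the audio that still sound wrong.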

The patent describes these steps as running on a CPU rather than requiring specialized AI chips, which is notable because most generative audio models lean heavily on GPUs. If that holds up in practice, it would lower the hardware barrier considerably. The iterative design also suggests a simple way to trade quality for speed: run fewer loops. For real-time applications, a slightly imperfect result delivered instantly beats a perfect one delivered too late.
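The quality-for-speed trade-off can be made concrete with a little arithmetic. Assuming one model call per pass and the same hypothetical cosine masking schedule used in MaskGIT-style generators (the patent's actual schedule is not described here), fewer passes mean fewer calls, but each pass must commit more positions at once with less surrounding context. The function names below are illustrative only.

```python
import math
from typing import List


def remaining_masked(length: int, num_passes: int) -> List[int]:
    """Assumed cosine schedule: positions still masked after each pass (index 0 = before any pass)."""
    return [math.floor(length * math.cos(math.pi / 2 * s / num_passes))
            for s in range(num_passes + 1)]


def committed_per_pass(length: int, num_passes: int) -> List[int]:
    """How many positions each pass fixes for good; each pass costs one model call."""
    r = remaining_masked(length, num_passes)
    return [r[s - 1] - r[s] for s in range(1, num_passes + 1)]


for passes in (4, 8, 16):
    plan = committed_per_pass(64, passes)
    print(f"{passes:2d} model calls, largest single-pass commit: {max(plan)} of 64 positions")
```

Under these assumptions, cutting from 16 passes to 4 makes generation roughly four times cheaper, but the final pass has to fill in far more positions in one shot, which is where quality tends to suffer.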

What this means for real-time AI music tools

Real-time AI audio generation is one of the harder problems in creative tech. Most current tools require you to wait — sometimes seconds, sometimes longer — for a generated clip to render. Sony's looping approach is designed to cut that wait down to something imperceptible, which could matter a lot for interactive applications like games, live performances, or adaptive soundtracks that change with what's happening on screen.

Sony's music and audio division is one of the largest in the world, and the company has been quietly building AI music tools under its research arm. A real-time generation engine that runs on standard CPU hardware — rather than expensive cloud GPUs — would make that technology far more accessible to developers and creators who don't have specialized infrastructure. For you as a user, that could eventually show up as a 'generate background music' button that actually works instantly inside a consumer app.

Editorial take

This is a genuinely interesting technical approach — using masked generative modeling for audio is well-established in research circles, but Sony's specific claim around CPU-based real-time performance is the part worth watching. If it delivers on that promise, it closes a meaningful gap between what AI audio can do in a lab and what works in a shipped product.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
