Sony Patents an AI System That Splits Songs Into DAW-Ready Stems
Sony has filed a patent application for an AI-powered system that takes an audio track and splits it into separate instrument stems (vocals, drums, bass, and more) already formatted for use in a digital audio workstation. It's the kind of tool that could turn a finished song back into an editable project in seconds.
What Sony's AI stem-splitter actually does for musicians
Imagine you hear a song and want to remix just the vocals, or practice along with the drums removed. Today, doing that properly usually means either having access to the original studio files or spending serious time with imperfect tools.
Sony's patent describes a system where you feed a piece of audio into an AI model, tell it what you want separated, and get back individual stems — the isolated tracks for each instrument or voice — already packaged in a format your DAW (think Ableton, Logic Pro, or Pro Tools) can open and work with immediately.
The AI model has been trained in advance to recognize the sonic fingerprints of different musical elements. So when you ask it to pull out the guitar, it knows what a guitar sounds like in context and carves it out accordingly. The result lands on your timeline, ready to edit, pitch-shift, or build on.
How the model isolates and exports individual sound sources
The patent describes an information processing apparatus built around three core components:
- Acquisition unit — ingests the source audio (the "first content").
- Reception unit — accepts a user request specifying how the audio should be separated (e.g., "give me the vocals and the rhythm section as separate stems").
- Generation unit — runs the audio through a pre-trained ML model and outputs the separated sound sources in a DAW-compatible format (think WAV stems, or a session file structure a workstation can natively import).
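The three units above can be sketched as a small pipeline. This is a minimal illustration, not Sony's implementation: the function names, the `SeparationRequest` type, and `placeholder_model` are all hypothetical, and the placeholder does no real separation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Audio = List[float]  # mono samples in [-1.0, 1.0]; a simplification for this sketch


@dataclass
class SeparationRequest:
    """What the user asked for, e.g. ["vocals", "drums"] (hypothetical type)."""
    sources: List[str]


def acquire(samples: Audio) -> Audio:
    """Acquisition unit: ingest the source audio (the "first content")."""
    return list(samples)


def receive(sources: List[str]) -> SeparationRequest:
    """Reception unit: accept and normalize the user's separation request."""
    return SeparationRequest(sources=[s.strip().lower() for s in sources])


def generate(
    audio: Audio,
    request: SeparationRequest,
    model: Callable[[Audio], Dict[str, Audio]],
) -> Dict[str, Audio]:
    """Generation unit: run the model and return only the requested stems."""
    all_stems = model(audio)
    missing = [s for s in request.sources if s not in all_stems]
    if missing:
        raise ValueError(f"model cannot produce: {missing}")
    return {name: all_stems[name] for name in request.sources}


def placeholder_model(audio: Audio) -> Dict[str, Audio]:
    # Stand-in for the pre-trained model. It does NOT separate anything;
    # it just gives each named stem an equal share of the signal.
    names = ["vocals", "drums", "bass", "other"]
    return {n: [x / len(names) for x in audio] for n in names}


stems = generate(
    acquire([0.0, 0.4, -0.8]),
    receive(["Vocals", " drums "]),
    placeholder_model,
)
print(sorted(stems))  # ['drums', 'vocals']
```

The point of the structure is that the request filters the output: the model may know about four sources, but only the two that were asked for come back.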
The key technical element is the pre-trained model. It has learned the characteristic features of individual sound sources that make up music — the spectral and temporal signatures of drums, bass, melody instruments, and vocals. This is essentially source separation (the computational task of unmixing a combined audio signal into its constituent parts), handled by a neural network rather than manual EQ tricks.
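To make "unmixing" concrete, here is a toy sketch of separation by frequency masking. It is an assumption-heavy illustration, not the patent's method: many separation systems estimate a mask over a time-frequency representation of the audio, whereas this example hard-codes a simple low/high mask for two pure tones. The signal names and the `CUTOFF` value are invented for the demo.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for a 64-sample demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


N = 64
# Two "instruments": a low tone (2 cycles per window) and a high tone (12 cycles).
bass = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
lead = [0.5 * math.sin(2 * math.pi * 12 * t / N) for t in range(N)]
mix = [b + l for b, l in zip(bass, lead)]

spectrum = dft(mix)

# Hand-written binary mask standing in for what a trained model would predict.
# min(k, N - k) folds the mirrored negative-frequency bins onto the positive ones.
CUTOFF = 6
low_mask = [1.0 if min(k, N - k) < CUTOFF else 0.0 for k in range(N)]

bass_est = idft([X * m for X, m in zip(spectrum, low_mask)])
lead_est = idft([X * (1.0 - m) for X, m in zip(spectrum, low_mask)])

max_err = max(abs(a - b) for a, b in zip(bass_est, bass))
print(max_err < 1e-9)  # True
```

Real music is far messier than two clean tones, since instruments overlap in frequency and change over time. That overlap is why a learned model is used instead of a fixed filter like this one.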
The user's request gates what gets generated, meaning the system isn't just blindly splitting everything — it responds to intent. The output format being DAW-ready is the workflow-closing detail: the stems don't just exist as raw audio files, they're structured so a producer can open them and start working immediately.
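What "DAW-ready" could mean in practice can be sketched with Python's standard `wave` module. The patent summary above does not specify a file layout, so everything here is an assumption: one mono 16-bit WAV per stem, all padded to the same length so they line up at time zero when dropped onto a timeline, plus a hypothetical `manifest.json` describing the set.

```python
import json
import math
import struct
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 44100  # assumed; a common rate for music production


def write_stem(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(frames)


def export_stems(stems, out_dir):
    """Write every stem at equal length and record the set in a manifest."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    length = max(len(s) for s in stems.values())
    manifest = {"sample_rate": SAMPLE_RATE, "frames": length, "stems": {}}
    for name, samples in stems.items():
        # Pad shorter stems with silence so all files share one timeline.
        padded = list(samples) + [0.0] * (length - len(samples))
        path = out_dir / f"{name}.wav"
        write_stem(path, padded)
        manifest["stems"][name] = path.name
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


# Demo data: a 440 Hz tone and a shorter block of silence.
demo = {
    "vocals": [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1000)],
    "drums": [0.0] * 800,
}
out = Path(tempfile.mkdtemp())
manifest = export_stems(demo, out)
print(manifest["frames"])  # 1000
```

Equal-length, same-rate files are the part that removes the manual alignment step: a producer can import the whole folder and every stem starts at the same point.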
What this means for music production and remixing workflows
Source separation tools already exist — Moises, Lalal.ai, and even Adobe's offerings do versions of this. What Sony's patent emphasizes is the tight integration with the DAW workflow: the output isn't just an isolated audio file you then have to import and align yourself, it's formatted for direct use in a workstation session. For producers and remixers, that last-mile friction is real.
Sony has deep stakes in both music production hardware and software (via its professional audio brands) and the music rights ecosystem. A tool like this, built into its own products, could meaningfully change how quickly you can go from hearing an idea in an existing track to building something new on top of it.
This is a genuinely practical patent — not a moonshot. AI stem separation is a proven, useful technology, and Sony wrapping it in a DAW-native output format is a real workflow improvement over current standalone tools. The interesting question is where this surfaces: a standalone app, a plugin, or baked into Sony's professional audio hardware.
Editorial commentary on a publicly published patent application. Not legal advice.