Sony Patents a Zone-Based Codec That Ties Audio Quality to Where the Action Is
Sony Interactive Entertainment has patented a codec that doesn't just compress video differently depending on which part of the screen matters most. It also adjusts audio quality based on where a sound's source appears on screen.
What Sony's zone-based video and audio encoding actually does
Imagine you're streaming a game remotely and the most important action is happening in the center of the screen — a boss fight, say, or a cutscene. Sony's new codec would encode that central area in high quality while compressing the edges more aggressively. That's not new. What is interesting is that the same logic applies to sound.
If a character making noise is standing in the high-priority zone, their audio gets encoded at higher quality too. A background ambient sound coming from something depicted in a lower-priority area of the image? That gets compressed more. The whole stream — video and audio together — is treated as a unified budget to spend wisely.
This is a continuation of an earlier Sony patent from late 2022, which means the underlying idea has been in development for a while. It's aimed squarely at remote play scenarios where bandwidth is limited and every bit counts.
How Sony's encoder links sound quality to screen zones
The patent describes an encoding pipeline that splits an image into at least two distinct zones — think of them like concentric regions of importance. The first zone (higher priority) gets encoded at a higher bitrate or fidelity; the second zone gets compressed more aggressively.
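To make the zone split concrete, here is a minimal Python sketch that labels each block of a frame as high or low priority using a centered rectangle. The rectangle shape, the `center_fraction` parameter, the block grid, and the 0 to 100 quality numbers are all assumptions for illustration; the patent as described here only requires at least two zones encoded at different quality levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    quality: int  # hypothetical 0-100 quality level, higher is better


# Illustrative values only; the patent gives no numbers.
HIGH = Zone("high", 90)
LOW = Zone("low", 40)


def zone_for_block(bx, by, blocks_w, blocks_h, center_fraction=0.5):
    """Return HIGH if the block's center lies in a centered rectangle, else LOW."""
    margin_x = blocks_w * (1 - center_fraction) / 2
    margin_y = blocks_h * (1 - center_fraction) / 2
    inside_x = margin_x <= bx + 0.5 <= blocks_w - margin_x
    inside_y = margin_y <= by + 0.5 <= blocks_h - margin_y
    return HIGH if inside_x and inside_y else LOW


def zone_map(blocks_w, blocks_h, center_fraction=0.5):
    """Build a row-major grid of zones, one entry per block."""
    return [
        [zone_for_block(x, y, blocks_w, blocks_h, center_fraction)
         for x in range(blocks_w)]
        for y in range(blocks_h)
    ]


if __name__ == "__main__":
    for row in zone_map(8, 4):
        print("".join("H" if z is HIGH else "." for z in row))
```

A real encoder would feed the per-block quality into its rate control (for example as a quantization offset) rather than print a grid, and the zones need not be rectangular or fixed in place.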
The clever extension here is the audio coupling. Each sound source in the scene is mapped to whichever zone its visual representation occupies in the image. The audio for that sound is then encoded at a quality level that matches the zone. If a character or object producing a sound appears in the high-quality zone, their audio is treated as high-priority. If the source is in a background or peripheral zone, the audio compression is heavier.
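The audio coupling can be sketched the same way: look up the zone that contains each sound source's on-screen position, then pick an audio bitrate for that zone. Everything named here is hypothetical, including `SoundSource`, the normalized coordinates, the centered high-priority rectangle, and the bitrate table; the patent as described does not specify concrete values.

```python
from dataclasses import dataclass

# Hypothetical per-zone audio bitrates in kbit/s (assumption, not from the patent).
AUDIO_KBPS = {"high": 128, "low": 48}


@dataclass
class SoundSource:
    name: str
    x: float  # normalized screen position: 0.0 is left, 1.0 is right
    y: float  # normalized screen position: 0.0 is top, 1.0 is bottom


def zone_of(x, y, center_fraction=0.5):
    """Return 'high' if (x, y) falls inside the centered rectangle, else 'low'."""
    lo = (1 - center_fraction) / 2
    hi = 1 - lo
    return "high" if lo <= x <= hi and lo <= y <= hi else "low"


def audio_plan(sources):
    """Map each source name to the audio bitrate of the zone it appears in."""
    return {s.name: AUDIO_KBPS[zone_of(s.x, s.y)] for s in sources}


if __name__ == "__main__":
    scene = [
        SoundSource("boss", 0.5, 0.5),        # center of the screen
        SoundSource("waterfall", 0.05, 0.8),  # near the left edge
    ]
    print(audio_plan(scene))
```

This sketch only handles sources that are visible on screen. How off-screen sounds such as music or narration would be treated is not covered here.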
On the decoding side, the receiving device reconstructs each portion of the image using the decoding scheme that corresponds to how that portion was originally encoded. The patent specifically calls out transmission to a remote viewing device, which points to streaming or cloud gaming contexts where adaptive compression is critical.
In short, the pipeline described in the patent has four steps:
- Zone identification in the source image
- Per-zone video encoding at different quality levels
- Audio quality linked to the zone of each sound source's visual depiction
- Transmission and zone-aware decoding on the remote device
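The zone-aware decoding step above can be illustrated with a toy round trip in Python. Each encoded portion carries a zone tag, and the receiver uses that tag to choose the matching decoder. The two "codecs" here (a lossless passthrough and a coarse quantizer with a step of 16) are stand-ins chosen for simplicity, not the schemes in the patent, and the packet layout is an assumption.

```python
STEP = 16  # hypothetical quantization step for the low-priority zone


def encode_high(samples):
    """Toy high-quality scheme: keep every sample as is."""
    return list(samples)


def decode_high(payload):
    return list(payload)


def encode_low(samples):
    """Toy low-quality scheme: keep only which 16-wide bucket a sample is in."""
    return [s // STEP for s in samples]


def decode_low(payload):
    """Reconstruct each sample as the midpoint of its bucket."""
    return [q * STEP + STEP // 2 for q in payload]


ENCODERS = {"high": encode_high, "low": encode_low}
DECODERS = {"high": decode_high, "low": decode_low}


def encode_frame(portions):
    """portions: list of (zone, samples). Returns packets tagged with their zone."""
    return [{"zone": zone, "payload": ENCODERS[zone](samples)}
            for zone, samples in portions]


def decode_frame(packets):
    """Decode each packet with the scheme that matches its zone tag."""
    return [DECODERS[p["zone"]](p["payload"]) for p in packets]


if __name__ == "__main__":
    frame = [("high", [200, 201, 203]), ("low", [200, 201, 203])]
    print(decode_frame(encode_frame(frame)))
```

The high-priority portion survives intact while the low-priority one loses fine detail, which is the tradeoff the zone scheme is built around.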
What this means for PlayStation game streaming quality
For PlayStation Remote Play or any cloud gaming scenario, bandwidth is the enemy of quality. Most adaptive codecs treat video and audio as separate streams with separate budgets. Sony's approach here is more holistic — if something isn't visually important, it's probably not aurally important either, so compress both together. That's a reasonable heuristic for games where the camera generally points at what matters.
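One way to picture a shared budget is a proportional split of the available bitrate across video and audio streams, weighted by zone priority. This is only a sketch of the idea: the `allocate` function, the weights, and the 5000 kbit/s figure are assumptions, and the patent as described here does not say how bits would actually be divided.

```python
def allocate(total_kbps, weights):
    """Split total_kbps across streams in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_kbps * w / total_weight for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical priority weights for a two-zone stream.
    weights = {
        "video_high": 6.0,
        "video_low": 2.0,
        "audio_high": 1.5,
        "audio_low": 0.5,
    }
    for name, kbps in allocate(5000, weights).items():
        print(f"{name}: {kbps:.0f} kbit/s")
```

With these made-up weights, a 5000 kbit/s link gives 3000 and 1000 kbit/s to the two video zones and 750 and 250 kbit/s to the matching audio. If the link drops, every stream shrinks by the same ratio, so the high-priority zone keeps its relative advantage.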
The practical upside for you as a player: in theory, the things you actually care about — the center of the action — look and sound better, even on a constrained connection. The tradeoff is that peripheral detail, visual and auditory, takes the compression hit instead.
This is a solid, well-scoped engineering idea rather than a moonshot. Linking audio quality to visual zone priority is a genuinely useful insight that most streaming codec work seems to overlook. It won't make headlines, but if Sony ships this in a future update to Remote Play or PlayStation Plus cloud streaming (the successor to PlayStation Now), players on weak connections will probably notice the difference without ever knowing why.
Editorial commentary on a publicly published patent application. Not legal advice.