Intel · Filed Dec 8, 2025 · Published Jun 18, 2026 · verified — real USPTO data

Intel Patents a Streaming Buffer That Feeds AI Work Directly to the GPU

By Patentlyze Team · Updated Jun 19, 2026

Intel is patenting a way to shortcut the memory round-trip that slows down AI processing on GPUs — by parking data in a small, fast buffer right between the media engine and the compute core.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0170600 A1

Applicant Intel Corporation

Filing date Dec 8, 2025

Publication date Jun 18, 2026

Inventors Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang

US classification 345/501

Grant likelihood Low

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Mar 11, 2026)

Parent application Continuation of 18490593 (filed Oct 19, 2023)

Document 15 claims

Hardware

What Intel's streaming buffer actually does for AI

Imagine your GPU is a chef, and every time it needs an ingredient, someone has to run to a warehouse across town to get it. That round-trip takes time, burns energy, and creates a bottleneck. Intel's patent is essentially about building a small pantry right next to the stove.

The system adds a streaming buffer, a temporary holding area, between the part of the chip that handles media (video decoding, for example) and the part that does AI calculations. Instead of the media engine writing its output to main memory and the AI engine fetching it back, the output is handed over through this smaller, faster middle layer.

The goal is to cut latency (how long things take), power draw (how much energy the chip burns), and bandwidth pressure (how much data has to travel across the chip's memory bus). It's the kind of plumbing improvement that doesn't make for a flashy announcement but quietly makes everything run better.

How data flows from media IP through the buffer to the GPU

The patent describes a three-part architecture designed to make AI inference on a GPU more efficient:

Producer IP — typically a media engine (think: a dedicated video decoder). It pulls raw data from main memory and processes it.
Streaming buffer — a logically interposed buffer (a fast, intermediate storage layer placed in the data path) that receives the producer's output before the GPU ever sees it.
Compute core — a GPU or a specialized AI core inside the GPU. It reads from the streaming buffer, runs AI inference (the process of applying a trained neural network to new data), and writes results back to memory.
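As a rough software analogy for the three parts above, the sketch below connects a producer and a consumer through a small bounded queue. It is not Intel's hardware design. The names `BUFFER_SLOTS`, `producer`, `consumer`, and `run_pipeline` are hypothetical, the "decode" and "inference" steps are placeholders, and the four-slot capacity is an assumption, since the article gives no buffer size.

```python
import queue
import threading

BUFFER_SLOTS = 4  # hypothetical capacity; the source does not state a size


def producer(frames, stream):
    """Stand-in for the media engine: 'decode' each frame and stream it out."""
    for frame in frames:
        decoded = [pixel * 2 for pixel in frame]  # placeholder for real decoding
        stream.put(decoded)  # blocks while the buffer is full (back-pressure)
    stream.put(None)  # sentinel: no more frames


def consumer(stream, results):
    """Stand-in for the compute core: run 'inference' on each buffered frame."""
    while True:
        decoded = stream.get()  # blocks while the buffer is empty
        if decoded is None:
            break
        results.append(sum(decoded))  # placeholder for real inference


def run_pipeline(frames):
    """Wire producer -> bounded buffer -> consumer and return the results."""
    stream = queue.Queue(maxsize=BUFFER_SLOTS)
    results = []
    worker = threading.Thread(target=consumer, args=(stream, results))
    worker.start()
    producer(frames, stream)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1, 2], [3, 4], [5, 6]]))
```

The bounded queue is the point of the analogy: the producer never gets far ahead of the consumer, so only a few frames are in flight at once and none of them has to be parked in a large shared store between stages.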

The key insight is that by inserting this buffer between the media engine and the compute core, data doesn't have to make a full round-trip through main memory between pipeline stages. That cuts the distance data travels, which directly reduces power consumption, lowers latency, and eases pressure on the memory bus.
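To make the round-trip saving concrete, here is a back-of-the-envelope traffic count per frame. It is an idealized sketch under stated assumptions: the function `dram_bytes_per_frame` and the compressed-input and result sizes are hypothetical, the decoded frame is assumed to be 1080p in NV12 format (1.5 bytes per pixel), and the buffered path is assumed to stream the whole frame without spilling to main memory. None of these figures come from the patent.

```python
def dram_bytes_per_frame(raw_bytes, decoded_bytes, result_bytes, use_streaming_buffer):
    """Count main-memory traffic for one frame under a simplified model."""
    # Both paths: the producer reads raw input from memory and the
    # compute core writes its results back to memory.
    traffic = raw_bytes + result_bytes
    if not use_streaming_buffer:
        # Without the buffer, the decoded frame is written to memory by the
        # producer and then read back by the compute core: two extra trips.
        traffic += 2 * decoded_bytes
    return traffic


# Hypothetical sizes, chosen only for illustration.
RAW_BYTES = 100_000                 # compressed input for one frame (assumed)
DECODED_BYTES = 1920 * 1080 * 3 // 2  # 1080p NV12 frame = 3,110,400 bytes
RESULT_BYTES = 4_000                # inference output (assumed)

without_buffer = dram_bytes_per_frame(RAW_BYTES, DECODED_BYTES, RESULT_BYTES, False)
with_buffer = dram_bytes_per_frame(RAW_BYTES, DECODED_BYTES, RESULT_BYTES, True)

if __name__ == "__main__":
    print(without_buffer)  # 6324800
    print(with_buffer)     # 104000
```

Under these assumptions almost all of the main-memory traffic comes from the intermediate decoded frame, which is exactly the hand-off the streaming buffer is meant to keep off the memory bus. Real savings would depend on buffer size, frame format, and how often data spills.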

The patent situates this improvement under a broad umbrella of GPU processing and caching optimizations — meaning the streaming buffer is one piece of a larger set of architectural tweaks Intel is pursuing to address bottlenecks in AI and media workloads on integrated and discrete graphics hardware.

What this means for AI workloads on Intel graphics

AI inference is increasingly being run on GPUs, not just in data centers but in laptops and consumer devices. Every watt saved and every millisecond cut matters — especially on battery-powered hardware where thermal headroom is limited. A smarter data path between the media engine and the compute core is the kind of low-level fix that can meaningfully improve real-time AI tasks like video analysis, image upscaling, or on-device model inference.

For Intel specifically, this fits into a broader push to make its integrated and discrete GPU lines more competitive for AI workloads. The company is competing with Nvidia and AMD for AI inference performance, and architectural plumbing like this — while unglamorous — is where a lot of those performance gaps actually live.

Editorial take

This is a fairly narrow infrastructure patent covering one specific data-routing optimization in GPU design. The claims are canceled in the published form, which limits its immediate legal weight. That said, the underlying idea — cutting memory round-trips during AI inference by staging data in a purpose-built buffer — is exactly the kind of low-level work that produces real-world performance gains. Worth filing away as a signal of Intel's GPU architecture direction, not worth getting excited about on its own.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
