Adobe · Filed Nov 20, 2024 · Published May 21, 2026 · verified — real USPTO data

Adobe Patents a Parallel Approach to Causal Linear Attention in Neural Networks

By Patentlyze Team · Updated Jul 10, 2026

Attention mechanisms are the engine inside modern AI — but they're also the bottleneck. Adobe's new patent describes a way to run a key part of that engine in parallel, potentially making AI inference meaningfully faster without sacrificing accuracy.

Figure from the official USPTO publication.

Publication number US 2026/0141217 A1

Applicant ADOBE INC.

Filing date Nov 20, 2024

Publication date May 21, 2026

Inventors Nikolaos Vlassis, Haoliang Wang

CPC classification 706/26

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Dec 20, 2024)

Document 20 claims

AI training & infrastructure

What Adobe's parallel attention trick actually does

Imagine you're reading a long document and need to remember everything you've read so far before understanding the next sentence. Most AI models do exactly that — they look back at all previous content every single step. It works, but it's slow, especially as sequences get longer.

Adobe's patent proposes a way to break that process into chunks — called data blocks — and process them simultaneously rather than one after another. Instead of waiting for each step to finish before starting the next, the system pre-computes summaries of earlier blocks and uses them to inform later ones, all in parallel.

The result is a causal linear attention system — one that still respects the rule that a model can only "see" what came before the current position, but does it much more efficiently. For you as a user, this could translate to faster AI tools inside products like Adobe Firefly or Photoshop's generative features.

How Adobe's block-based attention accumulation works

The patent describes a method for computing causal linear attention — a variant of the attention mechanism (the core math that lets AI models relate different parts of an input to each other) that is both causally constrained (each position only attends to previous positions, not future ones) and linear in complexity rather than quadratic.

The key innovation is parallelization across data blocks. Here's how the four-step pipeline works:

Step 1 — First intermediate block: For each block, compute an outer product of the key and value data (a mathematical summary of what that block "knows").
Step 2 — Second intermediate block: Accumulate these summaries across blocks in order — each block's summary incorporates all previous blocks' summaries. This is the causal memory.
Step 3 — Linear attention block: Combine the accumulated summary with the current block's query, key, and value data to produce the attention output.
Step 4 — Attention mechanism: Apply this output to drive the neural network's actual computation.

The trick is that Steps 1 and parts of Steps 2–3 can be computed in parallel across blocks, rather than sequentially. Traditional causal attention forces strict serial ordering; this approach restructures the math so GPUs and TPUs can do much more work simultaneously.

What this means for AI model speed and scalability

Standard softmax attention (what most transformers use) scales quadratically with sequence length — double the input, quadruple the compute. Linear attention fixes that scaling problem but has historically been tricky to implement with proper causal ordering without losing the parallelism that makes GPUs fast. Adobe's approach aims to have it both ways.

For a company like Adobe, whose generative AI tools process long image, video, and document contexts, shaving time and compute off the attention layer is genuinely valuable. If this approach holds up at scale, it could mean faster generation in Firefly, lower cloud costs for Adobe's AI infrastructure, or the ability to handle longer creative inputs — like high-resolution video sequences — without hitting memory walls.

Editorial take

This is a solid, focused engineering patent tackling a real bottleneck in transformer-based AI. It's not a new model architecture or a flashy product reveal — it's plumbing work on the attention mechanism that could quietly speed up everything Adobe builds on top of it. Worth tracking if you follow AI efficiency research, but don't expect a product announcement tied directly to this filing.

Which company should we read for you?

We track 17 companies here. Pro is the same weekly breakdown for any company you choose, delivered privately. Type a name and we'll scope it and send you a quote.

Get one Big Tech patent every Sunday

Plain English, intelligent commentary, no hype. Free.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.

Adobe Patents a Parallel Approach to Causal Linear Attention in Neural Networks

What Adobe's parallel attention trick actually does

How Adobe's block-based attention accumulation works

What this means for AI model speed and scalability

More from Adobe

More in AI training & infrastructure

Get one Big Tech patent every Sunday