Adobe · Filed Nov 25, 2024 · Published May 28, 2026

Adobe Files Patent Application for ML-Driven Document Scanning That Captures Pages as You Flip

Adobe is working on a document scanner that watches you flip through a book on video and automatically grabs a clean image of each page — no button-tapping required.

[Figure: FIG. 1A of US 2026/0148552 A1, "Real-Time Automated Document Scanning", rendered from the official USPTO publication PDF.]
Publication number: US 2026/0148552 A1
Applicant: Adobe Inc.
Filing date: Nov 25, 2024
Publication date: May 28, 2026
Inventors: Curtis WIGINGTON, Swapnil BHOITE, Anshul MALIK
CPC classification: 382/103
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Jan 3, 2025)
Document: 20 claims

What Adobe's auto-capture document scanner actually does

Imagine you need to scan a 200-page report or an old photo album. Today, that means either using a flatbed scanner one page at a time or carefully tapping a button on your phone for every single page. It's tedious, and it's easy to miss a page or catch a blurry frame.

Adobe's patent application describes a system meant to remove most of that friction. You film yourself flipping through the document with your phone's camera, and two separate machine learning models handle the rest. One model watches the video and detects when you've turned a page. The other looks at the frames around that turn and picks the one that shows the page flat, still, and fully in view.

The result: a complete scan of your document, assembled from one selected frame per page of a single continuous video, with no manual capture step. In principle, bulk document capture becomes as simple as flipping through the pages.

How two ML models split the page-detection workload

The system takes a continuous video stream as input — presumably from a phone camera pointed at an open book or stack of documents. Two machine learning models operate in sequence to handle the hard parts of document capture.

Model one: page-turn detection. This model watches the incoming frames and identifies when a page-turn event has occurred. Page turns are visually noisy — there's motion blur, partial occlusion, and the page curves mid-flip. Detecting the event (rather than just looking for a static page) lets the system know when to start looking for a capturable frame.
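To make the "event, not static page" idea concrete, here is a minimal sketch of how per-frame model output could be turned into discrete page-turn events. Everything in it is an assumption for illustration: the filing summary does not say what the first model outputs, so the sketch supposes a per-frame "page is turning" probability, and the function name and thresholds are hypothetical. It uses two thresholds (hysteresis) so that one noisy frame in the middle of a flip does not split a single turn into two.

```python
from typing import List, Tuple


def detect_page_turns(
    turn_probs: List[float],
    start_threshold: float = 0.7,  # hypothetical value, not from the filing
    end_threshold: float = 0.3,    # hypothetical value, not from the filing
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs, one per detected page turn.

    turn_probs is assumed to hold one "page is turning" probability per
    frame, produced by some upstream classifier.
    """
    events: List[Tuple[int, int]] = []
    start = None  # index where the current turn began, or None while idle
    for i, p in enumerate(turn_probs):
        if start is None and p >= start_threshold:
            start = i  # the probability rose high enough: a turn begins
        elif start is not None and p <= end_threshold:
            events.append((start, i))  # the page has settled: the turn ends
            start = None
    # A turn still in progress when the video ends is deliberately dropped.
    return events
```

Each `end` index marks the point where the search for a capturable frame could begin.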

Model two: frame quality and readiness assessment. Once a page turn is detected, the second model evaluates frames to find the one that's actually ready to capture — meaning the page is flat, well-lit, fully visible, and free of motion blur. This is the frame that gets saved as the document image.
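A rough sketch of what "ready to capture" could mean in code follows. The four score fields mirror the properties named above (flat, well-lit, fully visible, free of motion blur), but the 0-to-1 scales, the `FrameScores` type, the weakest-property rule, and the 0.6 cutoff are all hypothetical. In the filing the second model makes this judgment itself, so this is a stand-in for the idea, not Adobe's method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FrameScores:
    index: int         # position of the frame in the video
    sharpness: float   # 0..1, higher means less motion blur
    flatness: float    # 0..1, higher means the page lies flatter
    lighting: float    # 0..1, higher means more even illumination
    visibility: float  # 0..1, fraction of the page inside the frame


def readiness(f: FrameScores) -> float:
    # A frame is only as good as its weakest property, so take the minimum.
    return min(f.sharpness, f.flatness, f.lighting, f.visibility)


def pick_capture_frame(
    candidates: Sequence[FrameScores],
    min_readiness: float = 0.6,  # hypothetical cutoff
) -> Optional[FrameScores]:
    """Return the most capture-ready frame, or None if none is good enough."""
    best = max(candidates, key=readiness, default=None)
    if best is None or readiness(best) < min_readiness:
        return None
    return best
```

Taking the minimum instead of an average means a sharp, well-lit frame with half the page out of view cannot win on its other scores.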

The two-model split is architecturally sensible: event detection and quality assessment are different tasks with different training signals, so separating them likely produces better results than trying to do both with one model. The patent describes this as supporting bulk capture of multiple document pages from a single input video, which implies the loop runs continuously across an entire session.
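The continuous loop can be sketched as a small generator that consumes a frame stream and yields one frame per page. This is a guess at the control flow under stated assumptions, not the claimed implementation: `is_turning` and `readiness` are hypothetical callables standing in for the two models, and the rule of emitting a page's best frame when the next turn begins is one of several possible policies.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def capture_pages(
    frames: Iterable[Frame],
    is_turning: Callable[[Frame], bool],   # stand-in for the page-turn model
    readiness: Callable[[Frame], float],   # stand-in for the quality model
    min_readiness: float = 0.6,            # hypothetical cutoff
) -> Iterator[Frame]:
    """Yield one captured frame per page from a continuous frame stream."""
    best: Optional[Tuple[float, Frame]] = None  # best (score, frame) so far
    for frame in frames:
        if is_turning(frame):
            # A turn closes out the current page: emit its best frame, if any.
            if best is not None:
                yield best[1]
                best = None
            continue
        score = readiness(frame)
        if score >= min_readiness and (best is None or score > best[0]):
            best = (score, frame)
    # The last page has no following turn, so flush it at end of stream.
    if best is not None:
        yield best[1]
```

Waiting for the next turn before emitting trades a little latency for the chance to pick the best frame seen while the page was at rest. A page that never produces a frame above the cutoff is skipped, which a real app would presumably want to flag to the user.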

What this means for bulk book and document digitization

For anyone who regularly digitizes physical documents — researchers, archivists, students, small business owners — the friction of manual scanning is a real bottleneck. A system that turns a casual phone video into a clean multi-page PDF could make Adobe Scan or a similar mobile app dramatically more useful for bulk workflows.

The broader implication is that this event-driven capture pattern, where ML detects a meaningful moment in a video stream and triggers a high-quality extraction, could extend well beyond document scanning. The filing suggests Adobe is interested in making its document tools more ambient and hands-free, which fits a world where people increasingly reach for their phones instead of dedicated scanners.

Editorial take

This is a practical, well-scoped patent application that targets a real user problem without overselling the technology. The two-model architecture is the interesting bit: splitting event detection from quality assessment is a clean design choice that suggests the team thought carefully about the failure modes of single-model approaches. It's not a flashy AI story, but it's exactly the kind of unglamorous ML work that makes consumer software better.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.