IBM · Filed Dec 13, 2024 · Published Jun 18, 2026 · verified — real USPTO data

IBM's New Patent Replaces CAPTCHAs With a Live Video Test to Spot Bots

By Patentlyze Team · Updated Jun 19, 2026

Forget ticking boxes of traffic lights. IBM's latest patent application describes a live video call in which an AI watches you complete physical tasks in real time, then scores whether you're a human or a bot.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number US 2026/0170876 A1

Applicant International Business Machines Corporation

Filing date Dec 13, 2024

Publication date Jun 18, 2026

Inventors James Anthony Maniscalco, Thomas Jefferson Sandridge, Stephen Forster, Brandon Harris

CPC classification 382/116

Grant likelihood Medium

Examiner CENTRAL, DOCKET (Art Unit OPAP)

Status Docketed New Case - Ready for Examination (Jan 22, 2025)

Document 20 claims

Security

What IBM's live video identity check actually does

Imagine you're trying to open a bank account online, and instead of clicking a blurry CAPTCHA, you're dropped into a short video call and asked to do a few quick things — maybe hold up your ID, tilt your head, or follow an on-screen prompt. An AI watches the whole thing and quietly decides whether you're a real person or an automated script.

That's the core idea in this IBM patent. You upload an ID document first, and the system pulls out details it'll use to verify you. Then, during the video call, it asks you to complete a series of tasks before a countdown runs out. Each task gets a score, and those scores are combined into one final result. If you pass, your identity is confirmed. If you fall short, an administrator gets a flag.
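To make the pass-or-flag step concrete, here is a minimal Python sketch. It is an illustration, not IBM's method: the article doesn't say how the scores are combined, so the plain average, the choice to include the ID document score in it, and the names `decide` and `VerificationResult` are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VerificationResult:
    final_score: float
    passed: bool
    flag_admin: bool  # True when an administrator should review the attempt


def decide(id_score: float, task_scores: list[float], pass_threshold: float) -> VerificationResult:
    """Combine the document score and per-task scores into one decision.

    Assumption: a simple average of all scores. The patent application may
    weight or combine them differently.
    """
    final_score = mean([id_score, *task_scores])
    passed = final_score >= pass_threshold
    # Falling short does not reject the user outright; it raises a flag.
    return VerificationResult(final_score, passed, flag_admin=not passed)


result = decide(id_score=0.9, task_scores=[0.8, 0.7], pass_threshold=0.75)
print(result.passed, result.flag_admin)
```

With these made-up numbers the average sits above the 0.75 threshold, so the user passes and no flag is raised. Drop the task scores to 0.2 and 0.3 and the same call fails and flags an administrator.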

The system is designed to catch the kind of sophisticated bots that can now fool text-based CAPTCHAs or even static image checks. By adding real-time video and physical tasks with a time limit, it raises the bar significantly for any automated program trying to fake being human.

How the multimodal LLM scores each verification task

The patent describes a multi-step identity verification pipeline built around a multimodal large language model (an AI model that can process video and text together, rather than text alone).

Here's how the process flows:

1. You submit an ID document, and the system assigns it an identification factor, essentially a confidence score for the document itself.
2. The system then picks a set of task factors: live challenges it will ask you to perform during a video call.
3. On the call, you're given a time limit and asked to complete each task. The AI grades each one as you go, producing a confidence score per task.
4. If any task score falls below a minimum threshold, the system can loop back and ask you to retry until time runs out.
5. All scores roll up into a final score, which is compared against a pass/fail threshold. That threshold can be set by a human administrator or automatically calculated by the AI itself.
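The retry-until-the-clock-runs-out behavior in the steps above can be sketched in a few lines of Python. Treat this as a rough sketch under stated assumptions: `run_timed_tasks` and its parameters are hypothetical names, each task is reduced to a callable that stands in for "prompt the user, then have the model grade the video", and keeping the best score across retries is a guess, since the article doesn't say which attempt counts.

```python
import time
from typing import Callable


def run_timed_tasks(
    tasks: dict[str, Callable[[], float]],
    min_task_score: float,
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, float]:
    """Score each task, retrying low scores until the countdown expires.

    Each callable returns a confidence score between 0 and 1 for one attempt.
    """
    deadline = clock() + time_limit_s
    best: dict[str, float] = {}
    for name, grade in tasks.items():
        best[name] = 0.0  # a task that is never attempted counts as zero
        while clock() < deadline:
            # Assumption: keep the best attempt so far.
            best[name] = max(best[name], grade())
            if best[name] >= min_task_score:
                break  # good enough, move on to the next task
    return best


# Fake graders: the first task scores 0.4, then 0.9 on the retry.
attempts = iter([0.4, 0.9])
scores = run_timed_tasks(
    {"tilt_head": lambda: next(attempts), "hold_up_id": lambda: 0.95},
    min_task_score=0.6,
    time_limit_s=30.0,
)
print(scores)  # {'tilt_head': 0.9, 'hold_up_id': 0.95}
```

The `clock` parameter exists only so the countdown can be swapped for a fake timer in tests. In a real system the graders would call the multimodal model on the live video stream, which this sketch leaves out entirely.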

The use of a multimodal AI is the key technical detail here. Most CAPTCHA systems compare static images or text responses. This system watches live video, meaning it can evaluate movement, timing, and physical responses — things that are much harder for a bot to fake convincingly under a tight clock.

What this means for online fraud and CAPTCHA's future

Standard CAPTCHAs are increasingly easy for AI scripts to solve — the same technology that powers fraud is also getting better at beating the checks designed to stop it. IBM's approach essentially fights AI with AI: using a video-watching model to catch automated accounts that a text-based test would wave through.

For banks, healthcare platforms, and government services — anywhere that identity verification is legally required — this kind of layered, real-time check could make account takeovers and synthetic identity fraud significantly harder. The flip side is friction for real users. A video call adds time and requires a camera, which raises real accessibility questions IBM's patent doesn't fully address.

Editorial take

This is a genuinely interesting response to a real problem: AI bots have gotten good enough to beat most existing identity checks, and the industry needs something harder to fake. The video-plus-timed-tasks approach is clever. The open question is whether it trades one problem (bots slipping through) for another (legitimate users, especially older or disabled ones, getting locked out). IBM will need to answer that before this becomes a product anyone should actually deploy.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
