IBM's New Patent Replaces CAPTCHAs With a Live Video Test to Spot Bots
Forget clicking on squares of traffic lights. IBM's latest patent describes a live video call where an AI watches you complete physical tasks in real time, then scores whether you're a human or a bot.
What IBM's live video identity check actually does
Imagine you're trying to open a bank account online, and instead of clicking a blurry CAPTCHA, you're dropped into a short video call and asked to do a few quick things — maybe hold up your ID, tilt your head, or follow an on-screen prompt. An AI watches the whole thing and quietly decides whether you're a real person or an automated script.
That's the core idea in this IBM patent. You upload an ID document first, and the system pulls out details it'll use to verify you. Then, during the video call, it asks you to complete a series of tasks before a countdown runs out. Each task gets a score, and those scores are combined into one final result. If you pass, your identity is confirmed. If you fall short, the attempt is flagged for an administrator.
The system is designed to catch the kind of sophisticated bots that can now fool text-based CAPTCHAs or even static image checks. By adding real-time video and physical tasks with a time limit, it raises the bar significantly for any automated program trying to fake being human.
How the multimodal LLM scores each verification task
The patent describes a multi-step identity verification pipeline built around a multimodal large language model (an AI model that can process video and text together, rather than text alone).
Here's how the process flows:
- You submit an ID document, and the system assigns it an identification factor — essentially a confidence score for the document itself.
- The system then picks a set of task factors: live challenges it will ask you to perform during a video call.
- On the call, you're given a time limit and asked to complete each task. The AI grades each one as you go, producing a confidence score per task.
- If any task score falls below a minimum threshold, the system can loop back and ask you to retry until time runs out.
- All scores roll up into a final score, which is compared against a pass/fail threshold. That threshold can be set by a human administrator or automatically calculated by the AI itself.
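The flow above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not IBM's implementation: the function and parameter names, the threshold values, and the simple average used to combine scores are all hypothetical, since the description here says only that the scores roll up into a final score. The `score_task` callable stands in for the multimodal model grading live video.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VerificationResult:
    final_score: float
    passed: bool
    task_scores: Dict[str, float]


def run_verification(
    id_score: float,
    tasks: List[str],
    score_task: Callable[[str], float],
    time_limit_s: float,
    min_task_score: float = 0.6,   # assumed per-task minimum
    pass_threshold: float = 0.75,  # assumed pass/fail cutoff
    clock: Callable[[], float] = time.monotonic,
) -> VerificationResult:
    """Score each task, retrying weak attempts until the shared deadline."""
    deadline = clock() + time_limit_s
    task_scores: Dict[str, float] = {}

    for task in tasks:
        best = 0.0
        # Retry while the score is below the minimum and time remains.
        while clock() < deadline:
            best = max(best, score_task(task))
            if best >= min_task_score:
                break
        task_scores[task] = best

    # Assumption: the final score is a plain mean of the ID score
    # and every task score. The real weighting is not specified here.
    all_scores = [id_score, *task_scores.values()]
    final_score = sum(all_scores) / len(all_scores)
    return VerificationResult(final_score, final_score >= pass_threshold, task_scores)


if __name__ == "__main__":
    # Stub scorer: a real system would grade live video with a multimodal model.
    result = run_verification(
        id_score=0.9,
        tasks=["hold_up_id", "tilt_head"],
        score_task=lambda task: 0.8,
        time_limit_s=30.0,
    )
    print(result)
```

A result with `passed` set to `False` is the case where, per the description above, an administrator would be flagged. The `clock` parameter exists only so the deadline logic can be exercised without waiting on real time.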
The use of a multimodal AI is the key technical detail here. Most CAPTCHA systems check static images or text responses. This system watches live video, so it can evaluate movement, timing, and physical responses, all of which are much harder for a bot to fake convincingly under a tight clock.
What this means for online fraud and CAPTCHA's future
Standard CAPTCHAs are increasingly easy for AI scripts to solve — the same technology that powers fraud is also getting better at beating the checks designed to stop it. IBM's approach essentially fights AI with AI: using a video-watching model to catch automated accounts that a text-based test would wave through.
For banks, healthcare platforms, and government services — anywhere that identity verification is legally required — this kind of layered, real-time check could make account takeovers and synthetic identity fraud significantly harder. The flip side is friction for real users. A video call adds time and requires a camera, which raises real accessibility questions IBM's patent doesn't fully address.
This is a genuinely interesting response to a real problem: AI bots have gotten good enough to beat most existing identity checks, and the industry needs something harder to fake. The video-plus-timed-tasks approach is clever. The open question is whether it trades one problem (bots slipping through) for another (legitimate users, especially older or disabled ones, getting locked out). IBM will need to answer that before this becomes a product anyone should actually deploy.
Editorial commentary on a publicly published patent application. Not legal advice.