IBM · Filed Dec 24, 2024 · Published Jun 25, 2026 · verified — real USPTO data

IBM Patents an AI System That Automatically Sorts and Stores Table Data for Faster Retrieval

By Patentlyze Team · Updated Jun 26, 2026

Every time an AI has to answer a question about a spreadsheet, it typically has to re-read the whole thing. IBM's newly published patent application describes a way to pre-sort that data so the AI already knows where everything lives.

FIG. 1A — rendered from the official USPTO publication PDF.

Publication number: US 2026/0178491 A1

Applicant: International Business Machines Corporation

Filing date: Dec 24, 2024

Publication date: Jun 25, 2026

Inventors: Shashank Mujumdar, Shramona Chakraborty, Nitin Gupta, Prerna Agarwal

CPC classification: 707/759

Grant likelihood: Medium

Examiner: MARI VALCARCEL, FERNANDO MARIANO (Art Unit 2159)

Status: Final Rejection Mailed (Jun 25, 2026)

Document: 20 claims

AI/ML

How IBM's AI groups spreadsheet data automatically

Imagine you have a massive spreadsheet full of customer orders, product names, and delivery dates. Every time you ask an AI assistant a question about it, the AI has to scan the entire thing from scratch. That's slow and expensive.

IBM's patent describes a smarter prep step. Before you ever ask a question, the AI reads through the spreadsheet, finds the recurring categories (things like product names or cities that keep showing up), and groups all the relevant rows under each one. Those organized groupings get saved in a fast-access memory store called a cache.

When a question finally comes in, the AI can go straight to the right group instead of rereading everything. Think of it like a librarian who pre-sorts books by topic so you don't have to search every shelf each visit.

How the LLM annotates, clusters, and caches data pairs

The patent describes a pipeline built around data pairs, meaning each input (a table row or cell value) is paired with a known correct output (an answer or label). This gives the system ground truth to work from.
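To make the idea of a data pair concrete, here is a minimal sketch in Python. The `DataPair` class, its field names, and the sample orders are all hypothetical and chosen for illustration; the patent text summarized here does not specify any particular data layout.

```python
from dataclasses import dataclass

@dataclass
class DataPair:
    """One example: a table row (the input) plus its known-correct output."""
    input_row: dict   # column name -> cell value
    output: str       # ground-truth answer or label for that row

# Hypothetical sample data: three order rows, each with a known answer.
pairs = [
    DataPair({"order_id": 1, "city": "Chicago", "product": "Laptop"},
             "Laptop shipped to Chicago"),
    DataPair({"order_id": 2, "city": "New York", "product": "Laptop"},
             "Laptop shipped to New York"),
    DataPair({"order_id": 3, "city": "Chicago", "product": "Monitor"},
             "Monitor shipped to Chicago"),
]

for pair in pairs:
    print(pair.input_row["city"], "->", pair.output)
```

Keeping the output next to its input is what lets the later steps check whether a tagged entity actually matters for the answer.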

The large language model (LLM) first annotates the input data, tagging items with the categories or entities they belong to. For example, in a table of sales records, "Chicago" and "New York" might both get tagged as city entities.
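The annotation step might look roughly like the sketch below. One loud assumption: in the patent an LLM does the tagging, while here a hard-coded lookup table (`KNOWN_ENTITIES`) stands in for it so the example runs without any model. The function name and entity types are hypothetical.

```python
# Hypothetical stand-in for the LLM annotator: a fixed lookup table.
# A real system would ask a language model which entity type each value has.
KNOWN_ENTITIES = {
    "Chicago": "city",
    "New York": "city",
    "Laptop": "product",
    "Monitor": "product",
}

def annotate_row(row: dict) -> list:
    """Return (value, entity_type) tags for every cell that is a known entity."""
    tags = []
    for value in row.values():
        # Only string cells are candidates; numeric IDs are skipped.
        if isinstance(value, str) and value in KNOWN_ENTITIES:
            tags.append((value, KNOWN_ENTITIES[value]))
    return tags

row = {"order_id": 17, "city": "Chicago", "product": "Laptop"}
tags = annotate_row(row)
print(tags)
```

Swapping the lookup table for a model call would change how tags are produced, but not the shape of the result: a list of values paired with the entity types they belong to.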

Next, the system identifies a set of common entities that appear frequently across inputs and that are directly tied to the paired outputs. This step filters out noise so only the entities that actually matter for answering questions get carried forward.
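One plausible way to sketch that filtering step is shown below. It rests on two assumptions that are not from the patent: "appears frequently" is read as "occurs in at least `min_count` pairs", and "tied to the paired output" is read as "the entity's text appears in the output string". Both the threshold and the sample data are hypothetical.

```python
from collections import Counter

# Each item: (entities tagged in the input row, known-correct output text).
annotated_pairs = [
    (["Chicago", "Laptop"], "Laptop shipped to Chicago"),
    (["Chicago", "Monitor"], "Monitor shipped to Chicago"),
    (["New York", "Laptop"], "Laptop shipped to New York"),
    (["Chicago", "Express"], "Cable shipped to Chicago"),
]

def common_entities(pairs, min_count=2):
    """Keep entities that recur across inputs AND show up in the paired output."""
    counts = Counter()
    for entities, output in pairs:
        for entity in set(entities):      # count each entity once per pair
            if entity in output:          # assumed test for "tied to the output"
                counts[entity] += 1
    return {entity for entity, n in counts.items() if n >= min_count}

print(sorted(common_entities(annotated_pairs)))
```

In this toy data, "Express" is dropped because it never appears in an output, and "Monitor" and "New York" are dropped because they each occur only once.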

Finally, the LLM generates an item list for each entity (all rows or values associated with, say, "Chicago"), and the whole structure gets written to a program cache (a fast temporary memory store). Later queries can pull from the cache rather than re-processing the raw table.
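The last step, building item lists and caching them, can be sketched as follows. This is an illustration under stated assumptions, not IBM's implementation: a plain Python dictionary plays the role of the program cache, simple value matching replaces the LLM that generates the lists, and every name here is hypothetical.

```python
from collections import defaultdict

rows = [
    {"order_id": 1, "city": "Chicago", "product": "Laptop"},
    {"order_id": 2, "city": "New York", "product": "Laptop"},
    {"order_id": 3, "city": "Chicago", "product": "Monitor"},
]
entities = {"Chicago", "Laptop"}   # assumed output of the common-entity step

def build_item_lists(rows, entities):
    """Group rows under each common entity they mention."""
    item_lists = defaultdict(list)
    for row in rows:
        for value in row.values():
            if value in entities:
                item_lists[value].append(row)
    return dict(item_lists)

# The "cache" is just an in-memory dict, built once before any query arrives.
cache = build_item_lists(rows, entities)

def lookup(entity):
    """Answer a query from the cache instead of rescanning the table."""
    return cache.get(entity, [])

chicago_ids = [row["order_id"] for row in lookup("Chicago")]
print(chicago_ids)
```

The design point is that the table is scanned once up front; each later lookup is a single dictionary access, and an entity that was never cached simply returns an empty list.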

What this means for AI tools that query large tables

The practical problem here is real: LLMs get slow and expensive when they have to process large tables repeatedly. By front-loading the organizational work and storing results in a cache, IBM's approach could cut the per-query cost for enterprise AI tools that run on structured business data like inventory systems, financial records, or customer databases.

For you as an end user, this kind of infrastructure work is invisible but felt. It's the difference between an AI assistant that answers a data question in two seconds and one that makes you wait while it re-reads a 10,000-row file every single time.

Editorial take

This is quiet infrastructure work, not a flashy product. IBM is essentially describing a caching layer for LLM-powered data queries, which is useful but also the kind of thing multiple teams at multiple companies are probably building right now. The value of this filing depends heavily on whether IBM's specific entity-annotation-then-cache pipeline is distinct enough from what's already in the literature.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
