Salesforce · Filed Dec 4, 2024 · Published Jun 4, 2026 · verified — real USPTO data

Salesforce Seeks a Patent on a Cache That Feeds Old AI Answers Back Into New Prompts

Salesforce's application describes a way to get two uses out of every verified AI answer: serve it directly from a cache, or recycle it as a worked example for the next similar question.

[Figure: FIG. 1A of US 2026/0154507 A1 (semantic caching for LLM few-shot prompts), rendered from the official USPTO publication PDF.]
Publication number: US 2026/0154507 A1
Applicant: Salesforce, Inc.
Filing date: Dec 4, 2024
Publication date: Jun 4, 2026
Inventors: Avi Joel BRENNER, Ka Man Mary WONG, Vincent TANG
CPC classification: 704/9
Grant likelihood: Medium
Examiner: SHIN, SEONG-AH A (Art Unit 2659)
Status: Docketed New Case - Ready for Examination (Jan 6, 2025)
Document: 20 claims

What Salesforce's semantic answer cache actually does

Imagine your company's AI assistant gets asked the same sales question dozens of times a day. A system without a cache forwards every one of those to the AI model, which is slow and expensive. Salesforce's patent application describes a shortcut.

When a new question comes in, the system checks a "semantic cache" — basically a memory bank of past questions and their verified answers. If the new question is close enough to a stored one, it just returns the old answer immediately, no AI needed. That's the fast path.

Here's the clever part: if the question is somewhat similar but not close enough to reuse directly, the system doesn't give up on the cache — it pulls the old answer out and hands it to the AI as a worked example. The AI sees "here's a similar question and what a good answer looked like," then crafts a fresh response. You get speed and consistency without starting from zero every time.

How the similarity score decides which path a query takes

The system works as a two-stage routing layer that sits in front of a large language model.

First, every incoming user query is encoded into a vector — a list of numbers that represents the query's meaning in high-dimensional space (think of it as a fingerprint for the question's intent, not just its words). That vector is compared against a semantic cache: a stored library of vectors from previous queries, each paired with a verified response.
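To make the data structure concrete, here is a minimal sketch of a cache that stores each past query as a vector next to its verified response. Everything named here is an assumption for illustration: `toy_embed`, `CacheEntry`, `SemanticCache`, and the 64-dimension size do not come from the patent. In particular, `toy_embed` is a word-hashing stand-in that captures words rather than meaning; a real system would use a learned embedding model in its place.

```python
import zlib
from dataclasses import dataclass

DIM = 64  # hypothetical vector size, chosen only for this sketch


def toy_embed(text, dim=DIM):
    """Stand-in encoder: counts words into hashed buckets.

    This captures word overlap only, not meaning. A real semantic cache
    would call a learned embedding model here instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


@dataclass
class CacheEntry:
    query: str      # the original question text
    vector: list    # its encoded form
    response: str   # the verified answer paired with it


class SemanticCache:
    """A plain list of entries; real systems would use a vector index."""

    def __init__(self):
        self.entries = []

    def add(self, query, response):
        entry = CacheEntry(query, toy_embed(query), response)
        self.entries.append(entry)
        return entry
```

A linear list is enough to show the idea; at production scale the lookup would more plausibly go through an approximate nearest-neighbor index.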

For each stored vector, the system calculates a semantic similarity score, a measure of how close two meaning-fingerprints are (cosine similarity is a common choice for this kind of comparison). Then the routing decision happens:
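As a sketch of the scoring step, the snippet below computes cosine similarity in plain Python and picks the closest cache entry. Cosine similarity is assumed here as the metric, and the helper `best_match` and the dictionary layout of a cache entry are hypothetical names for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def best_match(query_vec, cache):
    """Return (score, entry) for the most similar entry, or (0.0, None) if the cache is empty.

    `cache` is assumed to be a list of dicts with "vector" and "response" keys.
    """
    best_score, best_entry = 0.0, None
    for entry in cache:
        score = cosine_similarity(query_vec, entry["vector"])
        if best_entry is None or score > best_score:
            best_score, best_entry = score, entry
    return best_score, best_entry
```

Because cosine similarity ignores vector length, a query vector of `[0.0, 2.0]` scores the same against a stored `[0.0, 1.0]` as an identical vector would.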

  • High similarity (above threshold): The cached response is returned directly. The LLM is never called.
  • Moderate similarity (below threshold but still relevant): The cached response is retrieved and injected into a few-shot prompt — a prompt format where you show the model example question-answer pairs before asking your real question. The LLM uses the cached answer as a reference example and generates a tailored response.
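The two paths above can be sketched as a small routing function. The threshold values are invented for illustration, and the lower bound for "still relevant" plus the third `plain_llm` outcome are assumptions about what happens when nothing in the cache is close; the article does not say how the patent handles that case.

```python
REUSE_THRESHOLD = 0.90    # hypothetical: at or above this, reuse the cached answer
EXAMPLE_THRESHOLD = 0.60  # hypothetical floor for "still relevant"


def route(score, cached_response,
          reuse_threshold=REUSE_THRESHOLD,
          example_threshold=EXAMPLE_THRESHOLD):
    """Decide how to answer, given the best similarity score and its cached response.

    Returns a (path, payload) tuple:
      ("cache_hit", response)  -> return the cached response, no LLM call
      ("few_shot", response)   -> call the LLM with the response as an example
      ("plain_llm", None)      -> assumed fallback: call the LLM with no example
    """
    if cached_response is not None and score >= reuse_threshold:
        return ("cache_hit", cached_response)
    if cached_response is not None and score >= example_threshold:
        return ("few_shot", cached_response)
    return ("plain_llm", None)
```

For example, `route(0.95, "cached answer")` takes the direct path, while `route(0.70, "cached answer")` hands the same answer to the model as an example.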

Few-shot prompting is a well-established technique: giving the model a handful of examples (typically one to five) often improves output quality and consistency, especially in domain-specific contexts like CRM or customer support. Salesforce is essentially automating the selection of those examples from real, previously verified traffic.
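Here is one way such a prompt could be assembled from cached question-answer pairs. The wording of the instruction line and the `Question:` / `Answer:` layout are assumptions for this sketch, not the format described in the patent.

```python
def build_few_shot_prompt(examples, new_question):
    """Build a few-shot prompt from (question, answer) pairs plus the new question.

    The cached pairs come first as worked examples; the prompt ends with an
    open "Answer:" for the model to complete.
    """
    parts = [
        "Answer the final question. Use the examples as a guide to style and content.",
        "",
    ]
    for question, answer in examples:
        parts.append(f"Question: {question}")
        parts.append(f"Answer: {answer}")
        parts.append("")
    parts.append(f"Question: {new_question}")
    parts.append("Answer:")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical cached pair and incoming question
    cached = [("How do I renew a contract?", "Open the account, then choose Renew.")]
    print(build_few_shot_prompt(cached, "How do I renew a contract early?"))
```

The resulting string would then be sent to whichever LLM the system uses; that call is left out because it depends on the provider.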

What this means for enterprise LLM costs and consistency

For enterprise software, Salesforce's core market, LLM API costs and response latency are real operational concerns at scale. A semantic cache that bypasses the model entirely for near-duplicate queries cuts inference cost roughly in proportion to the share of traffic it answers directly. The few-shot fallback path is the more interesting angle: instead of degrading gracefully to a cold LLM call, the system tries to improve the LLM's output by seeding it with domain-relevant context from real past usage. If newly verified answers keep being added, the cache could become a self-reinforcing quality layer.
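A back-of-the-envelope sketch of the cost argument: if a fraction of queries is answered straight from the cache, only the remainder pays for an LLM call. The function name, its parameters, and every number in the example are hypothetical, and the model deliberately ignores the extra prompt tokens that the few-shot path adds.

```python
def expected_cost_per_query(direct_hit_rate, llm_cost, lookup_cost=0.0):
    """Average cost per query when `direct_hit_rate` of traffic skips the LLM.

    Every query pays `lookup_cost` for the cache check; only the share that
    misses the direct path pays `llm_cost`. Units are whatever you bill in.
    """
    if not 0.0 <= direct_hit_rate <= 1.0:
        raise ValueError("direct_hit_rate must be between 0 and 1")
    return lookup_cost + (1.0 - direct_hit_rate) * llm_cost


if __name__ == "__main__":
    # Hypothetical numbers: 25% direct hits, 4.0 cost units per LLM call
    print(expected_cost_per_query(0.25, 4.0))  # 3.0
```

The same shape of estimate applies to latency if you substitute response times for costs.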

For you as an end user of a Salesforce product, this could mean faster, more consistent AI responses that stay on-brand for your company's specific terminology and use cases — without someone manually curating prompt templates.

Editorial take

This is pragmatic infrastructure work, not a headline AI research breakthrough, but it's the kind of thing that actually ships and saves money. The two-path routing is the clever part: a typical semantic cache handles only the direct-hit case, and Salesforce's contribution is the path where a near miss becomes a few-shot example. Worth tracking for anyone building enterprise AI tooling.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.