Salesforce · Filed Nov 18, 2024 · Published May 21, 2026 · verified — real USPTO data

Salesforce Patents an Automated Pipeline for Tuning Large Language Models

By Patentlyze Team · Updated Jul 10, 2026

Running a large language model in production is expensive: every millisecond of inference latency and every wasted GPU cycle adds up fast. Salesforce has filed a patent application for a system that automatically searches for the best combination of optimization techniques to make a pre-existing LLM leaner, without touching the training process.

Figure from the official USPTO publication.

Publication number: US 2026/0141210 A1
Applicant: Salesforce, Inc.
Filing date: Nov 18, 2024
Publication date: May 21, 2026
Inventors: Chi Wang, Jianxiang Chang, Peiheng Hu, Seetharaman Gudetee, Sandeep Bansal, Bhavesh Doshi
CPC classification: 706/27
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Dec 19, 2024)
Document: 20 claims

AI/ML

What Salesforce's LLM auto-optimizer actually does

Imagine you have a powerful but sluggish AI model and you need it to respond faster and cost less to run — but retraining it from scratch is off the table. That's the problem Salesforce's patent is designed to solve.

The system works in two phases. First, an offline program takes a library of optimization techniques — things like quantization, pruning, or batching strategies — and generates many different combinations of settings to try. Think of it like a recipe book of tuning options that gets handed off to the next stage.

Then a fully automated online program takes those candidate configurations and tests them against the actual model running in a real environment, ultimately picking the best setup and deploying it to production. The goal is to squeeze better performance out of models you already have, rather than building new ones.

How the offline-to-online optimization handoff works

The patent describes a two-stage optimization architecture for large language models (LLMs) — the big AI systems like those powering chatbots and code assistants — focused on inference optimization (making the model respond faster and cheaper at runtime, not during training).

The first stage is an offline optimization program built around a generic model framework — essentially a plug-in architecture that can accept different types of optimization techniques and combine them. By generating many combinations of LLM serving configurations (settings that control how the model is loaded, quantized, batched, and served), the offline stage builds a search space of candidate setups.
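To make the offline stage concrete, here is a minimal Python sketch of a plug-in style search-space generator. It is an illustration only, not the implementation described in the application: the `Technique` class, the `build_search_space` function, and every option name and value (such as `quantization` or `max_batch_size`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Technique:
    """One pluggable optimization technique and the settings it can take.

    Hypothetical structure for illustration; names and values are assumptions.
    """
    name: str
    options: tuple


def build_search_space(techniques):
    """Return every combination of settings as a list of config dicts."""
    names = [t.name for t in techniques]
    # Cartesian product: one value chosen from each technique's options.
    return [dict(zip(names, combo))
            for combo in product(*(t.options for t in techniques))]


# Example library of techniques (illustrative values only).
library = [
    Technique("quantization", ("none", "int8", "int4")),
    Technique("max_batch_size", (1, 8, 32)),
    Technique("kv_cache", (True, False)),
]

candidates = build_search_space(library)
print(len(candidates))   # 3 * 3 * 2 = 18 candidate configurations
print(candidates[0])
```

In this sketch, supporting a new technique means appending one more `Technique` to the library. The number of candidates grows multiplicatively with each addition, which is one plausible reason to evaluate them in a separate automated stage rather than by hand.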

The second stage is a fully automated online optimization program that receives those candidate configurations and evaluates them against a live or near-live model environment. It selects the best-performing configuration and deploys the optimized model to a production environment.
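The online stage can be sketched in the same spirit. The version below assumes a hypothetical `measure` callback that runs one candidate configuration against the model and reports latency and a quality score. The selection rule (lowest latency among candidates that meet a quality floor) is also an assumption made for illustration, since the description above says only that the best-performing configuration is selected.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    latency_ms: float   # lower is better
    quality: float      # higher is better, e.g. an eval score in [0, 1]


def select_best(
    candidates: Iterable[dict],
    measure: Callable[[dict], Measurement],
    min_quality: float,
) -> Optional[dict]:
    """Return the fastest candidate whose quality meets the floor, or None."""
    best_config, best_latency = None, float("inf")
    for config in candidates:
        result = measure(config)          # run the candidate against the model
        if result.quality < min_quality:
            continue                      # reject configs that degrade output
        if result.latency_ms < best_latency:
            best_config, best_latency = config, result.latency_ms
    return best_config


# Stand-in for a real benchmark run; the numbers are made up for the demo.
FAKE_RESULTS = {
    "none": Measurement(latency_ms=120.0, quality=0.95),
    "int8": Measurement(latency_ms=70.0, quality=0.93),
    "int4": Measurement(latency_ms=45.0, quality=0.80),
}


def fake_measure(config: dict) -> Measurement:
    return FAKE_RESULTS[config["quantization"]]


candidates = [{"quantization": q} for q in ("none", "int8", "int4")]
winner = select_best(candidates, fake_measure, min_quality=0.90)
print(winner)  # {'quantization': 'int8'}
```

Here the `int4` candidate is the fastest but falls below the quality floor, so the `int8` configuration wins. A real pipeline would then pass the winning configuration to a deployment step, which is omitted from this sketch.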

Key characteristics of the claimed approach:

Works on pre-existing LLMs — no retraining required
Modular framework supports swapping in new optimization techniques
End-to-end automation from configuration generation to deployment
Separates offline search (exploratory) from online validation (confirmatory)

What this means for enterprise AI deployment costs

For any company running LLMs at scale — and Salesforce's Einstein AI platform qualifies — inference costs are a real operational burden. A system that automatically finds the optimal serving configuration for each model could meaningfully cut GPU time and latency without requiring ML engineers to manually tune every deployment. That's a labor and cost win that compounds across a large model portfolio.

The broader implication is about operationalizing AI: as enterprises deploy more models, the bottleneck shifts from training to serving. Salesforce filing in this space signals that it's building infrastructure to manage that shift at enterprise scale — and wants to own the tooling that sits between a trained model and a live customer-facing product.

Editorial take

This is solid infrastructure work, not a flashy AI capability patent. The two-stage offline/online optimization loop is a well-understood pattern in ML systems engineering, so the novelty here likely lives in the specific implementation details that didn't make it into the abstract. For Salesforce, this matters because it's the kind of boring-but-critical plumbing that determines whether Einstein AI can scale profitably — and that's worth paying attention to even if it won't make headlines.


Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.
