Microsoft Patents an AI System That Finds Images Visually Different From Your Query
Most image search returns more of the same — Microsoft's new patent describes a system that deliberately hunts for related images that look different from the one you started with, using a generative AI model to write the search queries itself.
What Microsoft's diverse image retrieval actually does
Imagine you search for a photo of a red sneaker, and the results are just fifty more red sneakers. That's useful up to a point — but what if you wanted to see how that sneaker looks styled with different outfits, or compare it to similar shoes in different colors? That's the gap this patent is trying to close.
Microsoft's system takes a candidate image and its metadata, then asks a generative AI model to write a set of search queries designed to return images that are related but visually distinct. Think of it as hiring a creative researcher who knows not to just copy-paste the original.
The results from those AI-generated queries get filtered and then stored in a prebuilt lookup table — a diverse image relationship table — so that when a user or app asks for related images, the system can respond in real time without running the whole AI pipeline again on the spot.
How the AI builds and maps diverse image search queries
The core workflow has three stages. First, the system constructs an image query generation prompt — a set of instructions fed to a generative AI model — that tells the model to produce multiple search queries for a given candidate image. Crucially, the prompt specifies that results must be visually different from the original, not just similar. This is the key design constraint that separates it from standard reverse image search.
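To make the first stage concrete, here is a minimal sketch of what assembling such a prompt could look like. The function name `build_query_generation_prompt`, the metadata fields (`title`, `tags`), and the prompt wording are all illustrative assumptions; the patent's actual prompt text is not quoted here.

```python
def build_query_generation_prompt(title: str, tags: list[str], num_queries: int = 5) -> str:
    """Assemble a text prompt asking a generative model for diverse search queries.

    Hypothetical helper: the field names and wording are assumptions for
    illustration, not the prompt described in the patent.
    """
    tag_text = ", ".join(tags) if tags else "none provided"
    return (
        "You are helping build an image search index.\n"
        f"Candidate image title: {title}\n"
        f"Candidate image tags: {tag_text}\n"
        f"Write {num_queries} image search queries, one per line.\n"
        # The key constraint: related to the candidate, but not lookalikes.
        "Each query must return images that are related to the candidate "
        "but visually different from it (different color, setting, angle, or styling).\n"
        "Do not write queries that would return near-duplicates of the candidate."
    )


prompt = build_query_generation_prompt(
    "Red canvas sneaker", ["footwear", "red", "casual"], num_queries=3
)
print(prompt)
```

The resulting string would then be sent to whichever generative model the pipeline uses; that call is left out because the model and its API are not specified.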
Second, those AI-generated queries are sent to a conventional image search system (think Bing Image Search or an internal index). The search system returns a result set for each query, and the patent describes selecting a subset of those results to keep the final set both useful and varied. The selection method is presumably something like relevance scoring or deduplication logic.
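One plausible way to pick that subset is sketched below. It is an assumption, not the patent's method: it takes results round-robin across the queries so that no single query dominates, skips duplicates, and excludes the candidate image itself. The function name, the image IDs, and the `max_images` cap are all hypothetical.

```python
from itertools import zip_longest


def select_diverse_subset(
    results_by_query: dict[str, list[str]],
    candidate_id: str,
    max_images: int = 6,
) -> list[str]:
    """Pick up to max_images unique image IDs, interleaving across queries.

    Each list in results_by_query is assumed to be ranked best-first by the
    search system.
    """
    selected: list[str] = []
    seen = {candidate_id}  # never return the candidate as its own neighbor
    # zip_longest yields the rank-1 result of every query, then rank-2, etc.,
    # padding shorter result lists with None.
    for same_rank_results in zip_longest(*results_by_query.values()):
        for image_id in same_rank_results:
            if image_id is None or image_id in seen:
                continue
            seen.add(image_id)
            selected.append(image_id)
            if len(selected) == max_images:
                return selected
    return selected


results = {
    "red sneaker street style outfit": ["img_101", "img_102", "img_7"],
    "canvas sneaker in blue": ["img_201", "img_101", "img_202"],
    "sneaker worn on feet": ["img_301"],
}
subset = select_diverse_subset(results, candidate_id="img_7", max_images=4)
print(subset)  # ['img_101', 'img_201', 'img_301', 'img_102']
```

Interleaving by rank is a simple stand-in for variety; a production system could just as well score results on visual distance from the candidate.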
Third, the selected images and the original candidate image are written into a diverse image relationship table: a precomputed mapping of one image identifier to a set of related-but-different image identifiers. This table is the artifact that enables real-time serving — by the time a user requests related images, the heavy generative AI work has already been done offline.
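The table itself can be pictured as a simple key-value mapping. The sketch below uses an in-memory dictionary and a hypothetical class name, `DiverseImageRelationshipTable`; a real deployment would presumably use a persistent store, and nothing here reflects Microsoft's actual schema.

```python
class DiverseImageRelationshipTable:
    """Maps one image ID to an ordered list of related-but-different image IDs.

    Illustrative only: an in-memory dict standing in for whatever store a
    real system would use.
    """

    def __init__(self) -> None:
        self._related: dict[str, list[str]] = {}

    def put(self, candidate_id: str, related_ids: list[str]) -> None:
        # Offline step: drop duplicates (keeping order) and the candidate itself.
        unique_ids = dict.fromkeys(related_ids)
        self._related[candidate_id] = [i for i in unique_ids if i != candidate_id]

    def get(self, candidate_id: str, limit: int = 10) -> list[str]:
        # Serving step: a plain lookup, with no generative model call involved.
        return self._related.get(candidate_id, [])[:limit]


table = DiverseImageRelationshipTable()
table.put("img_7", ["img_101", "img_201", "img_101", "img_7", "img_301"])

print(table.get("img_7"))           # ['img_101', 'img_201', 'img_301']
print(table.get("img_7", limit=2))  # ['img_101', 'img_201']
print(table.get("img_unknown"))     # []
```

The split between `put` and `get` mirrors the offline and real-time halves of the design: all the expensive work happens before `put`, and `get` is only a dictionary read.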
The patent explicitly covers using a visual-based generative AI model as part of the pipeline, suggesting the system can interpret image content directly rather than relying solely on text metadata.
What this means for image search and content recommendation
For content platforms and e-commerce — think product recommendation carousels, stock photo sites, or social feed algorithms — the ability to surface visually diverse but contextually related images is genuinely valuable. Showing users ten near-identical images is a known engagement killer; showing them varied but relevant alternatives keeps them exploring.
The precomputed table architecture is the practical engineering insight here. Generative AI calls are slow and expensive; baking the results into a lookup table at index time means the user-facing retrieval is fast. If Microsoft deploys this in Bing Images or integrates it into SharePoint or Designer, it could make AI-driven content diversity a default behavior rather than a premium feature.
This is solid, unglamorous infrastructure work — the kind of patent that quietly ends up inside a content recommendation stack rather than getting a press release. The genuinely interesting design choice is the explicit instruction to find images that are visually *different*, not just semantically related; that's a real inversion of how most retrieval systems are built. Whether Microsoft ships this as a standalone feature or buries it inside Bing's ranking pipeline, the precomputed-table approach is a sensible way to make generative AI tractable at search scale.
Editorial commentary on a publicly published patent application. Not legal advice.