Google · Filed Jul 22, 2025 · Published May 28, 2026 · verified — real USPTO data

Google Patents a 3D-Aware Feature Matching System for Object Recognition

Matching a photo of an object taken from the front to one taken from the side is surprisingly hard — and Google's new patent tackles it by teaching an AI to think in 3D while looking at 2D images.

FIG. 1A of US 2026/0148415 A1, rendered from the official USPTO publication PDF.
Publication number: US 2026/0148415 A1
Applicant: Google LLC
Filing date: Jul 22, 2025
Publication date: May 28, 2026
Inventors: Arjun Karpur, Guilherme Mendeleh Perrotta, Ricardo Martin-Brualla, André Filgueiras de Araujo
CPC classification: 382/103
Grant likelihood: Medium
Examiner: CENTRAL, DOCKET (Art Unit OPAP)
Status: Docketed New Case - Ready for Examination (Feb 20, 2026)
Parent application: National Stage Entry of PCT/US2023/021513 (filed 2023-05-09)
Document: 20 claims

What Google's 3D-augmented image matching actually does

Imagine you photograph a coffee mug straight on, then someone else snaps the same mug from a 45-degree angle. To a standard image-matching algorithm, those two photos look pretty different — the proportions shift, shadows move, and familiar edges disappear. That's a real problem for apps that need to recognize objects reliably across many viewpoints.

Google's patent describes a way to fix this by giving the AI a sense of 3D space. Instead of comparing raw pixel patches, the system first figures out where each interesting point on the object sits in three-dimensional space — using depth cues from the image itself. Those 3D coordinates then get baked into the feature descriptions the ML model generates.

The result is that when the model looks for matching points between two photos taken from different angles, it's comparing descriptions that already account for how the object is oriented in space — not just how it looks on a flat screen. That should make matches more accurate and more robust when the viewing angle changes dramatically.

How 3D coordinates get fused into the ML embedding pipeline

The system works in four main stages:

  • Local feature extraction: A model identifies interesting keypoints in each image — edges, corners, texture boundaries — the same kind of landmarks traditional computer-vision systems have used for decades.
  • 3D coordinate lifting: For each detected keypoint, the system estimates the corresponding position in a shared 3D reference frame tied to the object itself (not the camera). Think of this as figuring out "this corner of the mug is 3 cm to the left and 2 cm up from the center of the mug," regardless of where the camera is.
  • 3D-augmented embeddings: A machine learning model fuses the raw 2D image features and the 3D coordinates into a single vector representation called an embedding (a compact numerical description the model can compare). Because both images' features are mapped to the same object-centric coordinate system, the embeddings should change far less with viewpoint than descriptions built from 2D appearance alone.
  • Correspondence matching: The system compares the 3D-augmented embeddings across both images to find matching keypoints — telling you "this point in image A is the same physical spot as that point in image B."
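The last two stages can be sketched in a few lines of Python. This is a toy under loud assumptions, not the patent's method: the patent describes a learned model doing the fusion, while the hypothetical fuse() below simply concatenates a descriptor with its 3D coordinates, and every number is hand-written for illustration. The matching rule (mutual nearest neighbours on cosine similarity) is likewise a common stand-in, not something the source specifies.

```python
import math

def fuse(descriptor, xyz, weight=1.0):
    """Stand-in for the learned fusion: concatenate, then L2-normalise."""
    vec = list(descriptor) + [weight * c for c in xyz]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def mutual_matches(emb_a, emb_b):
    """Keep (i, j) only when i and j are each other's best match."""
    best_ab = [max(range(len(emb_b)), key=lambda j: cosine(a, emb_b[j]))
               for a in emb_a]
    best_ba = [max(range(len(emb_a)), key=lambda i: cosine(b, emb_a[i]))
               for b in emb_b]
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]

# Each keypoint: (2D appearance descriptor, object-frame xyz). Made-up values.
# The two keypoints look almost identical in 2D but sit on opposite sides
# of the object.
image_a = [([1.00, 0.00], [-1.0, 0.0, 0.0]),
           ([0.90, 0.10], [1.0, 0.0, 0.0])]
image_b = [([0.98, 0.05], [0.9, 0.1, 0.0]),     # same spot as image_a[1]
           ([0.92, 0.08], [-0.95, 0.0, 0.1])]   # same spot as image_a[0]

def match(weight):
    emb_a = [fuse(d, p, weight) for d, p in image_a]
    emb_b = [fuse(d, p, weight) for d, p in image_b]
    return mutual_matches(emb_a, emb_b)

matches_2d = match(weight=0.0)  # 3D coordinates zeroed out: appearance only
matches_3d = match(weight=1.0)  # 3D coordinates included

print(matches_2d)  # [(0, 0), (1, 1)]  appearance alone pairs the wrong points
print(matches_3d)  # [(0, 1), (1, 0)]  3D position resolves the ambiguity
```

In this toy the two keypoints are nearly indistinguishable by appearance, so the 2D-only run pairs them the wrong way round. Adding the object-frame position is what separates them.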

The key insight is that lifting features into a shared 3D object space before comparing them removes much of the appearance variation that changing viewpoints introduce, which should make the learned matching more reliable.
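A small geometric sketch shows why an object-centric frame helps. It reuses the article's mug corner (3 cm left, 2 cm up from the center) and views it from the front and from 45 degrees. The helper names, the 0.5 m camera distance, and the simple orbiting-camera model are all assumptions made for illustration. Note one big simplification: here the camera pose is known, whereas the patented system is described as estimating object-frame coordinates from the image itself.

```python
import math

def rotate_y(point, degrees):
    """Rotate a 3D point about the vertical (y) axis."""
    t = math.radians(degrees)
    x, y, z = point
    return (math.cos(t) * x + math.sin(t) * z,
            y,
            -math.sin(t) * x + math.cos(t) * z)

def to_camera(point_obj, yaw_deg, distance):
    """Object-frame point -> camera-frame point for a camera orbiting the object."""
    x, y, z = rotate_y(point_obj, yaw_deg)
    return (x, y, z + distance)

def to_object(point_cam, yaw_deg, distance):
    """Inverse of to_camera: recover the object-frame point."""
    x, y, z = point_cam
    return rotate_y((x, y, z - distance), -yaw_deg)

corner = (-0.03, 0.02, 0.0)  # metres, relative to the mug's center

cam_front = to_camera(corner, yaw_deg=0, distance=0.5)
cam_side = to_camera(corner, yaw_deg=45, distance=0.5)

obj_from_front = to_object(cam_front, yaw_deg=0, distance=0.5)
obj_from_side = to_object(cam_side, yaw_deg=45, distance=0.5)

# Camera-frame coordinates disagree between the two views...
print(math.dist(cam_front, cam_side) > 0.01)            # True
# ...but the object-frame coordinates agree (up to float rounding).
print(math.dist(obj_from_front, obj_from_side) < 1e-9)  # True
```

The same physical corner has different camera-frame coordinates in each photo but one set of object-frame coordinates. That stable quantity is what the embeddings are meant to carry.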

What this means for AR, visual search, and robotics

Feature matching is a foundational step in a surprising range of technologies: augmented reality overlays that must track objects as you walk around them, visual search that lets you identify a product from any angle, robot manipulation that needs to localize a part before grabbing it, and 3D scene reconstruction. Most of these tasks already work reasonably well when viewpoint changes are modest, but they degrade quickly at large angles — a known pain point.

For Google specifically, this connects naturally to products like Google Lens and ARCore, both of which depend on robust cross-view object recognition. If this approach ships, it could mean more reliable object identification even when you photograph something from an awkward angle, which is most of the time in the real world.

Editorial take

This is solid, incremental computer vision research rather than a conceptual leap — the idea of injecting 3D priors into 2D feature matching has been explored in academia, but Google is patenting a specific learned pipeline for doing it end-to-end. Given how central reliable object matching is to AR and visual search, this is exactly the kind of infrastructure work worth watching even if it'll never make a headline on its own.

Source. Full patent text and figures from the official USPTO publication PDF.

Editorial commentary on a publicly published patent application. Not legal advice.