Google Patents an AI That Runs Full-Strength on a Phone Chip
Google is patenting a neural network blueprint that does more with less — squeezing strong image-recognition performance out of architectures thin enough to run on a smartphone chip.
What Google's efficient neural network design actually does
Imagine trying to move furniture through a narrow hallway: the bulkier the load, the harder every step becomes. Conventional residual networks pass wide, many-channel feature maps between blocks and narrow them only in the middle of each block, which is expensive in both memory and processing power. Google's design flips that idea on its head.
Instead of starting wide and going narrow, this architecture keeps the entry and exit points of each processing block deliberately thin, while temporarily expanding in the middle only where the actual heavy-lifting computation happens. Think of it like packing your suitcase more efficiently: you spread things out to sort them, then compress them back down before closing the lid.
The result is a network that can be deployed on devices with limited memory and battery life — like smartphones, tablets, or wearables — without needing a data center to back it up. You get capable AI inference running locally, without draining your battery or requiring a cloud round-trip.
How inverted residual blocks cut compute without losing accuracy
The patent describes a convolutional neural network (CNN) architecture built around two key structural ideas.
The first is a linear bottleneck layer — a narrow layer that compresses feature representations without applying a nonlinear activation function (which would otherwise destroy useful information at low dimensionality). Placing these thin layers at the input and output of each processing block preserves information while keeping memory use low.
The second is an inverted residual block. Traditional residual networks (like ResNet) connect wide layers with a shortcut and use narrow bottlenecks only in the middle. Google's design inverts this: the shortcut connection runs between the two thin bottleneck layers, while the middle of the block expands into a wider representation for computation. That wide middle is filtered with a depthwise convolution, one half of a depthwise separable convolution: a technique that splits a standard convolution into two cheaper operations (a spatial filter applied to each channel on its own, then a 1x1 convolution that mixes across channels), drastically reducing the number of multiplications required.
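To put a rough number on "drastically", here is a minimal back-of-the-envelope sketch that counts multiply-adds for a standard convolution versus a depthwise separable one. The layer sizes (a 56 x 56 feature map, 64 input channels, 128 output channels, a 3 x 3 kernel) and the function names are hypothetical, chosen only for illustration; they do not come from the patent.

```python
def standard_conv_mults(h, w, c_in, c_out, k):
    """Multiply-adds for a k x k standard convolution producing an h x w map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_mults(h, w, c_in, c_out, k):
    """Multiply-adds for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv that mixes across channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
standard = standard_conv_mults(56, 56, 64, 128, 3)
separable = depthwise_separable_mults(56, 56, 64, 128, 3)

# The ratio simplifies to 1/c_out + 1/k**2, independent of the map size.
ratio = separable / standard
print(standard, separable, round(ratio, 3))
```

For these made-up sizes the separable version needs a little under 12% of the multiplications, roughly an 8x saving. With a 3 x 3 kernel the saving can never exceed 9x, because the 1/k**2 term dominates once the channel count is large.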
Together, these ideas form the backbone of what Google previously released publicly as MobileNetV2. This patent formalizes that architecture with claims covering the structural combination of linear bottlenecks and inverted residuals in a single convolutional block.
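For readers who think in code, the block structure described above can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the patent's claims or Google's implementation: it uses stride 1 and equal input and output channel counts (the case where the shortcut applies), picks ReLU6 as the nonlinearity in the wide middle, leaves out batch normalization, and uses random weights. Every function and variable name here is hypothetical.

```python
import random


def relu6(x):
    """Clamp every value of a [C][H][W] tensor to the range [0, 6]."""
    return [[[min(max(v, 0.0), 6.0) for v in row] for row in ch] for ch in x]


def pointwise(x, weights):
    """1 x 1 convolution: weights[o][c] mixes input channel c into output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)] for wt in weights]


def depthwise3x3(x, kernels):
    """3 x 3 depthwise convolution, stride 1, zero padding: one kernel per channel."""
    h, w = len(x[0]), len(x[0][0])
    out = []
    for ch, k in zip(x, kernels):
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            total += k[di + 1][dj + 1] * ch[ii][jj]
                plane[i][j] = total
        out.append(plane)
    return out


def inverted_residual(x, w_expand, k_depthwise, w_project):
    """Thin input -> wide middle -> thin output, plus a shortcut between the thin ends."""
    hidden = relu6(pointwise(x, w_expand))             # expand: thin -> wide
    hidden = relu6(depthwise3x3(hidden, k_depthwise))  # filter each wide channel
    projected = pointwise(hidden, w_project)           # linear bottleneck: no activation
    # Shortcut connection adds the thin input to the thin output.
    return [[[a + b for a, b in zip(rx, rp)] for rx, rp in zip(cx, cp)]
            for cx, cp in zip(x, projected)]


# Toy sizes (assumptions): 2 thin channels, expansion factor 3, 4 x 4 feature map.
rng = random.Random(0)
c_thin, expansion, size = 2, 3, 4
c_wide = c_thin * expansion
x = [[[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
     for _ in range(c_thin)]
w_expand = [[rng.uniform(-1, 1) for _ in range(c_thin)] for _ in range(c_wide)]
k_depthwise = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
               for _ in range(c_wide)]
w_project = [[rng.uniform(-1, 1) for _ in range(c_wide)] for _ in range(c_thin)]

y = inverted_residual(x, w_expand, k_depthwise, w_project)
print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width match the input
```

Note where the activation is missing: the final projection back down to the thin layer is left linear, which is the "linear bottleneck" half of the design. Only the wide intermediate tensor (6 channels here) carries the extra memory cost, and it only needs to exist while the block is being computed.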
What this means for AI running directly on your phone
The practical payoff here is efficient on-device inference — AI that runs on your phone without phoning home. Architectures like this underpin Google's real-time camera features, object detection in Google Lens, and on-device speech processing. The thinner the network, the faster it runs and the less it taxes your battery.
This is also a competitive moat play. As AI moves increasingly to the edge (meaning your device, not a server), whoever has the best low-power inference architecture has a durable advantage. Filing for a patent years after MobileNetV2's public release looks largely like a defensive IP move by Google to formally claim the design space it already pioneered.
This is MobileNetV2 in patent form: a genuinely important architecture that Google already published as research in 2018 and has been shipping in products for years. The technical contribution is real and well understood by the ML community; the patent filing is mostly a legal formalization of work Google has already made public. If you're tracking Google's AI efficiency IP portfolio, this is worth a bookmark, but it's not a new development.
Editorial commentary on a publicly published patent application. Not legal advice.