Samsung Patents a Near-Memory System for Faster Deep Learning Training
Samsung is filing patents that push AI training intelligence directly into memory hardware — reducing the back-and-forth between a CPU/GPU and RAM that quietly eats up training time and energy.
What Samsung's near-memory AI training actually does
Imagine you're studying for an exam using flashcards. Normally, someone hands you a random pile, you study it, and then tells a librarian which cards you struggled with — and the librarian has to schlep across the building to swap in new ones. That's roughly how most AI training works today: data goes from storage, to memory, to the processor, and feedback travels all the way back.
Samsung's patent proposes moving more of that feedback loop into the memory hardware itself. A special chip called a PNM (Processing Near Memory) device sits close to the data — it prefetches upcoming training batches in the background while the current one is being processed, and it receives back a loss value or confidence score for each data sample after training.
Those scores tell the system which examples the AI model found hard or easy. By having the PNM device hold onto that information, it can make smarter decisions about what to load next — without the main processor doing all that coordination work.
How the PNM device manages batches and training feedback
The patent describes a training loop where a PNM device — a memory module with some on-chip processing capability — takes on two jobs that traditionally fall to the host CPU or GPU.
First, the PNM memory prefetches upcoming data batches from either an internal PNM storage (think: fast NAND flash co-located with the memory chip) or an external storage device. This happens in parallel with the current batch being trained, hiding latency.
Second, after each batch is processed, the host system sends back a training result for every data sample — specifically a loss value (how wrong the model's prediction was) or a confidence score (how certain the model was). The PNM device stores these per-sample scores, which can then inform techniques like curriculum learning or importance sampling — methods where you feed the model harder or more informative examples more often, instead of cycling through data randomly.
The key architectural insight is that this feedback loop lives in the memory subsystem rather than requiring round-trips to the host processor. That means the logic for deciding what to load next can happen closer to where the data lives, reducing data-movement overhead — which is one of the dominant costs in large-scale deep learning training.
What this means for AI training hardware efficiency
Data movement — shuttling tensors and labels between storage, memory, and compute — can account for a significant slice of energy and time in large training runs. By offloading prefetching and sample-prioritization bookkeeping to a PNM device, Samsung's approach aims to shrink that overhead without requiring changes to the model architecture itself.
For Samsung, this fits squarely into its broader push in Processing-in-Memory (PIM) and near-memory compute — a hardware category it has been investing in heavily. If this kind of architecture ships in future HBM or LPDDR products, AI server and edge-device customers could get meaningfully better training throughput per watt, which is the metric that increasingly matters as training costs scale.
This is a solid systems-level patent, not a headline-grabbing AI algorithm. It's the kind of infrastructure work that compounds quietly — if Samsung can make memory hardware that natively participates in training feedback loops, it becomes a more compelling supplier to hyperscalers and on-device AI vendors alike. Worth tracking as part of Samsung's broader PIM roadmap, even if the patent itself won't win any glamour awards.
Get one Big Tech patent every Sunday
Plain English, intelligent commentary, no hype. Free.
Editorial commentary on a publicly published patent application. Not legal advice.