Research Journey

DeepSeek's Engram — the architecture that changes how LLMs remember

June 5, 2026

Source: Conditional Memory via Scalable Lookup — Cheng et al., DeepSeek-AI & Peking University, arXiv 2601.07372

Current Transformer-based language models share a specific inefficiency, and DeepSeek researchers set out to fix it in one of the more interesting architecture papers of early 2026. The problem: a Transformer has to spend expensive multi-layer neural computation to “remember” static facts (named entities, common phrases, formulaic patterns) that a simple lookup table could retrieve far more cheaply.

Engram introduces conditional memory as a complementary architectural primitive: N-gram embedding lookup tables that retrieve static knowledge in O(1) time, working alongside the existing Mixture-of-Experts computation. Comparing a 27B Engram model against an equivalent pure-MoE baseline, the paper reports +3.4 on MMLU, +5.0 on BBH, and +3.0 on HumanEval. The reasoning gain (BBH) is larger than the knowledge gain (MMLU): by freeing early layers from static reconstruction, the model gets more effective depth for complex reasoning.
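To make the lookup idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the class name `NgramMemory`, the hashing scheme, the table size, and the scalar gate in `gated_add` are all assumptions chosen for illustration. It only shows the general shape of the technique: hash the trailing N-gram of token IDs to a row in a fixed table, then mix the retrieved vector into the hidden state.

```python
import hashlib
import random


class NgramMemory:
    """Toy hashed N-gram embedding table (illustrative only, not the paper's code)."""

    def __init__(self, num_slots=1024, dim=8, n=2, seed=0):
        rng = random.Random(seed)
        self.num_slots = num_slots
        self.dim = dim
        self.n = n
        # Fixed-size table; in a real model these rows would be learned parameters.
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_slots)
        ]

    def _slot(self, ngram):
        # Deterministically hash the token-id tuple to a row index (assumed scheme).
        key = ",".join(str(t) for t in ngram).encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_slots

    def lookup(self, token_ids):
        """Return one memory vector per position, keyed by the trailing n-gram."""
        out = []
        for i in range(len(token_ids)):
            if i + 1 < self.n:
                # Not enough left context yet: contribute nothing.
                out.append([0.0] * self.dim)
            else:
                ngram = tuple(token_ids[i + 1 - self.n : i + 1])
                out.append(self.table[self._slot(ngram)])
        return out


def gated_add(hidden, memory, gate):
    """Mix a retrieved vector into a hidden state with a scalar gate in [0, 1]."""
    return [h + gate * m for h, m in zip(hidden, memory)]


if __name__ == "__main__":
    mem = NgramMemory(num_slots=64, dim=4, n=2)
    vectors = mem.lookup([5, 7, 5, 7])
    # Positions 1 and 3 both end the bigram (5, 7), so they hit the same row.
    print(vectors[1] == vectors[3])
```

Each position costs one hash and one row read regardless of how many rows the table holds, which is the sense in which the retrieval is O(1). In this sketch, distinct N-grams can collide in the same slot; how a real system handles collisions and computes the gate is left open here.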

The long-context improvement is also striking: Multi-Query NIAH retrieval improves from 84.2 to 97.0. When the model stops wasting attention on local static facts, it has more capacity to track global context.

This is a real architectural insight, not an incremental tuning result. The U-shaped scaling law the paper reveals suggests an optimal allocation between neural compute and static memory at each model size. At large scales, even simple lookup mechanisms dramatically improve performance when treated as a first-class primitive.