DeepSeek V4 — 1 million tokens at 10% of the cost

Source: DeepSeek-V4 — DeepSeek-AI technical paper

Technical AI papers can look intimidating from the outside. Lots of architecture names, benchmark tables, and acronyms behaving like they pay rent. But the useful question is usually simple: what bottleneck is this trying to remove?

DeepSeek V4 is both a technical achievement and a strategic statement. The technical achievement: two Mixture-of-Experts (MoE) models, V4-Pro at 1.6 trillion parameters and V4-Flash at 284 billion, that support 1 million token context windows at roughly 10% of the KV cache memory required by V3 at the same context length. The strategic statement: this capability, previously only available in expensive closed-source models, is open-sourced.

Three innovations drive the efficiency. First, hybrid attention: Compressed Sparse Attention (CSA) for long-range dependencies and Heavily Compressed Attention (HCA) for local context. Second, Manifold-Constrained Hyper-Connections (mHC), which constrain residual connections to improve gradient flow. Third, the Muon optimiser, which achieves faster convergence than standard Adam.

The 1M token context window matters because it changes which applications are practically feasible. Long legal documents, research corpora, and extended conversation histories move from technically possible but expensive to routinely affordable.

The Goldman Sachs analysis noted that Chinese model competition is intensifying rapidly: Kimi K2.6, Qwen3.6-Max, Tencent Hy3, and Xiaomi V2.5 all launched in proximity to V4. Whether V4’s open release compresses the pricing power of closed-source alternatives is the central question for the AI infrastructure investment thesis over the next 12–18 months.
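To get a feel for why a 10% KV cache figure matters at 1 million tokens, here is a rough back-of-the-envelope sketch. It uses the textbook formula for an uncompressed key/value cache, which is not how V3 or V4 actually lay out their caches, and every configuration number in it (layer count, head count, head dimension, memory budget) is a hypothetical value chosen only for illustration. The only figure taken from the article is the "roughly 10%" ratio.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed by a plain, uncompressed KV cache for one sequence."""
    # Factor of 2: one key vector and one value vector
    # per token, per layer, per KV head.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


def sequences_that_fit(budget_bytes, per_sequence_bytes):
    """How many full-length sequences fit in a fixed memory budget."""
    return budget_bytes // per_sequence_bytes


GIB = 2**30
CONTEXT = 1_000_000  # the 1M-token window discussed above

# Hypothetical model shape, NOT DeepSeek's real configuration.
baseline = kv_cache_bytes(CONTEXT, layers=60, kv_heads=8, head_dim=128)
reduced = baseline // 10  # the article's "roughly 10%" figure

# Hypothetical budget: 640 GiB of accelerator memory set aside for the cache.
budget = 640 * GIB

print(f"baseline cache: {baseline / GIB:.1f} GiB per sequence")
print(f"at 10%:         {reduced / GIB:.1f} GiB per sequence")
print(f"sequences in budget: {sequences_that_fit(budget, baseline)} "
      f"vs {sequences_that_fit(budget, reduced)}")
```

Under these made-up numbers, a single 1M-token sequence drops from roughly 229 GiB of cache to roughly 23 GiB, and the same memory budget goes from serving 2 such sequences to 27. The absolute values will differ for any real model; the point is that cache size scales linearly with context length, so a tenfold reduction translates directly into how many long-context requests one machine can hold at once.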

In plain English, that is why the result matters beyond the benchmark tables. It changes where people should look, what they should question, and which comfortable assumption probably needs to be retired.

My takeaway: architecture matters because it decides what kind of intelligence is affordable, reliable, and usable outside the lab. The benchmark is the beginning of the story, not the ending.