
DeepSeek V4 — 1 million tokens at 10% of the cost

June 5, 2026

Source: DeepSeek-V4 — DeepSeek-AI technical paper

DeepSeek V4 is simultaneously a technical achievement and a strategic statement. The technical achievement: two MoE models (V4-Pro at 1.6 trillion parameters, V4-Flash at 284 billion) that support 1-million-token context windows while using roughly 10% of the KV cache memory that V3 needs at the same context length. The strategic statement: this capability, previously available only in expensive closed-source models, is open-sourced.
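To get a feel for what a 10% KV cache footprint means at 1 million tokens, here is a minimal back-of-the-envelope sketch. The layer count and per-token cache width below are hypothetical placeholders chosen for illustration, not figures from the paper; only the 10% ratio comes from the text above.

```python
def kv_cache_bytes(tokens, layers, cached_values_per_token_per_layer, bytes_per_value=2):
    """Total cache size: one cached vector per token, per layer.

    bytes_per_value=2 assumes 16-bit storage (e.g. bf16).
    """
    return tokens * layers * cached_values_per_token_per_layer * bytes_per_value


# Hypothetical baseline model shape (illustrative only, not from the paper).
BASELINE_LAYERS = 60
BASELINE_CACHED_VALUES = 576  # cached numbers per token per layer

CONTEXT_TOKENS = 1_000_000

baseline_bytes = kv_cache_bytes(CONTEXT_TOKENS, BASELINE_LAYERS, BASELINE_CACHED_VALUES)
reduced_bytes = baseline_bytes // 10  # the "roughly 10%" claim from the post

print(f"baseline cache: {baseline_bytes / 1e9:.2f} GB")
print(f"at 10%:         {reduced_bytes / 1e9:.2f} GB")
```

Under these assumed numbers, the cache for a single 1M-token sequence drops from tens of gigabytes to a few gigabytes, which is the difference between needing a dedicated accelerator per request and fitting several long requests on one device.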

Three architectural innovations drive the efficiency. First, hybrid attention: Compressed Sparse Attention (CSA) for long-range dependencies and Heavily Compressed Attention (HCA) for local context. Second, Manifold-Constrained Hyper-Connections (mHC), which constrain residual connections to improve gradient flow. Third, the Muon optimiser, which achieves faster convergence than standard Adam.
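The Muon idea can be illustrated with a small sketch. As publicly described, Muon accumulates momentum for a weight matrix and then approximately orthogonalises that momentum with a Newton-Schulz iteration before applying it, so every direction of the update has similar magnitude. The code below is a simplified pure-Python illustration, not DeepSeek's implementation: it uses the classic cubic Newton-Schulz iteration, whereas public Muon implementations use a tuned higher-order polynomial with fewer steps, and the function names, learning rate, and momentum coefficient are all assumptions.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(g, steps=12):
    """Approximate the nearest (semi-)orthogonal matrix to g.

    Uses the cubic iteration X <- 1.5*X - 0.5*X*X^T*X, which pushes every
    singular value of X towards 1. Dividing by the Frobenius norm first keeps
    all singular values in (0, 1], inside the iteration's convergence region.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon-style update (hypothetical hyperparameters)."""
    momentum = [[beta * m + g for m, g in zip(mr, gr)] for mr, gr in zip(momentum, grad)]
    update = newton_schulz_orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(wr, ur)] for wr, ur in zip(weight, update)]
    return weight, momentum


if __name__ == "__main__":
    grad = [[3.0, 1.0], [0.0, 2.0]]
    q = newton_schulz_orthogonalize(grad)
    # For an orthogonal matrix, q @ q^T should be close to the identity.
    print(matmul(q, transpose(q)))
```

The contrast with Adam is that Adam rescales each parameter independently, while this update treats the weight matrix as a whole and equalises its singular values. Whether V4 uses this exact variant is not stated in this post.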

The 1M token context window matters because it changes which applications are practically feasible. An entire codebase can be supplied as context. Long legal documents, research corpora, and extended conversation histories move from technically possible but expensive to routinely affordable.

A Goldman Sachs analysis noted that competition among Chinese models is intensifying rapidly: Kimi K2.6, Qwen3.6-Max, Tencent Hy3, and Xiaomi V2.5 all launched in proximity to V4. Whether V4’s open release compresses the pricing power of closed-source alternatives is the central question for the AI infrastructure investment thesis over the next 12–18 months.