Source: Claude Mythos Preview System Card — Anthropic, April 2026
In April 2026, Anthropic published a System Card for a model called Claude Mythos Preview. If you read it carefully, there’s a sentence buried in the abstract that’s easy to miss: “Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.”
Let that land for a moment. A frontier AI lab built their most capable model to date, evaluated it extensively, and chose not to release it to the public. Instead, they’re using it exclusively in a defensive cybersecurity programme with a small set of partners.
The System Card is remarkable reading, not because it reveals catastrophic risks, but because it is honest about the tradeoffs. The evaluations cover chemical and biological risk uplift, autonomy thresholds (the point at which a model could meaningfully assist in AI R&D without human oversight), alignment properties, and something they call “model welfare”: assessments of the model’s functional emotional states.
What I find most significant is what this decision signals about where we are in the capability-safety dynamic. For years, the concern in the AI safety community was that competitive pressure would prevent any lab from making a unilateral “slow down” decision. Anthropic just did it. Whether this represents a new norm or a one-time exception is the open question — and probably the most important question in AI governance right now.