Research Journey

We finally have a rigorous definition of AGI. Here's what it reveals.

June 5, 2026

Source: A Definition of AGI — Hendrycks et al., arXiv 2510.18212

The AGI debate has always been stuck on the same problem: nobody agrees on what the term means. Every time AI masters a task (chess, the bar exam, image recognition), the bar for "general" intelligence shifts upward. It’s an endlessly moving goalpost, and it makes productive conversation nearly impossible.

A paper from the Center for AI Safety and collaborators across MIT, Stanford, Oxford, and others takes a serious crack at fixing this. They ground their definition in Cattell-Horn-Carroll (CHC) theory, the most empirically validated framework for human cognition, built over a century of psychometric research. AGI, by their definition, means matching the cognitive versatility and proficiency of a well-educated adult across ten core cognitive domains, including reasoning, memory, language, and perception.

The results are humbling in both directions. GPT-4 scores 27%. GPT-5 scores 57%. That’s extraordinary progress, more than doubling in one model generation, yet it still leaves GPT-5 well short of the well-educated-adult bar. The profile is also “jagged”: the models score very high on knowledge retrieval and language tasks, and very low on long-term memory storage and certain kinds of grounded perception.
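For anyone who wants to check the arithmetic behind "more than doubling," here is a quick sketch. The two scores are the ones quoted above; treating 100% as the well-educated-adult bar is my reading of the definition, and the variable names are just illustrative.

```python
# Headline scores as quoted in this post (percent).
gpt4_score = 27
gpt5_score = 57

# Assumption: 100% corresponds to matching a well-educated adult.
human_bar = 100

ratio = gpt5_score / gpt4_score          # relative improvement
point_gain = gpt5_score - gpt4_score     # absolute gain in percentage points
remaining_gap = human_bar - gpt5_score   # distance still to cover

print(f"{ratio:.2f}x, +{point_gain} points, {remaining_gap} points to go")
# prints "2.11x, +30 points, 43 points to go"
```

So the jump is real, about 2.1x, but the distance left is larger than the distance just covered.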

The long-term memory finding is the one I keep returning to. It’s not a small gap — it’s a fundamental architectural limitation. Current models have no persistent memory across contexts unless explicitly engineered. A well-educated adult carries decades of integrated experience into every decision. That’s not a benchmark problem. That’s a design problem.
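To make "explicitly engineered" concrete, here is a minimal sketch of the kind of bolt-on memory I have in mind. Everything in it is hypothetical: the `PersistentMemory` class, its `remember` and `recall` methods, and the JSON file are my own illustration, not anything from the paper or from a real model API. It only shows that persistence has to live outside the model.

```python
import json
import tempfile
from pathlib import Path


class PersistentMemory:
    """Hypothetical file-backed note store that survives across sessions."""

    def __init__(self, path):
        self.path = Path(path)
        # Load notes from an earlier session if the file exists.
        if self.path.exists():
            self.notes = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.notes = []

    def remember(self, note):
        """Append a note and write the whole store back to disk."""
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes), encoding="utf-8")

    def recall(self, keyword):
        """Return stored notes that contain the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [note for note in self.notes if needle in note.lower()]


def demo():
    """Write a note in one 'session', then read it back in a fresh one."""
    with tempfile.TemporaryDirectory() as tmp:
        store_path = Path(tmp) / "memory.json"

        first_session = PersistentMemory(store_path)
        first_session.remember("User prefers metric units")

        # A new object stands in for a brand-new context window.
        second_session = PersistentMemory(store_path)
        return second_session.recall("metric")


if __name__ == "__main__":
    print(demo())
```

Even this toy version makes the point: the "memory" is a file and a keyword search sitting next to the model, not something integrated into how the model reasons. That is a long way from decades of experience shaping every decision.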