Why LLMs are stuck — the case for world models

Source: The Blueprint of Reality — MM6451 study material

I started this one expecting a technical improvement. The more interesting part is what the improvement reveals about where current AI systems are still awkward, expensive, or surprisingly fragile.

Hans Moravec observed in the 1980s that tasks easy for humans (walking, catching a ball, recognising objects) were hard for AI, while tasks hard for humans (chess, arithmetic, medical diagnosis) were relatively easy for AI. This paradox turns out to reflect a deep structural feature of how current large language models work. LLMs learn statistical correlations in text. They have seen the word “gravity” in an enormous range of contexts. What they can’t reliably do is understand what happens when you push a cup off a table, because understanding that requires a grounded model of physical causation, not a statistical model of text co-occurrence. Scaling text training data doesn’t fix this: you can’t learn physics from descriptions of physics alone.

World models are the proposed solution: internal simulations that represent the physical properties, causal structure, and temporal dynamics of the world. The theoretical basis goes back to Kenneth Craik’s 1943 work, which argued that intelligent behaviour depends on maintaining a small-scale model of the world and using it to simulate possible futures before acting. The research direction this points toward, combining the language understanding of LLMs with the physical grounding of world models, is one of the most active areas of AI research in 2026.
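
To make Craik's idea concrete, here is a minimal sketch of "simulate possible futures before acting", using the cup-on-a-table example. Everything in it is an assumption made for illustration: the one-dimensional table, the friction constant, and the names CupState, step, imagine and choose_push are hypothetical and not taken from any real system. The point is only that the agent tries each candidate action inside its internal model first and rejects the ones where the cup ends up on the floor.

```python
from dataclasses import dataclass

# Hypothetical toy world: the table spans x in [0.0, TABLE_EDGE] metres.
TABLE_EDGE = 1.0
# Hypothetical friction: fraction of velocity the cup keeps each time step.
FRICTION = 0.5


@dataclass(frozen=True)
class CupState:
    x: float  # position along the table
    v: float  # velocity


def step(state: CupState, push: float) -> CupState:
    """One tick of the toy dynamics: a push adds velocity, friction removes some."""
    v = (state.v + push) * FRICTION
    return CupState(x=state.x + v, v=v)


def imagine(state: CupState, push: float, horizon: int = 5) -> list[CupState]:
    """Roll the model forward: push once, then let the cup slide to rest."""
    trajectory = [step(state, push)]
    for _ in range(horizon - 1):
        trajectory.append(step(trajectory[-1], 0.0))
    return trajectory


def choose_push(state: CupState, target: float, candidates: list[float]):
    """Pick the push whose imagined outcome ends nearest the target
    without the cup ever leaving the table. Returns None if none is safe."""
    best_push, best_error = None, float("inf")
    for push in candidates:
        trajectory = imagine(state, push)
        if any(not (0.0 <= s.x <= TABLE_EDGE) for s in trajectory):
            continue  # in the imagined future the cup falls off: reject
        error = abs(trajectory[-1].x - target)
        if error < best_error:
            best_push, best_error = push, error
    return best_push


start = CupState(x=0.2, v=0.0)
print(choose_push(start, target=0.9, candidates=[0.2, 0.5, 0.8, 1.2]))
```

The strongest push in the candidate list is rejected because the imagined cup slides past the table edge, so the planner settles for the gentler push that gets closest to the target. A real world model would learn its dynamics from sensory data instead of having them hand-written, but the loop is the same: predict, evaluate, then act.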

In plain English, that is why the argument matters beyond any single benchmark. It changes where people should look, what they should question, and which comfortable assumption probably needs to be retired.

So the question is not whether the method sounds clever. Many things sound clever in AI. The question is whether it removes a real bottleneck once the system leaves the paper and meets the world.