Is there really an AI oversupply? The data says it's more complicated.

Source: Does Token Oversupply Change? — DBAI class discussion

At first this sounds like a technical finance question. Then you look closer and realise it is also a question about attention, incentives, and whether people are using information or just being impressed by it.

One of the more practically useful class discussions this semester was about the token oversupply claim: that only 15–18% of available AI inference capacity is being used. If true, this has enormous implications for GPU pricing, data centre economics, and the AI infrastructure investment thesis.

The methodology behind that figure deserves scrutiny. The claim assumes all 9–10.5 million H100-equivalent GPUs globally are running inference workloads 24/7 at theoretical peak throughput of 1,400 tokens per second. In reality, large fractions of those GPUs are used for training, fine-tuning, and R&D, and real-world inference throughput varies enormously, so the capacity side of the ratio is overstated. The consumption figure, meanwhile, likely undercounts internal enterprise inference, research usage, and non-API tokens, so the usage side is understated. Both errors push the headline percentage down.

The deeper question is whether any level of current utilisation tells you anything meaningful about future demand. The history of compute markets suggests it probably doesn't. Mainframe utilisation looked low before personal computing arrived. Bandwidth utilisation looked low before streaming video arrived. The question isn't whether today's capacity is fully used; it's whether the applications that will consume it exist yet. My intuition, based on the agentic AI trajectory, is that the demand side is just getting started.
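To make the methodology critique concrete, here is a rough back-of-envelope sketch in Python. It computes the capacity ceiling implied by the claim's own inputs (9–10.5 million GPUs, 1,400 tokens per second, running 24/7), then shows how the headline utilisation changes once the denominator shrinks. Reading the 1,400 figure as per-GPU is my assumption, and the two adjustment parameters, inference_share and throughput_fraction, are hypothetical: the 0.5 values are placeholders for illustration, not numbers from the discussion.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def theoretical_daily_tokens(gpu_count, peak_tokens_per_sec=1_400):
    """Capacity ceiling if every GPU served inference 24/7 at peak throughput.

    Assumes the 1,400 tokens/sec figure is per GPU (an assumption, see above).
    """
    return gpu_count * peak_tokens_per_sec * SECONDS_PER_DAY


def adjusted_utilisation(headline, inference_share, throughput_fraction):
    """Rescale the headline utilisation to a smaller, more realistic denominator.

    inference_share:     hypothetical fraction of GPUs actually serving inference
    throughput_fraction: hypothetical fraction of peak throughput achieved in practice
    """
    if not (0 < inference_share <= 1 and 0 < throughput_fraction <= 1):
        raise ValueError("both fractions must be in (0, 1]")
    return headline / (inference_share * throughput_fraction)


if __name__ == "__main__":
    for gpus in (9_000_000, 10_500_000):
        ceiling = theoretical_daily_tokens(gpus)
        print(f"{gpus:,} GPUs -> {ceiling:.3e} tokens/day ceiling")

    # Placeholder adjustments (0.5 each), purely illustrative.
    for headline in (0.15, 0.18):
        adj = adjusted_utilisation(headline, inference_share=0.5, throughput_fraction=0.5)
        print(f"headline {headline:.0%} -> adjusted {adj:.0%}")
```

With both placeholder fractions at 0.5, the effective denominator is a quarter of the theoretical one, so a 15–18% headline becomes 60–72%. The specific values matter less than the shape of the result: the headline percentage is very sensitive to two assumptions nobody observes directly, and that is before correcting for undercounted consumption.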

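In plain English, that is why the result matters beyond the headline number. It changes where people should look (how capacity and consumption are actually counted), what they should question (the assumption of round-the-clock peak throughput), and which comfortable assumption probably needs to be retired: that low utilisation today means too much capacity tomorrow.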

So I would not read this as a neat technology story. It is a messy information story. The tools improve, but people still have to decide what deserves attention. Annoying, but true.