I thought more AI agents meant better results. The research says otherwise.

Source: CooperBench — Khatua, Zhu et al., Stanford & SAP Labs, arXiv 2601.13295

I have learned to be suspicious of agent demos that look too smooth. The real question is not whether an agent can do one impressive thing once. It is whether the system still behaves when the task becomes messy.

A few weeks ago I was reading through a paper from Stanford and SAP Labs, and one number stopped me: 30%. That’s how much worse AI agents perform when they work together compared to working alone.

The study is CooperBench, published in January 2026. The researchers ran over 600 collaborative coding tasks across 12 libraries and 4 programming languages. They took the best AI coding agents available, paired them up on tasks that required coordination, and measured what happened. They named the result the “curse of coordination.”

Three things kept going wrong. First, agents sent each other vague, mistimed, and inaccurate messages, essentially talking past each other. Second, even when they communicated clearly, agents didn’t follow through on what they’d committed to. Third, agents operated on completely wrong assumptions about what the other agent was doing, and neither noticed until something broke.

Compare this to human teams, where adding a teammate typically improves productivity. With AI agents, 1+1 is currently less than 1.

The practical implication for anyone building AI systems: before adding more agents, ask whether you have the coordination infrastructure to make them work together reliably. Most AI system designs treat multi-agent coordination as a solved problem. This research suggests it is not.

For me, this is the practical lesson: autonomy is not one feature. It is planning, memory, coordination, recovery, and judgment stitched together. Naturally, all the difficult bits are the important ones.