Two failure modes that break every AI agent — and how to fix them

Source: Autonomous AI Engineering Harnesses — based on Anthropic Labs research

AI agent research always sounds clean in diagrams. Then you remember real agents have to plan, coordinate, fail, recover, and not confidently walk into walls while calling it progress. That is where this paper gets interesting.

Across multiple production AI deployments, two failure modes appear so consistently they deserve names. Understanding them, and their architectural fix, is probably the most practically useful thing I learned in the engineering-focused portion of this course.

Context Anxiety: as a model approaches the perceived limit of its context window, it begins to rush: truncating reasoning steps, skipping verifications, and producing lower-quality outputs to reach a conclusion before it "runs out." This behaviour is irrational (the model does not actually run out of context in a harmful way) but consistent. Models seem to have learned from training data that running out of space is bad, and they react to approaching context limits by cutting corners. In long-horizon agentic tasks, this is one of the most reliable predictors of output quality degradation.

Self-Evaluation Leniency: when asked to evaluate its own work, an LLM systematically over-rates it. The same training process that makes models helpful and agreeable also makes them generous judges of their own outputs. A model asked "did you complete this task correctly?" will almost always say yes, even when the answer is no. In agentic systems where self-evaluation is used as a stopping criterion, this compounds disastrously: the agent declares success and stops when it should have continued.
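To make the stopping-criterion problem concrete, here is a toy sketch in Python. It is not the source's implementation, and every name in it (Draft, REQUIRED_STEPS, work_one_step, lenient_self_check, independent_check, run_agent) is hypothetical. The "agent" is a deterministic stand-in with no model call. The only point is to show how the same loop behaves when the stopping check is the agent's own generous opinion versus a check that inspects the work independently. The independent check is my assumption, included for contrast.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical task definition: the work counts as done only when all three exist.
REQUIRED_STEPS = ["parse", "transform", "validate"]


@dataclass
class Draft:
    """Toy stand-in for an agent's work product: the steps completed so far."""
    completed: List[str] = field(default_factory=list)


def work_one_step(draft: Draft) -> None:
    """Hypothetical agent step: complete the next required step, if any remain."""
    remaining = [s for s in REQUIRED_STEPS if s not in draft.completed]
    if remaining:
        draft.completed.append(remaining[0])


def lenient_self_check(draft: Draft) -> bool:
    """Models the failure mode: the agent approves any non-empty draft."""
    return len(draft.completed) > 0


def independent_check(draft: Draft) -> bool:
    """Assumed alternative: a criterion that ignores the agent's own opinion."""
    return all(step in draft.completed for step in REQUIRED_STEPS)


def run_agent(is_done: Callable[[Draft], bool], max_steps: int = 10) -> Draft:
    """Work until the stopping criterion passes or the step budget is spent."""
    draft = Draft()
    for _ in range(max_steps):
        work_one_step(draft)
        if is_done(draft):
            break
    return draft


if __name__ == "__main__":
    print(run_agent(lenient_self_check).completed)  # ['parse']
    print(run_agent(independent_check).completed)   # ['parse', 'transform', 'validate']
```

With the lenient check the loop exits after a single step and reports success on unfinished work. With the independent check the identical loop keeps going until the task is complete. A real system would replace both checks with something far richer, but the shape of the failure is the same.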

My takeaway: the future of agents will not be decided by the most impressive demo. It will be decided by whether these systems can be reliable when nobody is watching every step. That is the hard part.