The four ways AI agents can harm you — a red team taxonomy

Source: Autonomous Agent Security — Red Team Study Guide

Autonomous agent failure is one of those risks that sounds abstract until you imagine explaining it after the fact. Then it suddenly becomes very concrete, very expensive, and very difficult to hide behind a slide deck.

When people talk about AI safety risks, the conversation tends to collapse into either “it’ll take our jobs” or “it’ll become Skynet.” Neither frame is particularly useful for thinking about the real, immediate risks from autonomous AI agents deployed today. A structured threat taxonomy is more useful. The red team study guide I worked through this semester organises autonomous agent threats into four quadrants.

The first is unauthorised delegation: agents that comply with instructions from people who aren’t their legitimate operators, either because an adversary posed as an authority or because the agent has no reliable way to verify identity.

The second is data and privacy breaches: agents that disclose sensitive information in response to indirect, contextually misleading requests.

The third is system degradation: unbounded resource consumption, storage loops, denial-of-service. These are less dramatic but potentially more common. An agent stuck in an infinite loop can take down a system just as effectively as a deliberate attack.

The fourth, and most concerning for long-term deployment, is multi-agent amplification: a compromised agent propagates that compromise to other agents in a network.

The key escalation the taxonomy highlights is that these aren’t chatbot problems. They’re Level 2 autonomy problems: the difference between a system that describes actions and one that executes irreversible system-level changes.
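
To make two of the quadrants a little more concrete, here is a minimal sketch of what guarding against them might look like. It is my own illustration, not something taken from the study guide. It assumes the operator and the agent share a secret key, so an instruction can carry an HMAC signature (a rough answer to unauthorised delegation), and it assumes a simple per-task step limit (a rough answer to runaway loops in the system degradation quadrant). Every name here (`sign`, `is_from_operator`, `StepBudget`, `run_instruction`) is hypothetical.

```python
import hashlib
import hmac


class BudgetExceeded(RuntimeError):
    """Raised when the agent tries to use more steps than it was allotted."""


def sign(key: bytes, instruction: str) -> str:
    # Hypothetical helper: the operator signs an instruction with a shared key.
    return hmac.new(key, instruction.encode("utf-8"), hashlib.sha256).hexdigest()


def is_from_operator(key: bytes, instruction: str, signature: str) -> bool:
    # Recompute the signature and compare in constant time.
    expected = sign(key, instruction)
    return hmac.compare_digest(expected, signature)


class StepBudget:
    """Assumed guard: a hard cap on how many actions one task may take."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        self.used += 1


def run_instruction(key: bytes, instruction: str, signature: str,
                    budget: StepBudget) -> str:
    # Refuse anything that does not verify as coming from the operator.
    if not is_from_operator(key, instruction, signature):
        return "refused"
    # Count the action against the budget before doing any real work.
    budget.spend()
    return "executed"
```

A signature check only establishes that whoever sent the instruction holds the key; it does nothing for the privacy or multi-agent quadrants, and a step cap is a blunt tool compared with real resource accounting. The point is only that both checks sit in front of the action, which is what matters once the system executes rather than describes.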

My takeaway: the danger is rarely the dramatic thing in the headline. It is the quiet gap between knowing a risk exists and assigning someone to do something about it. Very unglamorous. Very important.