Source: Autonomous Agent Security — Red Team Study Guide
When people talk about AI safety risks, the conversation tends to collapse into either “it’ll take our jobs” or “it’ll become Skynet.” Neither of those frames is particularly useful for thinking about the real, immediate risks from autonomous AI agents deployed today. A structured threat taxonomy is more useful.
The red team study guide I worked through this semester organises autonomous agent threats into four quadrants. The first is unauthorised delegation: agents that comply with instructions from people who aren’t their legitimate operators, either because an adversary posed as an authority or because the agent has no reliable way to verify identity. The second is data and privacy breaches: agents that disclose sensitive information in response to indirect, contextually misleading requests.
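To make the first quadrant concrete, here is a minimal sketch of one way an agent could check that an instruction really came from its operator. It is not taken from the study guide. It assumes the operator and the agent share a secret key in advance, and the names `sign` and `is_from_operator` are hypothetical. It covers only the impersonation case, not every route to unauthorised delegation.

```python
import hashlib
import hmac


def sign(instruction: str, key: bytes) -> str:
    """Operator side: produce an HMAC-SHA256 tag for an instruction."""
    return hmac.new(key, instruction.encode("utf-8"), hashlib.sha256).hexdigest()


def is_from_operator(instruction: str, signature: str, key: bytes) -> bool:
    """Agent side: accept the instruction only if the tag matches.

    Assumption: `key` is a secret shared with the legitimate operator only.
    """
    expected = sign(instruction, key)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)
```

An agent using a check like this would refuse any instruction whose tag does not verify, however authoritative the sender sounds.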
The third quadrant is system degradation: unbounded resource consumption, storage loops, denial-of-service. These are less dramatic but potentially more common — an agent in an infinite loop can take down a system just as effectively as a deliberate attack. The fourth, and most concerning for long-term deployment, is multi-agent amplification: when a compromised agent propagates that compromise to other agents in a network.
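The infinite-loop case in the third quadrant is the easiest one to illustrate. The sketch below is my own, not the study guide’s: it wraps an agent loop in a step and time budget so that a stuck task fails loudly instead of running forever. The names `run_with_budget`, `step`, `max_steps`, and `max_seconds` are hypothetical, and the default limits are arbitrary.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop uses up its step or time allowance."""


def run_with_budget(step, max_steps=100, max_seconds=5.0):
    """Call step() until it returns True, or stop once a limit is hit.

    Assumption: step() performs one unit of agent work and returns True
    when the task is finished.
    """
    deadline = time.monotonic() + max_seconds
    for count in range(1, max_steps + 1):
        # The time check runs between steps, so a single very slow step
        # can still overrun the deadline.
        if time.monotonic() > deadline:
            raise BudgetExceeded(f"time limit reached after {count - 1} steps")
        if step():
            return count  # number of steps the task needed
    raise BudgetExceeded(f"step limit of {max_steps} reached")
```

A budget like this does nothing about storage growth or deliberate denial-of-service; it only bounds the accidental loop.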
The key escalation the taxonomy highlights is that these aren’t chatbot problems. They’re Level 2 autonomy problems: they arise at the point where a system stops describing actions and starts executing irreversible system-level changes. That distinction needs to be at the centre of every enterprise AI deployment conversation.
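The describe-versus-execute distinction can be shown in a few lines. This is an illustrative sketch under my own assumptions, not a mechanism from the study guide: every action carries a flag saying whether it is reversible, and irreversible ones are only described until an operator approves them. `Action`, `handle`, and `approved_by_operator` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    description: str         # what the agent says it will do
    run: Callable[[], str]   # the call that actually changes the system
    reversible: bool         # assumption: someone has labelled this honestly


def handle(action: Action, approved_by_operator: bool = False) -> str:
    """Execute reversible actions; only describe irreversible ones
    unless the operator has approved them."""
    if not action.reversible and not approved_by_operator:
        return f"PROPOSED (needs approval): {action.description}"
    return action.run()
```

The weak point is the `reversible` label itself. If the agent sets it, the gate protects nothing, so in practice the label would have to come from somewhere the agent cannot modify.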