I Audited an AI Agent Against the OWASP Agentic Top 10 — Here's What Survived

The problem nobody wants to talk about

Every week another open-source agent drops on GitHub. Tool-calling, multi-model, MCP-connected, “production-ready.” The README shows a demo GIF. The architecture section shows a flowchart with boxes and arrows. The security section — when it exists at all — says “don’t run this as root.”

That’s not a security posture. That’s a disclaimer.

OWASP published the Top 10 for Agentic Applications in 2026. Ten attack surfaces specific to autonomous AI systems — goal hijacking, tool misuse, memory poisoning, rogue agent propagation. The standard exists. Almost nobody is auditing against it.

So I picked one agent — Hermes — and walked the codebase against all ten. Line by line. Not the README. The actual code.

What makes agentic security different

Traditional application security assumes a human is driving. Agentic security assumes the system is making decisions, calling tools, spawning subprocesses, and managing its own memory — with minimal human oversight between steps.

The OWASP Agentic Top 10 maps the blast radius:

ID	Threat	Translation
ASI01	Goal Hijack	Attacker rewrites the agent’s instructions mid-conversation
ASI02	Tool Misuse	Agent calls a destructive tool it shouldn’t have access to
ASI03	Identity & Privilege Abuse	Agent impersonates the user or escalates system privileges
ASI04	Supply Chain	External dependency injects malicious behavior
ASI05	Unexpected Code Execution	Agent runs injected shell commands on the host
ASI06	Memory Poisoning	Attacker corrupts the agent’s persistent context
ASI07	Insecure Inter-Agent Comms	Agent-to-agent traffic is interceptable
ASI08	Cascading Failures	Infinite loops, runaway API spend, denial of service
ASI09	Human-Agent Trust Exploitation	User tricked into approving dangerous operations
ASI10	Rogue Agents	Subagents operating outside their intended lifecycle

Most agents address maybe three of these. Hermes addresses all ten. Whether each mitigation is sufficient is a different question — but the attack surfaces are at least acknowledged in code, not marketing.

The audit: what held up

Goal hijacking is structurally blocked (ASI01)

Hermes uses a delegation model where subagents receive dynamically constructed system prompts scoped to a specific task. The parent’s core instructions are never forwarded. A subagent can’t access send_message or memory toolsets, so even a fully hijacked child agent can’t exfiltrate data or rewrite long-term context.

This is the right pattern. Most agents either pass the full system prompt downstream or give subagents unrestricted tool access. Both are exploitable.

Tool access follows least-privilege (ASI02)

The tool registry separates capabilities into discrete toolsets. Destructive operations require human-in-the-loop approval through both CLI and messaging gateways. No silent rm -rf. No automatic config mutations.

The supply chain scanner is real (ASI04)

This one surprised me. mcp_tool.py includes an active description scanner that parses incoming MCP server metadata for known prompt injection patterns — strings like "ignore previous instructions" and "<system>" tags. It catches poisoned MCP servers before they can contaminate the system prompt.

Most agents treat MCP servers as trusted by default. Hermes treats them as untrusted input. That’s the correct threat model.

Shell execution goes through a gauntlet (ASI05)

Before any command reaches the shell, approval.py runs it through regex heuristics matching known destructive patterns — rm -rf, chmod 777, reverse shells, encoded payloads. Commands that survive regex get a secondary evaluation from an auxiliary LLM that explains why a command is risky, reducing the chance of blind user approval (ASI09).

Two-layer approval — pattern matching plus semantic analysis — is stronger than either alone.

Memory is partitioned by design (ASI06)

Session state lives in SQLite, partitioned by session_id. Subagents boot with skip_memory=True, meaning they start clean. No cross-session bleed, no inherited context poisoning from previous conversations.

Inter-agent comms never touch the network (ASI07)

Subagent coordination runs through in-memory thread pools. Responses return as string schemas directly to the parent. There is no network surface to intercept.

Hard caps prevent runaway execution (ASI08, ASI10)

Subagents enforce max_iterations. MCP auto-sampling caps at max_tool_rounds (default: 5). Agent spawn depth is hardcoded at MAX_DEPTH = 2 — subagents cannot spawn sub-subagents. The system physically cannot produce a swarm.

Where it gets interesting

Identity and privilege (ASI03): acceptable, not solved

Subagents inherit the parent’s LLM API credentials but receive restricted toolsets. That’s reasonable containment. But TerminalTool processes run as the host user. If you’re running Hermes as your primary account, the agent has your permissions. Environment scoping (stripping dangerous keys for MCP servers) reduces the blast radius, but it doesn’t eliminate it.

The honest framing: Hermes is safe within the boundaries of your host-level permissions. The agent can’t escalate privileges — but it inherits whatever privileges you already have.

The smart-approve UX question (ASI09)

The _smart_approve mechanism explains risk context to users before they approve commands. This is better than a raw yes/no prompt. But user fatigue is real. After the twentieth approval in a session, the probability of a blind “yes” approaches 1.

The mitigation here is behavioral, not technical. No amount of engineering fixes a user who stops reading.

What most agents miss

Three patterns from this audit that should be table stakes for any production agent:

1. Treat MCP servers as untrusted input. Scan descriptions, validate schemas, log injection attempts. If your agent blindly trusts external tool providers, you have a supply chain vulnerability.

2. Hard-cap everything. Iterations, spawn depth, tool rounds. Infinite loops aren’t theoretical — they’re the default failure mode of an LLM with tool access and no exit condition.

3. Partition memory by session. Cross-session context bleed is a poisoning vector. If your agent reads from shared persistent memory without validation, a single compromised session can corrupt all future sessions.

The bottom line

Hermes is one of the few open-source agents where the security posture matches the capability surface. The code implements defense-in-depth across all ten OWASP categories — not perfectly, but honestly. The remaining gaps (host-level privilege inheritance, approval fatigue) are documented, not hidden.

That’s the actual bar: not “is this agent impenetrable?” — because nothing is — but “does the team understand their own attack surface?” Most don’t. This one does.