I've been talking to founders building AI agents across fintech, devtools, and
productivity – and almost none of them have any real security layer. Their agents
read emails, call APIs, execute code, and write to databases with essentially no
guardrails beyond "we trust the LLM."
So I built AgentArmor: an open-source framework that wraps any agentic
architecture with 8 independent security layers, each targeting a distinct attack
surface in the agent's data flow.
The 8 layers:
L1 – Ingestion: prompt injection + jailbreak detection (20+ patterns, DAN,
extraction attempts, Unicode steganography)
L2 – Storage: AES-256-GCM encryption at rest + BLAKE3 integrity for vector DBs
L3 – Context: instruction-data separation (like parameterized SQL, but for
LLM context), canary tokens, prompt hardening
L4 – Planning: action risk scoring (READ=1 → DELETE=7 → EXECUTE=8 → ADMIN=10),
chain depth limits, bulk operation detection
L5 – Execution: network egress control, per-action rate limiting, human
approval gates with conditional rules
L6 – Output: PII redaction via Microsoft Presidio + regex fallback
L7 – Inter-agent: HMAC-SHA256 mutual auth, trust scoring, delegation depth
limits, timestamp-bound replay prevention
L8 – Identity: agent-native identity, JIT permissions, short-lived credentials
I tested it against all 10 OWASP ASI (Agentic Security Integrity) risks from
the December 2025 spec. The red team suite is included in the repo.
Works as: (a) a Python library you wrap around tool calls, (b) a FastAPI proxy
server for framework-agnostic deployment, or (c) a CLI for scanning prompts in CI.
Integrations included for: LangChain, OpenAI Agents SDK, MCP servers.
I ran it live with a local Ollama agent (qwen2:7b) – you can watch it block a
`database.delete` at L8 (permission check), redact PII from file content at L6,
and kill a prompt injection at L1 before it ever reaches the model.
GitHub: https://github.com/Agastya910/agentarmor
PyPI: pip install agentarmor-core
Would love feedback, especially from people who have actually built production
agents and hit security issues I haven't thought of.
TAGS: security, python, llm, ai, agents
One thing I noticed digging through the code though — L4 risk scoring categorizes actions purely by verb. _categorize_action parses the action string for keywords like "read" or "delete" but never looks at params. So read.file targeting /etc/shadow gets a risk score of 1, while delete.file on /tmp/cache.json scores 7. In real agent workloads the target matters as much as the verb — feels like the policy engine could bridge this gap with param-aware rules, since the condition evaluator already supports params.* field resolution.
Also noticed TrustScorer takes a decay_rate in __init__ but never actually applies time-based decay anywhere — trust only changes on interactions. So an agent that was trusted six months ago and went dormant still walks back in with the same score. Small thing but could matter in long-running multi-agent setups.
The MCP rug-pull detection is the standout feature for me. Cross-referencing tool names against their descriptions to catch things like "safe_search" that actually calls exec — haven't seen that anywhere else. With how fast MCP is getting adopted this could get real traction.