// The loop

learn → point at a safe sandbox agent → attack it → measure attack-success-rate → harden with a guardrail → re-test → report → save as a reusable eval

// The 6-phase roadmap

01 LLM threat model & OWASP LLM Top 10
02 Prompt injection & jailbreak red team
03 Tool, MCP & agentic abuse
04 RAG & data-layer security
05 Guardrail engineering
06 Eval & red-team harness

The course for the engineer who can both break and bulletproof AI agents. You build a safe, local sandbox agent, attack it across the LLM threat surface — prompt injection, jailbreaks, tool/MCP abuse, RAG data leakage — measure how often the attacks land, then engineer guardrails and re-test until the numbers move.

The goal is hardening the things you build, not weaponizing attacks. Red-team only your own sandbox, systems you own, or explicitly authorized targets under written scope. Payloads stay in the lab; disclosure is responsible. The deliverable is an automated red-team eval harness you can re-run on every change.

AI / Agent Security — Red Team & Hardening

// The loop

// The 6-phase roadmap

More in Security