Triage 8 agent failure modes
Flip each card, then drop it into the lane that matches how your stack really handles it. No wrong answers — only confessions.
Hallucinates tool arguments
The agent calls a real tool with arguments it invented. The schema was right there. It chose violence.
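One way to catch this before the call goes out is to check the arguments against the tool's declared schema. A minimal sketch, assuming a hypothetical `send_email` tool and a simplified schema format (argument name mapped to expected type and a required flag); a real stack would more likely use JSON Schema or a similar validator.

```python
from typing import Any

# Hypothetical tool schema: argument name -> (expected type, required?)
SEND_EMAIL_SCHEMA: dict[str, tuple[type, bool]] = {
    "to": (str, True),
    "subject": (str, True),
    "body": (str, False),
}


def validate_args(schema: dict[str, tuple[type, bool]], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    # Reject arguments the schema never declared (the invented ones).
    for name in args:
        if name not in schema:
            problems.append(f"unknown argument: {name}")
    # Required arguments must be present; present arguments must have the right type.
    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                problems.append(f"missing argument: {name}")
        elif not isinstance(args[name], expected_type):
            problems.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return problems
```

Returning the problems as a list, rather than raising on the first one, lets the caller hand the full set back to the agent in a single correction turn.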
Leaks secrets into logs
An API key, a session token, a customer email — all neatly printed into a log line a contractor can read.
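A common mitigation is to scrub log messages before any handler sees them. The sketch below uses the standard library's `logging.Filter`; the two regular expressions are illustrative assumptions only, since real secret formats vary by provider and a rough email pattern will miss edge cases.

```python
import logging
import re

# Illustrative patterns only (assumptions, not a complete secret scanner).
KEY_VALUE_SECRET = re.compile(r"(?i)((?:api[_-]?key|token|password|secret)\s*[=:]\s*)\S+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Mask key=value style secrets and anything shaped like an email address."""
    text = KEY_VALUE_SECRET.sub(r"\1[REDACTED]", text)
    return EMAIL.sub("[REDACTED_EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message in place, then let it through."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-style args are caught too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

Attaching it with `handler.addFilter(RedactingFilter())` on every handler covers records that propagate up from child loggers, which a filter on a single logger would not.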
Ignores the system prompt mid-session
Twelve turns in, the agent forgets it is a support bot and starts writing poetry. Or refunds. Hard to tell.
Infinite retry loop on a 404
The endpoint is gone. The agent does not believe in object permanence and will retry it 4,000 times.
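The usual fix has two parts: retry only statuses that might be transient, and cap the attempts. A minimal sketch, assuming a hypothetical `request` callable that returns an HTTP status code; the retryable set and the backoff numbers are assumptions to tune per service.

```python
import time
from typing import Callable

# Assumption: only these statuses are worth retrying. A 404 is not among them.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def call_with_retries(
    request: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, int]:
    """Call `request` until it returns a non-retryable status or attempts run out.

    Returns (last_status, attempts_used).
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status = request()
        if status not in RETRYABLE_STATUSES:
            return status, attempt  # success, or a permanent error such as 404
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return status, max_attempts
```

With these defaults a 404 costs exactly one call, and even a persistently failing 503 costs three, not 4,000.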
Loops forever on null input
You handed it `null`. It handed you a 90-second spinner and a cloud bill shaped like a hockey stick.
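The cheapest guard is to refuse empty input before the agent loop starts at all. A small sketch; `require_input` and `EmptyInputError` are hypothetical names, and treating whitespace-only input as empty is an assumption.

```python
from typing import Optional


class EmptyInputError(ValueError):
    """Raised when the agent is asked to act on nothing."""


def require_input(user_input: Optional[str]) -> str:
    """Fail fast on None or whitespace-only input instead of starting the agent loop."""
    if user_input is None or not user_input.strip():
        raise EmptyInputError("no input provided; refusing to start the agent loop")
    return user_input.strip()
```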
Silently swallows API errors
The upstream call failed. The agent said "done!" with total confidence. Nobody knows. Yet.
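One pattern that makes this harder to do by accident is returning failures as data, so the final message has to look at them. A minimal sketch; `ToolResult`, `run_tool`, and `final_message` are hypothetical names, not part of any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolResult:
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None


def run_tool(tool: Callable[[], str]) -> ToolResult:
    """Run a tool and capture failure as data instead of discarding it."""
    try:
        return ToolResult(ok=True, value=tool())
    except Exception as exc:  # broad on purpose: nothing should vanish here
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def final_message(result: ToolResult) -> str:
    """Only claim success when the upstream call actually succeeded."""
    if result.ok:
        return f"Done: {result.value}"
    return f"Failed: {result.error}"
```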
Picks entirely the wrong tool
Asked to send an email, the agent reaches for `delete_database`. Adjacent in the list. Adjacent in vibe.
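A guard that helps here is scoping tools per task and requiring explicit confirmation for destructive ones. A sketch under stated assumptions: the `TOOLS` registry, its destructive flags, and the `authorize` helper are all hypothetical.

```python
# Hypothetical tool registry: name -> whether the tool is destructive.
TOOLS = {
    "send_email": False,
    "delete_database": True,
}


def authorize(tool_name: str, allowed_for_task: set[str], confirmed: bool = False) -> bool:
    """Allow a call only if the tool is known, in scope, and confirmed when destructive."""
    if tool_name not in TOOLS:
        return False  # unknown tool
    if tool_name not in allowed_for_task:
        return False  # real tool, wrong task
    if TOOLS[tool_name] and not confirmed:
        return False  # destructive tools need a human yes
    return True
```

For an email task the allowed set would be `{"send_email"}`, so `delete_database` is rejected no matter how adjacent it is in the list.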
Burns the whole token budget on turn one
It re-read the entire knowledge base to answer "hi". The other 19 turns will be running on fumes.
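One way to keep turn one from eating everything is a per-turn allowance derived from what is left in the session. A rough sketch; the `TokenBudget` class is hypothetical, and letting a turn spend up to twice its fair share is an arbitrary assumption.

```python
class TokenBudget:
    """Split a session budget so no single turn can spend all of it."""

    def __init__(self, total_tokens: int, expected_turns: int, burst_factor: float = 2.0):
        self.remaining = total_tokens
        self.turns_left = expected_turns
        self.burst_factor = burst_factor  # assumption: up to 2x the fair share per turn

    def allowance(self) -> int:
        """Tokens the next turn may spend."""
        if self.turns_left <= 0:
            return 0
        fair_share = self.remaining / self.turns_left
        return min(self.remaining, int(fair_share * self.burst_factor))

    def spend(self, tokens: int) -> None:
        """Record usage for one turn; refuse anything over the allowance."""
        if tokens > self.allowance():
            raise ValueError(f"turn wants {tokens} tokens, allowance is {self.allowance()}")
        self.remaining -= tokens
        self.turns_left -= 1
```

With 20,000 tokens across 20 expected turns, the first turn may spend at most 2,000, so answering "hi" cannot starve the other 19.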
A QA toy by desplega.ai · No agents were trusted in the making of this audit.