Triage 8 agent failure modes

Flip each card, then drop it into the lane that matches how your stack really handles it. No wrong answers — only confessions.

CAUGHT IN QA0

A test catches it before a user ever does.

SHIPPED0

It reached prod — but you have a guardrail for it.

YOLO IN PROD0

It ships untested and you pray to the uptime gods.

Cards placed: 0/8Honesty is the only scoring input.

Hallucinates tool arguments

The agent calls a real tool with arguments it invented. The schema was right there. It chose violence.

Leaks secrets into logs

An API key, a session token, a customer email — all neatly printed into a log line a contractor can read.

Ignores the system prompt mid-session

Twelve turns in, the agent forgets it is a support bot and starts writing poetry. Or refunds. Hard to tell.

Infinite retry loop on a 404

The endpoint is gone. The agent does not believe in object permanence and will retry it 4,000 times.

Loops forever on null input

You handed it `null`. It handed you a 90-second spinner and a cloud bill shaped like a hockey stick.

Silently swallows API errors

The upstream call failed. The agent said "done!" with total confidence. Nobody knows. Yet.

Picks entirely the wrong tool

Asked to send an email, the agent reaches for `delete_database`. Adjacent in the list. Adjacent in vibe.

Burns the whole token budget on turn one

It re-read the entire knowledge base to answer "hi". The other 19 turns will be running on fumes.

A QA toy by desplega.ai · No agents were trusted in the making of this audit.

Triage 8 agent failure modes

Flip each card, then drop it into the lane that matches how your stack really handles it. No wrong answers — only confessions.

CAUGHT IN QA0

A test catches it before a user ever does.

SHIPPED0

It reached prod — but you have a guardrail for it.

YOLO IN PROD0

It ships untested and you pray to the uptime gods.

Cards placed: 0/8Honesty is the only scoring input.

Hallucinates tool arguments

The agent calls a real tool with arguments it invented. The schema was right there. It chose violence.

Leaks secrets into logs

An API key, a session token, a customer email — all neatly printed into a log line a contractor can read.

Ignores the system prompt mid-session

Twelve turns in, the agent forgets it is a support bot and starts writing poetry. Or refunds. Hard to tell.

Infinite retry loop on a 404

The endpoint is gone. The agent does not believe in object permanence and will retry it 4,000 times.

Loops forever on null input

You handed it `null`. It handed you a 90-second spinner and a cloud bill shaped like a hockey stick.

Silently swallows API errors

The upstream call failed. The agent said "done!" with total confidence. Nobody knows. Yet.

Picks entirely the wrong tool

Asked to send an email, the agent reaches for `delete_database`. Adjacent in the list. Adjacent in vibe.

Burns the whole token budget on turn one

It re-read the entire knowledge base to answer "hi". The other 19 turns will be running on fumes.

A QA toy by desplega.ai · No agents were trusted in the making of this audit.

Agent QA Audit — is your AI agent actually tested?

Triage 8 agent failure modes

Hallucinates tool arguments

Leaks secrets into logs

Ignores the system prompt mid-session

Infinite retry loop on a 404

Loops forever on null input

Silently swallows API errors

Picks entirely the wrong tool

Burns the whole token budget on turn one

Triage 8 agent failure modes

Hallucinates tool arguments

Leaks secrets into logs

Ignores the system prompt mid-session

Infinite retry loop on a 404

Loops forever on null input

Silently swallows API errors

Picks entirely the wrong tool

Burns the whole token budget on turn one