AI Agent Failure Modes
A field guide to common AI agent failures, the controls that reduce them, and the evidence reviewers need before launch, rollout, or incident review.
A structured process for reviewing failed AI agent runs and turning their traces into control changes, regression fixtures, owned follow-up tests, and safer workflows.
Agent incidents should be reviewed from evidence, not from the final answer alone. The useful record includes the input, plan, retrieved sources, tool calls, approvals, errors, retries, and final output. Without that trace, the team can only guess whether the agent misunderstood the task, used the wrong source, exceeded its permissions, or hid uncertainty.
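One way to make "the useful record" concrete is to give the trace a fixed shape, so a reviewer can see at a glance which evidence was never captured. The sketch below assumes Python 3.9+; `ToolCall`, `AgentTrace`, and `gaps()` are hypothetical names, not part of any specific tracing library. It treats an empty field as unrecorded, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    """One tool invocation as it appears in the trace (hypothetical schema)."""
    name: str
    arguments: dict[str, Any]
    error: Optional[str] = None
    retries: int = 0
    approved_by: Optional[str] = None  # human approver, if the action required one


@dataclass
class AgentTrace:
    """The evidence a reviewer needs to reconstruct one agent run (hypothetical schema)."""
    user_input: str
    plan: list[str] = field(default_factory=list)
    retrieved_sources: list[str] = field(default_factory=list)
    tool_calls: list[ToolCall] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    final_output: str = ""

    def gaps(self) -> list[str]:
        """Name the evidence fields that were never recorded for this run."""
        recorded = {
            "plan": self.plan,
            "retrieved_sources": self.retrieved_sources,
            "tool_calls": self.tool_calls,
            "final_output": self.final_output,
        }
        # An empty errors list is a legitimate observation, so it is not treated as a gap.
        return [name for name, value in recorded.items() if not value]


trace = AgentTrace(
    user_input="Refund order 123",
    tool_calls=[ToolCall("issue_refund", {"order_id": "123"})],
    final_output="Refund issued.",
)
print(trace.gaps())  # ['plan', 'retrieved_sources']
```

Here the run has a tool call and an answer but no plan or sources, so a reviewer cannot tell why the refund was issued.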
The goal of an incident review is a control change. A better prompt may be part of the fix, but it is rarely enough. Good reviews turn the failure into a regression fixture, update a permission or validation rule, and assign an owner for follow-up. The process below pairs with the agent observability guide and the AI agent failure modes checklist.
Informal reviews usually stop at “the model got it wrong.” That is not actionable. The model may have followed bad instructions, used stale retrieval, accepted malicious source text, called the right tool with bad arguments, skipped a no-answer path, or performed an action it should not have been allowed to perform.
If the review does not identify the failed control, the same class of incident will return under a different prompt. Agent systems need post-incident learning that looks more like software reliability than prompt tuning.
Create one incident document per failed run or cluster of related runs. Record the timestamp, workflow, owner, trigger, user impact, affected data or system, current status, and whether any production action needs rollback. Link the trace, artifacts, user report, and any human edits made after the run.
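The incident document can be kept as a small structured record so that missing evidence links are visible before the review starts. This is a sketch under assumptions: `IncidentRecord`, `REQUIRED_LINKS`, and the link keys are hypothetical, chosen to mirror the fields listed above, and Python 3.9+ is assumed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Evidence the incident document should link to (hypothetical key names).
REQUIRED_LINKS = ("trace", "artifacts", "user_report", "human_edits")


@dataclass
class IncidentRecord:
    """One incident document per failed run or cluster of related runs (hypothetical schema)."""
    timestamp: datetime
    workflow: str
    owner: str
    trigger: str
    user_impact: str
    affected: list[str]               # data sets or systems the run touched
    status: str = "open"
    needs_rollback: bool = False      # does any production action have to be undone?
    links: dict[str, str] = field(default_factory=dict)

    def missing_links(self) -> list[str]:
        """Evidence links that are still missing from the incident document."""
        return [key for key in REQUIRED_LINKS if not self.links.get(key)]


incident = IncidentRecord(
    timestamp=datetime.now(timezone.utc),
    workflow="refund-assistant",
    owner="payments-oncall",
    trigger="customer report",
    user_impact="customer received a duplicate refund email",
    affected=["billing"],
    needs_rollback=True,
    links={"trace": "trace-123"},
)
print(incident.missing_links())  # ['artifacts', 'user_report', 'human_edits']
```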
Then classify severity. A harmless wrong draft is different from a customer-visible email, a privacy exposure, a destructive tool call, or an unsupported recommendation used by a decision-maker. Severity should be based on impact and recoverability, not on how surprising the output looked.
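A severity rubric can encode the rule that impact and recoverability drive the grade. The sketch below is illustrative only: the `Impact` ordering, the `SEV` labels, and the "one level worse if unrecoverable" rule are assumptions, not a standard. The point is that nothing about how surprising the output looked enters the function.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Ordered impact classes, lowest first (hypothetical ordering)."""
    INTERNAL_DRAFT = 0      # wrong draft that never left the team
    CUSTOMER_VISIBLE = 1    # e.g. an email a customer actually received
    DECISION_INPUT = 2      # unsupported recommendation used by a decision-maker
    DATA_OR_SYSTEM = 3      # privacy exposure or destructive tool call


LEVELS = ("SEV4", "SEV3", "SEV2", "SEV1")


def severity(impact: Impact, recoverable: bool) -> str:
    """Grade on impact and recoverability only; an unrecoverable outcome bumps the grade one level."""
    score = int(impact) + (0 if recoverable else 1)
    return LEVELS[min(score, len(LEVELS) - 1)]


print(severity(Impact.INTERNAL_DRAFT, recoverable=True))     # SEV4
print(severity(Impact.CUSTOMER_VISIBLE, recoverable=False))  # SEV2
print(severity(Impact.DATA_OR_SYSTEM, recoverable=False))    # SEV1
```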
Read the trace in sequence. What did the user ask? How did the agent normalize the task? Which sources were retrieved? Were they current, relevant, and allowed? Which tools were called? Were arguments validated? Did the agent retry? Did it encounter errors? Did a human approve the action? Did the final answer cite evidence?
Separate observed facts from guesses. If the logs do not show why the agent chose a tool, write “not observable” instead of inventing a reason. That gap is itself a finding.
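The sequential read and the "not observable" rule can be combined into one pass over the trace: each question is answered only from a logged field, and anything unlogged is recorded as a gap rather than filled in. This is a minimal sketch assuming the trace is a plain dictionary; the field names in `REVIEW_QUESTIONS` are hypothetical.

```python
from typing import Any

NOT_OBSERVABLE = "not observable"

# Review questions in reading order, each mapped to the (hypothetical) trace field that answers it.
REVIEW_QUESTIONS = {
    "What did the user ask?": "user_input",
    "How did the agent normalize the task?": "normalized_task",
    "Which sources were retrieved?": "retrieved_sources",
    "Were they current, relevant, and allowed?": "source_checks",
    "Which tools were called?": "tool_calls",
    "Were arguments validated?": "argument_validation",
    "Did the agent retry?": "retries",
    "Did it encounter errors?": "errors",
    "Did a human approve the action?": "approvals",
    "Did the final answer cite evidence?": "citations",
}


def review_trace(trace: dict[str, Any]) -> dict[str, Any]:
    """Answer each question from logged data only; never infer a missing answer."""
    return {q: trace[key] if key in trace else NOT_OBSERVABLE for q, key in REVIEW_QUESTIONS.items()}


trace = {"user_input": "Cancel my subscription", "tool_calls": ["cancel_subscription"], "retries": 0, "errors": []}
findings = review_trace(trace)
gaps = [q for q, answer in findings.items() if answer == NOT_OBSERVABLE]
print(len(gaps))  # 6
```

A logged `retries: 0` is an observed fact; a missing `approvals` key is a finding about observability.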
Most incidents map to one or more control failures. Retrieval control failed when the agent used irrelevant, stale, or incomplete sources. Permission control failed when the agent could perform an action that should have required approval. Validation failed when a malformed or broad tool call executed. No-answer control failed when the system guessed instead of escalating. Observability failed when reviewers could not reconstruct the run.
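The mapping from observed symptoms to failed controls can be written down so that every review uses the same vocabulary. In the sketch below, the `RunFindings` flags are hypothetical and are set by the reviewer from the trace; the function returns every control that failed, since one incident can hit several.

```python
from dataclasses import dataclass


@dataclass
class RunFindings:
    """Observed facts from one trace review (hypothetical flags set by the reviewer)."""
    sources_irrelevant_stale_or_incomplete: bool = False
    action_ran_without_required_approval: bool = False
    malformed_or_broad_call_executed: bool = False
    guessed_instead_of_escalating: bool = False
    run_not_reconstructable: bool = False


def failed_controls(f: RunFindings) -> list[str]:
    """Map findings to the control classes in this section; an incident can hit several."""
    mapping = [
        (f.sources_irrelevant_stale_or_incomplete, "retrieval"),
        (f.action_ran_without_required_approval, "permission"),
        (f.malformed_or_broad_call_executed, "validation"),
        (f.guessed_instead_of_escalating, "no-answer"),
        (f.run_not_reconstructable, "observability"),
    ]
    return [control for hit, control in mapping if hit]


print(failed_controls(RunFindings(action_ran_without_required_approval=True,
                                  run_not_reconstructable=True)))  # ['permission', 'observability']
```

An empty result means the incident has not been classified yet, which should block closing the review rather than default to "the model got it wrong."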
Use the agent permission design guide to decide whether the action class was too broad. Use the RAG no-answer testing approach when the incident involved missing evidence.
Every incident needs at least one concrete corrective action. Examples include adding a fixture, blocking a tool argument pattern, narrowing retrieval sources, requiring approval for a permission class, adding a no-answer rule, improving citations, adding trace fields, or changing the final response schema.
Assign an owner and due date. If the fix cannot be validated, it is not ready. The regression fixture should reproduce the original failure or a smaller version of it. Add the fixture to the same evaluation surface used for launch testing, such as the LLM evaluation framework.
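The "not ready until validated" rule can be enforced by tying each corrective action to a fixture that replays a smaller version of the failure. Everything named here is hypothetical: `CorrectiveAction`, the `validate_refund_args` rule, and the wildcard-refund scenario are illustrations, not a real API. In practice the fixture would live in the shared evaluation suite rather than next to the record.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass
class CorrectiveAction:
    """A control change with an owner, a due date, and a regression fixture (hypothetical schema)."""
    description: str
    owner: str
    due: date
    # Returns True when the original failure (or a smaller version of it) no longer reproduces.
    fixture: Optional[Callable[[], bool]] = None

    def is_ready(self) -> bool:
        """A fix without an owner or a passing fixture is not ready."""
        return bool(self.owner) and self.fixture is not None and self.fixture()


def validate_refund_args(args: dict) -> bool:
    """Hypothetical validation rule added by the fix: a refund must name one concrete order."""
    order_id = args.get("order_id")
    return isinstance(order_id, str) and order_id != "" and "*" not in order_id


def fixture_wildcard_refund() -> bool:
    """Smaller reproduction of the incident: the agent tried to refund every order."""
    return not validate_refund_args({"order_id": "*"})


action = CorrectiveAction(
    description="Reject wildcard order IDs in issue_refund",
    owner="payments-oncall",
    due=date.today() + timedelta(days=14),
    fixture=fixture_wildcard_refund,
)
print(action.is_ready())  # True
```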
An incident review is complete only when the team can answer six questions. What happened? Who or what was affected? Which control failed? What changed? How will the same class of failure be detected? Who will verify the fix?
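The completion test can be applied mechanically: a review stays open while any of the six questions is blank. The sketch also rejects "the model got it wrong" as an answer, following the earlier point that it is not actionable; `NON_ANSWERS` and `unanswered()` are hypothetical names.

```python
SIX_QUESTIONS = (
    "What happened?",
    "Who or what was affected?",
    "Which control failed?",
    "What changed?",
    "How will the same class of failure be detected?",
    "Who will verify the fix?",
)

# Answers that look filled in but do not identify anything actionable (hypothetical list).
NON_ANSWERS = {"the model got it wrong"}


def unanswered(answers: dict[str, str]) -> list[str]:
    """Questions still lacking a real answer; the review is complete only when this is empty."""
    missing = []
    for question in SIX_QUESTIONS:
        answer = answers.get(question, "").strip()
        if not answer or answer.lower().rstrip(".") in NON_ANSWERS:
            missing.append(question)
    return missing


answers = {q: "filled in" for q in SIX_QUESTIONS}
answers["Which control failed?"] = "The model got it wrong."
print(unanswered(answers))  # ['Which control failed?']
```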
After the fix, rerun the incident fixture and a few neighboring fixtures. Neighboring tests matter because a narrow prompt patch can fix one example while breaking related behavior.
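A small harness can make the neighbor check part of verification, so a fix only passes when the incident fixture and its neighbors all pass. The `answer` function below is a hypothetical narrow patch that refuses only the exact prompt from the incident; the paraphrased neighbor shows why rerunning the single original example is not enough.

```python
from typing import Callable

Fixture = Callable[[], bool]  # returns True when the expected behavior holds


def verify_fix(incident_fixture: Fixture, neighbors: dict[str, Fixture]) -> dict[str, bool]:
    """Rerun the incident fixture and its neighbors; the fix passes only if every result is True."""
    results = {"incident": incident_fixture()}
    results.update({name: fixture() for name, fixture in neighbors.items()})
    return results


def answer(question: str) -> str:
    # Hypothetical narrow patch: refuses the exact prompt from the incident, nothing else.
    if question == "What is our refund policy for order 123?":
        return "I don't have enough evidence to answer."
    return "Refunds are always approved."  # still guesses on paraphrases


results = verify_fix(
    lambda: answer("What is our refund policy for order 123?").startswith("I don't"),
    {"paraphrase": lambda: answer("Refund policy for order 123?").startswith("I don't")},
)
print(results)  # {'incident': True, 'paraphrase': False}
```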
An AI agent incident review should produce a failure class, root cause, changed control, regression fixture, owner, and follow-up review date.
Blaming the model is not enough. The review should identify the system control that failed, such as retrieval, permissions, validation, logging, approval, or escalation.
Use the incident template when an agent run fails in review or production. The review is successful when it improves the system’s controls and gives the team a repeatable test, not when it produces a longer explanation of the mistake.