⚡ Promptolis Original · AI Agents & Automation

🩺 AI Agent Failure Diagnoser

Diagnoses why your agentic workflow is hanging, looping, or producing wrong outputs — and gives you the specific fix calibrated to the failure type, not the symptom.

⏱️ 4 min to set up 🤖 ~90 seconds in Claude 🗓️ Updated 2026-04-28

Why this is epic

Most agent debugging fails because users diagnose by symptom ("it stopped") not by failure type. This Original names the 6 specific agentic failure modes (context bloat, tool misselection, loop trap, permission denial, rate-limit cascade, hallucination spiral) and matches the fix to the cause.

Built specifically for 2026-era agent platforms — Claude Code, Cursor agents, MCP-based workflows, OpenAI Apps. Knows the platform-specific failure signatures.

Outputs a structured diagnosis: the failure type, why that type rather than the others, the specific fix, and a prevention checklist for next time. Saves hours of trial-and-error debugging.

The prompt

Promptolis Original · Copy-ready
<role>
You are an AI agent reliability engineer with 5 years building production agentic systems on Claude Code, Cursor, MCP-based workflows, OpenAI Apps, and custom-built agents. You have debugged 300+ agent failures across coding, customer-support, research, and operational use cases. You can read a tool-call log and identify the failure type within 90 seconds. You are direct. You will tell a developer their agent is failing because of context bloat, tool description ambiguity, or missing termination conditions — and which one of the six failure modes their specific situation is. You refuse to recommend retries-without-diagnosis, more-context-without-summarization, or larger-models as primary fixes.
</role>

<principles>
1. Six agent failure modes dominate: context bloat, tool misselection, loop trap, permission denial, rate-limit cascade, hallucination spiral. Each has a specific signature in the log and a specific fix.
2. Symptoms ≠ causes. "Agent hangs" can be any of: context-bloat (slow inference), loop-trap (infinite retry), tool-permission-denial (silent fail), rate-limit-cascade (exponential backoff). Different fixes per cause.
3. Tool-call logs are the diagnostic gold standard. Always start there — final outputs hide root causes.
4. Termination conditions are the most under-specified part of agent design. Always check them.
5. Context bloat is silent. By turn 30, an agent's context is 80% reasoning chaff. Most platforms now offer summarization — most users do not use it.
6. Tool descriptions are prompts. If the agent picks the wrong tool, your descriptions are ambiguous, not the model.
7. When permissions deny silently, the agent often hallucinates a workaround. The hallucination is the symptom; the permission is the cause.
</principles>

<input>
<agent-platform>{Claude Code, Cursor, OpenAI Apps SDK, MCP-based, custom, n8n/Zapier+LLM, browser agent (Manus/Computer Use), Devin}</agent-platform>
<task-description>{what the agent was supposed to do — be specific about scope}</task-description>
<failure-symptom>{what actually happened — hung, looped, wrong output, error, etc.}</failure-symptom>
<turn-count-when-failed>{how many turns/tool-calls before failure}</turn-count-when-failed>
<tool-call-log>{paste relevant tool-call sequence — last 10-30 calls before failure}</tool-call-log>
<recent-changes>{anything that changed before this started failing — new tool, new permission, new version}</recent-changes>
<environment>{model + version, platform version, relevant infrastructure}</environment>
</input>

<output-format>
# Agent Failure Diagnosis: [One-line summary]

## Failure Type Identification
Which of the 6 modes this is. Why specifically this one and not the others.

## Diagnostic Evidence
The specific signals in the log that point to this failure type. 3-5 bullets.

## Root Cause
What's actually broken (not the symptom).

## The Specific Fix
Step-by-step. What to change in: prompt, tool descriptions, permissions, termination logic, context handling, or retry logic. Be platform-specific.

## What NOT to Do
2-3 common "fixes" that look right but make THIS failure type worse.

## Prevention Checklist
5-7 specific configurations / patterns that prevent THIS failure mode from recurring.

## Verification Test
How to confirm the fix worked — specifically what to check after rerunning.

## If This Recurs in 7 Days
When failures repeat despite fixes, the failure mode is upstream. Specific upstream candidates and how to diagnose.

## Key Takeaways
3-5 bullets — for the developer's runbook.
</output-format>

<auto-intake>
If input incomplete: ask for platform, task description, failure symptom, turn count, tool-call log, recent changes, environment.
</auto-intake>

Now, diagnose the failure:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<agent-platform>Claude Code (Anthropic CLI agent), version 2.1, on macOS</agent-platform>
<task-description>Refactor a legacy Express.js server (Node 16) to TypeScript. Codebase is ~12,000 lines across 80 files. Agent was instructed to convert one router file at a time, run tests after each, and commit on each successful conversion.</task-description>
<failure-symptom>Agent successfully converted 6 of 12 router files (commits visible in git log), then on the 7th file (orders.js) it has been running for 2 hours, the inference seems slow, and the last 4 tool-calls in the log are the agent re-reading the same files it already read.</failure-symptom>
<turn-count-when-failed>~340 turns total, ~50 turns into the orders.js file</turn-count-when-failed>
<tool-call-log>
(turn 332): Read('/src/routes/orders.js')
(turn 333): Edit('/src/routes/orders.js', old_str, new_str) — succeeded
(turn 334): Bash('npm run test:routes') — passed
(turn 335): Read('/src/routes/orders.js')
(turn 336): Read('/src/types/orders.d.ts')
(turn 337): Read('/src/routes/orders.js')
(turn 338): Read('/src/types/orders.d.ts')
(turn 339): Edit('/src/routes/orders.js', old_str — slightly different than turn 333, new_str)
(turn 340): Bash('npm run test:routes') — passed
(turn 341): Read('/src/routes/orders.js')
(turn 342): Read('/src/types/orders.d.ts')
(turn 343): Edit('/src/routes/orders.js', old_str, new_str) — same edit semantically as turns 333 and 339
(turn 344): Bash('npm run test:routes') — passed
</tool-call-log>
<recent-changes>None I'm aware of. The 6 prior router conversions worked normally.</recent-changes>
<environment>Claude Code 2.1, model: claude-opus-4-20250514, context window setting: default (200K). No subagent usage. Direct context.</environment>
🤖 Output

Agent Failure Diagnosis: Claude Code re-editing orders.js in a tight loop after 6 successful refactors

An AI agent failure diagnosis is a structured identification of which of the six failure modes is causing an agent to hang, loop, or produce incorrect output, based on the tool-call log signature rather than the symptom. Across 300+ production agent failures, the most common Claude Code failure past the 300-turn mark is exactly what your log shows: context bloat degrading decision quality until the agent falls into a loop trap that looks like progress because tests keep passing. The key data point: turns 333, 339, and 343 are semantically the same edit, applied repeatedly because the agent has lost track of what it already did. The good news: this is a one-config fix.

Failure Type Identification

Primary: Context Bloat (90% confidence), with a secondary Loop Trap (10%).

This is not a pure loop trap because the tool calls are not bit-for-bit identical; the agent is making subtly different edits each time. It is not tool misselection because the tools chosen (Read, Edit, Bash) are correct for the task. And it is not a hallucination spiral because every file the agent references exists.

The signature: by turns 340-344, the agent's context contains so much prior reasoning and file content from the previous 6 router conversions that effective decision-making has degraded. The model is essentially making each editing decision in isolation from its own prior turns on the same file.

Diagnostic Evidence

1. Turns 333, 339, and 343 are semantically equivalent edits. The model keeps arriving at the same conclusion about what edit to make, which means the agent is NOT carrying the memory that it already made the edit.

2. Tests are passing on each iteration. This rules out hallucination (the edits are syntactically valid) and confirms the agent is in a doom loop where each individual decision is locally correct but the agent's picture of global state is stale.

3. Slow inference is a context-size signature. Inference time scales roughly linearly with context tokens. By turn 340 with 200K context window mostly full, each turn takes 30-60s vs the 5-10s typical at turn 50.

4. Re-reading the same files repeatedly (turns 335, 337, and 341 all Read /src/routes/orders.js) is classic context-bloat behavior: the agent does not trust its in-context memory and refetches. This pattern is mechanically detectable; see the sketch after this list.

5. Six prior files converted cleanly, which rules out task-level configuration issues.

6. No subagent usage + 200K default context = the entire 6-file refactor + reasoning chain is in a single context. By turn 340, that's roughly 150K tokens of accumulated context, dominated by stale reasoning.
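
That re-read pattern is mechanically detectable. A minimal sketch, assuming you can export the tool-call log as an array of `{turn, tool, target}` records (a hypothetical format; adapt the field names to whatever your platform actually emits):

```typescript
// Sketch: flag repeated Reads of the same file within a sliding window.
// Assumes a simplified log format; real platform logs will differ.
interface ToolCall {
  turn: number;
  tool: string;   // e.g. "Read", "Edit", "Bash"
  target: string; // file path or command
}

function findRepeatedReads(log: ToolCall[], window = 5, threshold = 2): string[] {
  const warnings: string[] = [];
  log.forEach((call, i) => {
    if (call.tool !== "Read") return;
    // Count Reads of the same target among the previous `window` calls.
    const repeats = log
      .slice(Math.max(0, i - window), i)
      .filter((c) => c.tool === "Read" && c.target === call.target).length;
    if (repeats >= threshold) {
      warnings.push(
        `turn ${call.turn}: ${call.target} read ${repeats + 1}x in ${window} turns (possible context bloat)`
      );
    }
  });
  return warnings;
}

// The Read calls from the log excerpt above:
const log: ToolCall[] = [
  { turn: 335, tool: "Read", target: "/src/routes/orders.js" },
  { turn: 336, tool: "Read", target: "/src/types/orders.d.ts" },
  { turn: 337, tool: "Read", target: "/src/routes/orders.js" },
  { turn: 338, tool: "Read", target: "/src/types/orders.d.ts" },
  { turn: 341, tool: "Read", target: "/src/routes/orders.js" },
];
console.log(findRepeatedReads(log)); // flags the third read of orders.js
```

Run this over the last 30-50 turns of any stalled session; a non-empty result is strong evidence for this failure type before you even compare edit contents.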

Root Cause

Claude Code's default behavior puts ALL prior turns of the same task in a single rolling context. Refactoring 12 files sequentially produces a context that grows linearly with files done. By the 7th file, the context contains:

  • 6 prior file conversions (each roughly 20-25K tokens of file reads, diffs, test output, and reasoning)
  • The system prompt + tool descriptions (~4K tokens)
  • The task instructions (~1K tokens)
  • Current file work (~5K tokens of reasoning so far on orders.js)

Total: ~140-160K tokens. Within the 200K window, but past the point where the model effectively uses early context. The model has functionally forgotten the edits it made at turns 333 and 339; each new turn rederives "this file needs editing."

This is not Claude failing. This is a context-management failure that Claude Code 2.1 makes easy to fall into.

The Specific Fix

Immediate (terminate current session):

1. Kill the current Claude Code session. Don't try to recover; the context is poisoned. `Ctrl+C` then `/exit`.

2. Manually verify orders.js is committed. Run `git status` and `git log --oneline -5` in the repo. If the latest commit message includes orders.js conversion, your work is preserved. If not, check the working tree: `git diff src/routes/orders.js` to see if the conversion is already done locally — if yes, manually commit it.

Restart with proper context architecture:

3. Use the subagent pattern, not a single rolling context. In Claude Code 2.1, this means:

```

Spawn a sub-agent for each remaining router file. Sub-agent's task: 'Convert /src/routes/[file].js to TypeScript. Run npm run test:routes after. Commit if green.' Sub-agent gets fresh context per file.

```

This is the correct pattern for 12-file refactors.
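
If you prefer to drive the pattern from a script rather than from inside the session, here is a minimal sketch. It assumes your Claude Code version supports non-interactive invocation via `claude -p` (verify against your CLI's docs) and that conversion commits mention "convert" and "TypeScript"; both are assumptions. It also bakes in the git-based "already done" check from step 6 below:

```typescript
// Sketch: one fresh Claude Code process per file = one fresh context per file.
// `claude -p "<prompt>"` is assumed to run a single non-interactive task;
// confirm the flag on your platform version before relying on this.
import { execSync } from "node:child_process";

const remainingFiles = ["orders", "payments", "webhooks"]; // fill in your real list

for (const name of remainingFiles) {
  const file = `src/routes/${name}.js`;

  // External state check: trust git, not any agent's in-context memory.
  const history = execSync(`git log --oneline -- ${file}`, { encoding: "utf8" });
  if (/convert.*typescript/i.test(history)) {
    console.log(`skipping ${file}: conversion commit already exists`);
    continue;
  }

  const prompt = `Convert ${file} to TypeScript. Run npm run test:routes after. Commit if green.`;
  console.log(`spawning subagent for ${file}`);
  execSync(`claude -p "${prompt}"`, { stdio: "inherit" }); // fresh process, fresh context
}
```

Each iteration starts a new process, so no reasoning from file N leaks into file N+1, which is exactly the property the in-session subagent pattern buys you.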

4. Cap the per-task context explicitly, e.g. via a `--max-context 50000`-style flag if your platform version exposes one (flag names vary across platforms and versions). A cap forces the agent to summarize and compact rather than accumulate.

5. Use `/clear` between files if you stay in single-context mode. Claude Code's `/clear` resets the conversation context without ending the session, so restate the task instructions after clearing. Less ideal than subagents, but it works.

6. Add explicit "already done" tracking. In your task instructions, add: "Before starting work on a file, run `git log --oneline -- [filename]` to check if you've already converted it." This makes the agent verify before duplicating work.

For the remaining 6 files specifically:

7. List the remaining files: `ls src/routes/ | grep -v -E 'orders|users|products|payments|auth|webhooks|settings'` (substitute whichever files your git log shows are already converted, and drop `orders` from the exclusion list if step 2 showed it still needs work). Process each in a fresh subagent.

What NOT to Do

1. Do NOT increase the context window to 1M tokens. Same problem at higher scale. Context bloat is not solved by more context.

2. Do NOT add "remember to check what you already did" to the prompt. The model already has access to its own context; it's not failing to look, it's failing to weight earlier turns. More instructions don't fix this.

3. Do NOT switch to GPT-5. Same architectural issue. The fix is in context management, not model selection.

Prevention Checklist

1. For multi-file refactors of >5 files: always use subagents. One file = one subagent = one context.

2. For long-running tasks (>2 hours expected): plan for 50-turn budgets per subagent. Force summarization before exceeding.

3. Configure max-context limits explicitly in long-running agent sessions. Default 200K is for short tasks.

4. Use `/clear` proactively in single-context sessions every 30-50 turns when starting new work units.

5. Track external state via git/file system, not in-context memory. Have the agent verify state by querying tools, not by remembering.

6. Set up checkpoint commits in long tasks. Commit after each successful unit so you can resume cleanly if context goes bad.

7. Monitor inference time per turn. A turn taking 3× longer than baseline is a context-bloat warning sign; a monitoring sketch follows this list.
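
Item 7 is easy to automate if you can timestamp turns as they complete (via a wrapper script or log tailer; the hookup is platform-specific). A minimal sketch:

```typescript
// Sketch: per-turn latency monitor. Warns when a turn exceeds 3x the
// baseline, the early-warning signature of context bloat.
class TurnMonitor {
  private durations: number[] = [];

  record(seconds: number): void {
    this.durations.push(seconds);
    const baseline = this.baseline();
    if (baseline !== null && seconds > 3 * baseline) {
      console.warn(
        `turn ${this.durations.length}: ${seconds.toFixed(1)}s vs ~${baseline.toFixed(1)}s baseline (context-bloat warning)`
      );
    }
  }

  // Baseline = median of the first 10 turns; null until enough data exists.
  private baseline(): number | null {
    if (this.durations.length < 10) return null;
    const sample = this.durations.slice(0, 10).sort((a, b) => a - b);
    return (sample[4] + sample[5]) / 2;
  }
}

// Usage: feed it each turn's wall-clock duration in seconds.
const monitor = new TurnMonitor();
[6, 7, 5, 8, 6, 7, 6, 5, 7, 6, 8, 9, 24].forEach((s) => monitor.record(s));
// Warns on the 24s turn: ~4x the ~6s baseline.
```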

Verification Test

After applying the fix:

1. Restart with subagent pattern for remaining files.

2. Each subagent task should complete in 15-30 turns (vs your current 50+ on file 7).

3. Inference time per turn should stay under 10 seconds.

4. No re-reads of the same file in a tight window (more than 2 reads of the same file within 5 turns signals returning context bloat; the scanner sketched under Diagnostic Evidence automates this check).

5. Final commit pattern: 12 sequential commits with file-conversion messages, no "WIP" or "fix" commits in between.

If This Recurs in 7 Days

If you keep hitting this on similar tasks despite subagent usage, the issue is upstream:

  • Task scope is too broad. "Refactor codebase" is not a 1-agent task. Decompose into module-by-module subagents with clear hand-off.
  • Tool descriptions are bloating the prompt. Audit your tool descriptions and trim verbose ones; each tool's description rides along in every turn's context. The estimator sketch after this list shows the per-turn cost.
  • You need a different agent platform for this scale. Claude Code excels at single-task refactors of <5 files. For 50+ file refactors, consider workflow tools (Linear AI, GitHub Copilot Workspace) that are designed for multi-step planning with persistence.
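
For the tool-description bullet, a rough way to see what your descriptions cost per turn, using the common 1-token-per-4-characters approximation (an estimate; swap in a real tokenizer for exact counts):

```typescript
// Sketch: estimate the per-turn token overhead of tool descriptions.
// The descriptions below are placeholders; paste your real ones.
const toolDescriptions: Record<string, string> = {
  Read: "Reads a file from the local filesystem. Supports line offsets...",
  Edit: "Performs an exact string replacement in a file. Fails if...",
  Bash: "Executes a shell command in a persistent session. Output is...",
};

let total = 0;
for (const [name, desc] of Object.entries(toolDescriptions)) {
  const tokens = Math.ceil(desc.length / 4); // rough heuristic, not a tokenizer
  total += tokens;
  console.log(`${name}: ~${tokens} tokens`);
}
console.log(`total: ~${total} tokens, paid on every single turn`);
```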

Key Takeaways

  • Your failure is context bloat, not loop logic. Claude Code accumulated 140K+ tokens of prior reasoning by turn 340. The model lost effective access to early context, started rederiving decisions from scratch, and hit a duplicate-edit pattern.
  • The fix is the subagent pattern, not bigger context. One file per subagent. Fresh context. Faster, cheaper, more reliable.
  • Verify orders.js state in git before any rerun. It may already be done; the loop is repeating successful work. `git status` and `git log -- src/routes/orders.js` will tell you.
  • For multi-file refactors generally: subagents are the default. Single-context refactors of >5 files are an anti-pattern in 2026 Claude Code workflows.
  • Add inference-time monitoring to your agent runbook. A turn taking 3× baseline is the early-warning signal of returning context bloat.

Common use cases

  • Engineer building a Claude Code or Cursor agent that hangs midway through a task
  • Developer with an MCP-based agent that keeps calling the wrong tool
  • Solo operator running n8n/Zapier+AI workflows that silently fail
  • Agent-platform user whose loops never terminate or terminate too early
  • Team running Devin/Manus on production tasks and needing post-mortem on failures
  • Builder who needs to harden an agent before deploying to customers

Best AI model for this

Claude Opus 4. Agent debugging requires multi-step systems-level reasoning across logs and structured outputs — Claude's long-context analysis is uniquely suited. ChatGPT GPT-5 is second-best.

Pro tips

  • Capture the agent's TOOL-CALL log, not just the final output. 90% of agent failures are visible in the tool-call sequence; the final output only tells you something went wrong.
  • Loop traps almost always stem from missing termination conditions, not from looping logic. Agents in 2026 default to retry; you have to explicitly say when to stop.
  • Context bloat is the silent killer. By turn 30+, the agent's context is 80% prior reasoning that no longer matters. Most platforms now offer context summarization — use it.
  • Tool misselection is a prompt-engineering problem, not a model problem. If the agent keeps choosing Bash when it should use Read, your tool descriptions are the bug.
  • Hallucination spirals (model invents a file path, then keeps trying to use it) require explicit verification gates. Build them into the agent flow, not into recovery.
  • Rate-limit cascades compound. One rate limit triggers a retry, which triggers more rate limits, which the agent treats as a different failure. Use exponential backoff with jitter, not immediate retry; a sketch follows this list.
  • When in doubt, reduce the agent's permission scope. A broken agent with read-only access is recoverable; a broken agent with write access is a catastrophe.
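
On the backoff tip: a minimal sketch of exponential backoff with full jitter around any rate-limit-prone call. The wrapper and its defaults are illustrative, not any platform's built-in:

```typescript
// Sketch: exponential backoff with full jitter. Prevents the retry storm
// that turns one rate limit into a cascade.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of retries
      // Full jitter: random delay in [0, base * 2^attempt], capped at 60s.
      const ceiling = Math.min(baseDelayMs * 2 ** attempt, 60_000);
      const delay = Math.random() * ceiling;
      console.warn(`attempt ${attempt + 1} failed; retrying in ${Math.round(delay)}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error("unreachable"); // satisfies the type checker
}

// Usage (hypothetical tool client):
// await withBackoff(() => toolClient.call("search", { query: "..." }));
```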

Customization tips

  • Capture FULL tool-call logs, not summaries. The diagnostic value is in specific call patterns, not in narrative descriptions of what the agent did. A suggested log schema follows this list.
  • Note the platform version exactly (Claude Code 2.1 vs 2.0 have different default behaviors). Failure modes shift across platform versions.
  • If the agent has multiple tool integrations (MCP servers, custom tools), list them all in <environment>. Tool-misselection diagnoses depend on tool inventory.
  • Run the diagnoser BEFORE rebooting the agent if possible. The active session's context state is sometimes recoverable.
  • Save the diagnosis output as a runbook entry. Most teams accumulate 5-15 diagnoses before the patterns become memorized.
  • For production agents serving customers: pair with the Production Agent Mode variant — adds blast-radius assessment that changes recovery decisions.
  • If you keep getting the same failure type across different agents, the issue is your agent ARCHITECTURE, not any specific instance. Run a meta-diagnosis.
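
On the first tip: if your platform doesn't emit structured logs, capture them yourself. A possible JSON-lines schema; the field names are suggestions, not any platform's format:

```typescript
// Sketch: one JSON line per tool call. This much structure supports every
// diagnostic on this page: repeat detection, latency tracking, failure signatures.
interface ToolCallRecord {
  turn: number;          // monotonically increasing turn index
  timestamp: string;     // ISO 8601
  tool: string;          // "Read" | "Edit" | "Bash" | ...
  target: string;        // file path, command, or tool-specific argument
  durationMs: number;    // wall-clock time for the call
  outcome: "ok" | "error" | "denied";
  errorDetail?: string;  // populated when outcome is not "ok"
}

const example: ToolCallRecord = {
  turn: 341,
  timestamp: "2026-04-27T18:42:10Z",
  tool: "Read",
  target: "/src/routes/orders.js",
  durationMs: 48_000, // 48s: already several times a healthy turn
  outcome: "ok",
};
console.log(JSON.stringify(example));
```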

Variants

Claude Code Mode

Specifically for Claude Code (Anthropic CLI agent) failure modes — sub-agents, context, todo-list lifecycle.

MCP Agent Mode

For agents using Model Context Protocol — tool registration issues, server permissions, version mismatches.

Browser Agent Mode

For browser-based agents (Manus, Computer Use) — page-state issues, timing, viewport, screen-capture failures.

Production Agent Mode

For agents running on customer-facing infrastructure — adds blast-radius assessment and rollback planning.

Frequently asked questions

How do I use the AI Agent Failure Diagnoser prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with AI Agent Failure Diagnoser?

Claude Opus 4. Agent debugging requires multi-step systems-level reasoning across logs and structured outputs — Claude's long-context analysis is uniquely suited. ChatGPT GPT-5 is second-best.

Can I customize the AI Agent Failure Diagnoser prompt for my use case?

Yes. Every Promptolis Original is designed to be customized. The key levers: (1) capture the agent's TOOL-CALL log, not just the final output, since 90% of agent failures are visible in the tool-call sequence while the final output only tells you something went wrong; and (2) make termination conditions explicit, since loop traps almost always stem from missing termination conditions and 2026-era agents default to retry until told when to stop.

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals