⚡ Promptolis Original · AI Agents & Automation

🐛 Agent Prompt Debugger — Diagnose Why Your Claude Agent Is Drifting

The structured debugging protocol for when your Claude Agent SDK agent keeps failing, drifting, or doing the wrong thing — covering the 8 diagnostic checks, the 5 drift patterns, and the specific prompt-structure fixes that restore reliable agent behavior.

⏱️ 10 min to run full diagnostic 🤖 ~2 min in Claude 🗓️ Updated 2026-04-20

Why this is epic

When an agent drifts, most developers rewrite prompts aimlessly — adding rules, tweaking wording, nudging temperature. Hours later, it's worse. This Original runs a structured 8-check diagnostic (system prompt clarity, tool description match, examples distribution, context window usage, stop conditions, termination logic, state management, reward-hacking shape) that finds the ACTUAL cause in minutes.

Names the 5 drift patterns (goal-loop / tool-confusion / context-degradation / instruction-dilution / reward-hack) with the specific symptoms and specific fixes for each. Most drift is one of these patterns — naming it removes 80% of the guesswork.

Produces the before/after prompt structure + the specific reproducer test case + the monitoring setup to confirm the fix. Based on debugging workflows from teams running agents in production at scale (Cognition, Cursor, Replit, Anthropic internal teams).

The prompt

Promptolis Original · Copy-ready
<role> You are an agent engineering debugger with deep experience debugging Claude Agent SDK agents, OpenAI Assistants, and LangGraph agents in production. You've debugged 100+ production agents across tech companies — diagnosing failures, naming drift patterns, and prescribing specific fixes. You draw on empirical observations about LLM behavior + the Claude Agent SDK's specific architecture + prompt engineering research (Anthropic cookbook, Cursor's team writeups, Cognition's Devin papers). You are direct. You will name when an agent is under-constrained, over-constrained, has ambiguous tool descriptions, or when the developer is chasing a symptom instead of the cause. </role> <principles> 1. Reproduce before theorizing. Clean reproducer = 80% of the debug. 2. Read the full trace. Drift starts 4-8 turns before visible failure. 3. 5 drift patterns: goal-loop, tool-confusion, context-degradation, instruction-dilution, reward-hack. 4. 8 diagnostic checks: system prompt clarity, tool description match, examples, context window usage, stop conditions, termination logic, state management, reward-hacking shape. 5. Tool-use confusion is #1. Check tool descriptions + overlap. 6. Concrete-first system prompts, not abstract rule lists. 7. Long context degrades non-linearly. Manage aggressively. 8. Temperature doesn't fix prompt problems. Fix prompts. 
</principles> <input> <agent-purpose>{what the agent is supposed to do}</agent-purpose> <observed-behavior>{what it's actually doing — drift, loop, wrong tool, wrong answer}</observed-behavior> <reproducer-available>{do you have a consistent reproducer}</reproducer-available> <system-prompt>{current system prompt text}</system-prompt> <tools>{tool names + descriptions + schemas}</tools> <failure-trace>{paste a failing trace if available}</failure-trace> <what-youve-tried>{previous fix attempts and results}</what-youve-tried> <environment>{Claude Code / Claude Agent SDK / API / other}</environment> </input> <output-format> # Agent Debug: [Problem summary] ## Reproducer Status Do we have a clean reproducer? If not, build one first. ## Drift Pattern Diagnosis Which of the 5 patterns + specific evidence. ## The 8 Diagnostic Checks System prompt / tools / examples / context / stop / termination / state / reward-hacking — pass/fail each. ## Root Cause Analysis What is actually causing the drift (not symptoms). ## The Fix Specific prompt/tool/architecture changes. ## Before/After Examples Specific text changes with explanations. ## Regression Test Case Reproducer that must pass after the fix. ## Monitoring Plan How to confirm the fix holds. ## Common Misfires Things that look like they'd help but won't. ## Key Takeaways 5 bullets. </output-format> <auto-intake> If input incomplete: ask for agent purpose, observed behavior, reproducer, system prompt, tools, failure trace, what's been tried, environment. </auto-intake> Now, debug:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<agent-purpose>Customer support agent for our SaaS product (project management tool). Answers questions, looks up user accounts, creates support tickets, escalates to humans when needed. Deployed via Claude Agent SDK in our web app chat widget.</agent-purpose>
<observed-behavior>Agent frequently creates support tickets WITHOUT FIRST trying to answer the question. User says 'how do I change my email?' — agent creates a support ticket instead of answering. Happens ~40% of the time on simple questions. Also sometimes creates DUPLICATE tickets in the same conversation.</observed-behavior>
<reproducer-available>Yes — prompt it with 'how do I change my email address' and it creates a ticket ~40% of the time despite the answer being in our docs.</reproducer-available>
<system-prompt>You are a helpful customer support agent for ProjectHub. Answer questions accurately. If you cannot answer, create a support ticket. Be friendly and professional. Use the available tools when appropriate.</system-prompt>
<tools>Tools: (1) search_docs(query) — searches our documentation. (2) get_user_account(user_id) — fetches user's account info. (3) create_ticket(subject, description, priority) — creates a support ticket. (4) escalate_to_human(reason) — hands off to human agent.</tools>
<failure-trace>User: 'how do I change my email address?' Agent: 'I'll create a support ticket for that.' [calls create_ticket] Agent: 'I've created ticket #4892 for you. A human agent will respond within 24 hours.'</failure-trace>
<what-youve-tried>(1) Added 'try to answer first using docs' — didn't help much. (2) Increased search_docs prominence in the tool list — didn't help. (3) Switched from Sonnet to Opus — slightly better but still happening.</what-youve-tried>
<environment>Claude Agent SDK via our Node.js backend, using @anthropic-ai/agent-sdk</environment>
🤖 Output

Agent Debug: Support agent creating tickets instead of answering

An agent debugging protocol runs 8 structured diagnostic checks (system prompt clarity / tool description match / examples / context / stop conditions / termination / state / reward-hacking shape) to identify which of the 5 drift patterns (goal-loop / tool-confusion / context-degradation / instruction-dilution / reward-hack) your agent is exhibiting. Based on 100+ production agent debugs, the most common pattern — by a wide margin — is tool-confusion caused by ambiguous tool descriptions. Your reproducer (40% ticket-creation rate on simple doc-answerable questions) is a textbook case of this pattern: create_ticket has no 'when NOT to use me' clause, while search_docs has no 'use me FIRST for any question' clause. This debug prescribes specific tool-description rewrites + 1 system-prompt structural fix that should drop ticket-creation rate to <5% on doc-answerable questions.

Reproducer Status

Good: you have a reproducer. 'how do I change my email address' → creates a ticket ~40% of the time.

Before fixing anything, nail down the reproducer more precisely:

1. Run the prompt 20 times with `temperature=0`. Count ticket-creation rate.

2. Run 20 times with `temperature=0.7`. Count again.

3. Add 5 variants of the question ('how to change email', 'email change', 'update my email address') and count.

This gives you baseline metrics. After the fix, re-run — expect >90% drop in ticket-creation rate.
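The baseline loop above can be sketched as a small harness script. `runAgent` here is a hypothetical helper — adapt it to however you invoke your agent — that resolves with the tool calls a single run produced:

```javascript
// Sketch of the baseline-measurement loop. `runAgent` is a hypothetical
// helper (adapt to your own harness) that sends one prompt through the
// agent and resolves with { toolCalls: [{ name, ... }] }.
async function measureTicketRate(runAgent, prompts, runsPerPrompt = 20) {
  let ticketRuns = 0;
  let totalRuns = 0;
  for (const prompt of prompts) {
    for (let i = 0; i < runsPerPrompt; i++) {
      const result = await runAgent(prompt);
      totalRuns++;
      if (result.toolCalls.some((t) => t.name === 'create_ticket')) {
        ticketRuns++;
      }
    }
  }
  // Fraction of runs that created a ticket — the baseline to beat.
  return ticketRuns / totalRuns;
}

const questionVariants = [
  'how do I change my email address?',
  'how to change email',
  'update my email address',
];
// measureTicketRate(runAgent, questionVariants).then(console.log);
```

Run it once before the fix and once after the fix — those two numbers are your before/after metric.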

Drift Pattern Diagnosis

Pattern: TOOL-CONFUSION

Evidence:

  • Agent consistently picks create_ticket when it should pick search_docs
  • The wrong tool does a LOT of work (ticket creation has side effects, costs, user UX impact)
  • Tool descriptions don't disambiguate WHEN to use each
  • Agent behavior suggests it's treating 'create_ticket' as 'handle this request' rather than 'escalate unresolvable request'

Secondary pattern: Minor instruction-dilution (system prompt is vague about sequence of operations).

Not these patterns:

  • Not goal-loop (agent doesn't loop, it fails on turn 1)
  • Not context-degradation (the failure occurs in short conversations, before context could degrade)
  • Not reward-hacking (agent isn't 'gaming' — it's confused about tools)

The 8 Diagnostic Checks

Check 1: System Prompt Clarity — FAIL

Current: 'You are a helpful customer support agent for ProjectHub. Answer questions accurately. If you cannot answer, create a support ticket.'

Problems:

  • 'If you cannot answer' — undefined condition. Claude may interpret 'I don't have immediate knowledge' as 'cannot answer' rather than 'should search docs first.'
  • No explicit sequence ('first search, then answer, ticket only as last resort')
  • 'Use the available tools when appropriate' — too vague. Doesn't say which order.

Check 2: Tool Description Match — FAIL (Root Cause)

search_docs(query) current description is presumably empty or minimal.

create_ticket(subject, description, priority) current description is presumably empty or minimal.

The tool descriptions are NOT disambiguating. Claude picks create_ticket because it's listed as available and the system prompt doesn't force search-first.

Check 3: Examples Distribution — N/A

No few-shot examples provided in system prompt. This is actually OK for Opus/Sonnet with clear tool descriptions — but your tool descriptions aren't clear.

Check 4: Context Window Usage — PASS

Short conversations. No context degradation issue.

Check 5: Stop Conditions — PASS

Agent terminates correctly after tool calls. Not a termination bug.

Check 6: Termination Logic — PASS

Single-turn failure — not a termination issue.

Check 7: State Management — FAIL (Secondary)

No tracking of 'have I already created a ticket for this topic' = why you're seeing DUPLICATE tickets in same conversation. Agent has no memory that it just created ticket #4892 on the previous turn.

Check 8: Reward-Hacking Shape — N/A

No reward-hacking here. Agent is following what it thinks are instructions.

Root Cause Analysis

Root cause: Tool descriptions don't enforce sequence.

Claude, when faced with a user question, evaluates which tool best matches. With vague descriptions, create_ticket looks like a reasonable tool for 'handling a user request.' The system prompt's 'if you cannot answer' is ambiguous enough that Claude sometimes interprets 'I should search docs first, but ticket is also valid' as 'ticket is fine.'

Secondary cause: No duplicate-prevention state = same-conversation duplicates.

The Fix

Fix 1: Rewrite system prompt with explicit sequence
You are a customer support agent for ProjectHub (a project management SaaS).

Your job has this strict sequence:
1. FIRST: search the documentation via `search_docs` to find an answer.
2. IF docs have the answer: answer the user directly, citing the relevant doc section.
3. IF docs don't have the answer AND the question requires account-specific info: use `get_user_account` to fetch it.
4. IF docs don't have the answer AND you can't resolve it yourself: create a support ticket via `create_ticket`.
5. IF the issue is urgent or the user is frustrated: escalate to a human via `escalate_to_human`.

You MUST call `search_docs` before creating any ticket. If the question is 'how do I do X' or 'where is Y' — docs almost certainly have the answer.

If you have already created a ticket in this conversation, do NOT create another one. Refer the user to the existing ticket instead.

Be friendly but concise. Don't apologize unnecessarily. Answer the question.
Fix 2: Rewrite tool descriptions
{
  name: 'search_docs',
  description: 'Search the ProjectHub documentation for an answer to the user\'s question. ALWAYS call this FIRST for any question that starts with "how do I", "where is", "can I", "does ProjectHub support", or similar doc-answerable patterns. Returns relevant doc sections with URLs to cite.',
  // ...
},
{
  name: 'get_user_account',
  description: 'Fetch the authenticated user\'s account information (plan, billing, settings). Only call this AFTER `search_docs` if the question requires account-specific info (e.g., "what plan am I on?", "when did I sign up?").',
  // ...
},
{
  name: 'create_ticket',
  description: 'Create a support ticket for a human agent to handle. ONLY call this as a LAST RESORT after `search_docs` failed to find an answer AND you cannot resolve the issue yourself. DO NOT create a ticket for questions that have doc-answerable patterns like "how do I" or "where is". DO NOT create multiple tickets in the same conversation — if a ticket exists, refer to it.',
  // ...
},
{
  name: 'escalate_to_human',
  description: 'Hand off to a live human support agent immediately. Use when: user explicitly asks for a human, user is clearly frustrated/angry, or the issue is urgent/security-related. Different from create_ticket — this is real-time handoff, not async ticket creation.',
  // ...
}
Fix 3: Add conversation state for duplicate prevention

Pass a conversation_state field through tool calls that tracks ticket_created_this_conversation: boolean. Update after first create_ticket call. System prompt already addresses this, but belt + suspenders.
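As a sketch of Fix 3 — assuming your backend keeps a per-conversation state object (the field and helper names here are illustrative, not SDK APIs) — the ticket tool handler can be wrapped so a second call in the same conversation is refused:

```javascript
// Sketch: wrap the create_ticket tool handler with duplicate prevention.
// `conversationState` is per-session state your backend keeps; the field
// names are an assumption, not part of any SDK.
function makeCreateTicketHandler(conversationState, createTicketApi) {
  return async ({ subject, description, priority }) => {
    if (conversationState.ticketCreatedThisConversation) {
      // Return a tool result the model can relay instead of a new ticket.
      return {
        error: 'A ticket already exists for this conversation: ' +
               `#${conversationState.ticketId}. Refer the user to it.`,
      };
    }
    const ticket = await createTicketApi({ subject, description, priority });
    conversationState.ticketCreatedThisConversation = true;
    conversationState.ticketId = ticket.id;
    return { ticketId: ticket.id };
  };
}
```

This makes duplicate prevention a hard guarantee in code rather than a soft instruction the model can miss.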

Before/After Examples

Example 1: User asks 'how do I change my email?'

Before:

  • 40% of time: agent calls create_ticket (wrong)
  • 60% of time: agent calls search_docs then answers (correct)

After:

  • Agent reads system prompt: 'MUST call search_docs before creating ticket'
  • Agent reads create_ticket description: 'DO NOT create ticket for "how do I" questions'
  • Agent calls search_docs('how to change email address')
  • Docs return: 'Settings → Account → Email → Edit → Save'
  • Agent answers user directly, citing doc
  • Ticket-creation rate on this question: expected <5%

Example 2: User asks 'my account is locked, help'

Before:

  • Agent might create ticket directly

After:

  • Agent calls search_docs('account locked')
  • If docs have self-service unlock: agent provides it
  • If docs require human action: agent calls escalate_to_human (real-time) OR create_ticket (async, if non-urgent)
  • Correct path selected based on issue urgency

Regression Test Case

Add these tests to your agent test suite:

// runAgent / startConversation are thin test-harness helpers around your agent
describe('Agent: tool selection', () => {
  test('Simple doc-answerable question uses search_docs, not create_ticket', async () => {
    const result = await runAgent('how do I change my email address?');
    expect(result.toolCalls[0].name).toBe('search_docs');
    expect(result.toolCalls.find(t => t.name === 'create_ticket')).toBeUndefined();
  });
  
  test('Account-specific question uses get_user_account after search_docs', async () => {
    const result = await runAgent('what plan am I on?');
    const toolSequence = result.toolCalls.map(t => t.name);
    expect(toolSequence).toEqual(expect.arrayContaining(['search_docs', 'get_user_account']));
  });
  
  test('No duplicate tickets in same conversation', async () => {
    const conversation = await startConversation();
    await conversation.send('I have a bug to report'); // might create ticket
    await conversation.send('another issue'); // must NOT create second ticket
    const allTicketCalls = conversation.toolCalls.filter(t => t.name === 'create_ticket');
    expect(allTicketCalls.length).toBeLessThanOrEqual(1);
  });
  
  test('Urgent issue triggers escalate_to_human not create_ticket', async () => {
    const result = await runAgent('urgent: my data is deleted, I need help NOW');
    expect(result.toolCalls.find(t => t.name === 'escalate_to_human')).toBeDefined();
  });
});

Success criteria: All 4 tests pass reliably (>95% of runs) after fix.

Monitoring Plan

In production after deploying fix:

1. Metric: `support_agent.ticket_created_per_conversation`

- Before: ~0.6 (high — many conversations create tickets unnecessarily)

- Target: ~0.15 (only for actually-unresolvable issues)

2. Metric: `support_agent.doc_searched_before_ticket_ratio`

- Should be ~100%. If not, tool sequence fix failed.

3. Metric: `support_agent.duplicate_tickets_per_conversation`

- Target: 0 or near-zero.

4. Weekly review: sample 10 conversations with ticket creation. Manually verify: was ticket creation correct (docs didn't have the answer)?

Alert: if ticket-creation rate jumps back above 0.3/conversation, investigate immediately (usually regression from a prompt change elsewhere).
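These metrics can be computed offline from your tool-call traces. A sketch, assuming each conversation is logged as an object with a `toolCalls` array (adapt to your actual trace schema):

```javascript
// Sketch: compute the three monitoring metrics from logged conversations.
// Each conversation is assumed to look like { toolCalls: [{ name }] }.
function supportAgentMetrics(conversations) {
  let tickets = 0;
  let duplicates = 0;
  let ticketsWithPriorSearch = 0;
  for (const convo of conversations) {
    const names = convo.toolCalls.map((t) => t.name);
    const ticketCount = names.filter((n) => n === 'create_ticket').length;
    tickets += ticketCount;
    if (ticketCount > 1) duplicates += ticketCount - 1;
    // Did a doc search happen before the first ticket in this conversation?
    if (ticketCount > 0 &&
        names.indexOf('search_docs') !== -1 &&
        names.indexOf('search_docs') < names.indexOf('create_ticket')) {
      ticketsWithPriorSearch++;
    }
  }
  const convosWithTicket = conversations.filter((c) =>
    c.toolCalls.some((t) => t.name === 'create_ticket')).length;
  return {
    ticketCreatedPerConversation: tickets / conversations.length,
    docSearchedBeforeTicketRatio:
      convosWithTicket ? ticketsWithPriorSearch / convosWithTicket : 1,
    duplicateTicketsPerConversation: duplicates / conversations.length,
  };
}
```

Run this over a daily batch of traces and chart the three numbers; the alert threshold above (0.3 tickets/conversation) applies to the first metric.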

Common Misfires

Things that LOOK like fixes but won't actually help:

1. Adding more 'be helpful' instructions. Doesn't address the sequence issue. The agent IS trying to be helpful — creating tickets is (to it) a helpful action.

2. Switching model (Sonnet → Opus). You tried this. Marginally helps but doesn't fix root cause. Opus is better at following ambiguous instructions; it's not magical.

3. Temperature=0. Makes wrong behavior MORE consistent, not less. Don't use this as a debug tool.

4. Adding a global rule 'always search docs first.' Helps somewhat, but a global rule alone doesn't stick — it needs to be paired with specific tool descriptions. Both are needed.

5. Removing `create_ticket` from tool list entirely. Over-correction. Tickets ARE needed sometimes. Keep the tool, fix the description.

Key Takeaways

  • Drift pattern: tool-confusion. Claude picks create_ticket because tool descriptions don't disambiguate + system prompt doesn't enforce sequence.
  • Fix is 3 parts: (1) system prompt with explicit 5-step sequence, (2) tool descriptions with 'when to use me' and 'when NOT to use me', (3) conversation state to prevent duplicate tickets.
  • Expected improvement: ticket-creation rate drops from 40% to <5% on doc-answerable questions. Duplicate tickets eliminated.
  • Regression test suite: 4 test cases in Jest covering tool selection. Run on every prompt change to catch drift.
  • Monitor: ticket-created-per-conversation metric weekly. Alert if it jumps — usually signals prompt regression from unrelated change.

Common use cases

  • Agent SDK developers debugging production agents behaving unexpectedly
  • AI engineering teams investigating agent failure modes
  • Product teams diagnosing why their Claude-based feature keeps misbehaving
  • Researchers studying agent drift patterns
  • Consultants troubleshooting client agent deployments
  • QA/eval teams finding reliable reproducers for agent failures
  • Anthropic-tools users debugging orchestration issues
  • Open-source agent maintainers triaging user-reported bugs
  • Platform teams standardizing agent debugging practices across the org

Best AI model for this

Claude Opus 4 or Sonnet 4.5. Agent debugging requires reasoning about LLM behavior patterns, prompt engineering, and system design simultaneously. Top-tier reasoning matters.

Pro tips

  • Reproduce FIRST, theorize second. Most wrong-fixes happen because the developer theorizes about the problem without a clean reproducer. Get the exact inputs + system state that cause the failure, run it 3x, confirm consistency — then investigate.
  • Read the FULL trace, not just the failing turn. Agent drift usually starts 4-8 turns before the visible failure. The failure is often the SYMPTOM of bad earlier decisions. Scroll up.
  • Tool-use confusion is the #1 drift pattern. Claude chooses the wrong tool or skips tools. Usually caused by overlapping tool descriptions or missing 'when to use this' in tool docs. Check tool descriptions before prompt content.
  • The system prompt should be concrete-first, rules-second. 'You are a customer-support agent for [product]. When users ask about X, do Y.' > 'You are helpful and follow all applicable rules.' Concrete grounding > abstract instruction.
  • Long context windows degrade agent performance non-linearly. If your agent works well in short conversations but fails in long ones, suspect context degradation — either reduce context (summarize earlier turns), use explicit recall ('remember that the user said X earlier'), or structured memory.
  • Reward-hacking shape: when the agent finds a 'technically correct' path that satisfies the literal instruction but violates the intent. Usually fixable by adding a 'success means [concrete criterion]' and a 'don't do X even if it seems like a shortcut' constraint.
  • Temperature 0 doesn't make agents reliable — it makes them consistently whatever they were inclined to do. If the inclination is wrong, temperature 0 just makes the wrongness reproducible. Fix the prompt, not the temperature.
  • Create a regression test suite as you fix. Every drift bug becomes a test case. After 10 fixes, you have a reliable regression suite that catches drift introduced by future prompt changes.
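One minimal sketch of the 'reduce context' option from the tips above — a pruning pass that keeps the last N turns verbatim and stubs the summarization step with a placeholder (a real agent would summarize the dropped turns with a cheap model call; with the Anthropic API the system prompt lives outside `messages`, so it's untouched here):

```javascript
// Sketch: aggressive context management. Keeps only the last N turns
// verbatim; earlier turns are replaced by a single summary slot.
function pruneContext(messages, { keepLastTurns = 8 } = {}) {
  if (messages.length <= keepLastTurns) return messages;
  const dropped = messages.slice(0, messages.length - keepLastTurns);
  const recent = messages.slice(-keepLastTurns);
  // Placeholder: a production agent would summarize `dropped` with a
  // cheap model call and put that summary here instead.
  const summary = {
    role: 'user',
    content: `[Summary of ${dropped.length} earlier messages omitted for brevity]`,
  };
  return [summary, ...recent];
}
```

Call it before each model turn once the conversation passes your length threshold; tune `keepLastTurns` against your agent's observed degradation point.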

Customization tips

  • Keep a debug-log for every agent drift you fix: what the pattern was, what the fix was, what the monitoring showed. Over time this becomes your team's 'drift pattern library.'
  • Run your 8 diagnostic checks BEFORE deploying any new agent, not just when debugging. Prevention beats debugging. Many drifts are predictable from the initial design.
  • For agents in production, instrument EVERY tool call with trace data. When drift happens, the trace is your debugger. Flying blind = slow debugs.
  • Don't assume the problem is in your prompt. Sometimes it's in the tool schemas, sometimes in the framework (SDK bugs do exist), sometimes in a user-input anomaly. Keep a wide aperture during diagnosis.
  • When you fix a drift, SHARE the fix with your team + in public (if appropriate). Agent drift patterns are under-documented publicly. Your debug write-up helps others and your future self.

Variants

Tool-Confusion Mode

For agents choosing wrong tools or skipping tools. Focuses on tool description clarity, tool selection logic, tool parameter fidelity.

Goal-Loop Mode

For agents stuck in infinite loops or getting distracted from goals. Focuses on termination conditions, goal clarity, progress tracking.

Context-Degradation Mode

For agents that perform worse as conversation grows. Focuses on context management, summarization, structured memory.

Reward-Hacking Mode

For agents finding loopholes in instructions. Focuses on intent clarity, success criteria, constraint specificity.

Frequently asked questions

How do I use the Agent Prompt Debugger — Diagnose Why Your Claude Agent Is Drifting prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Agent Prompt Debugger — Diagnose Why Your Claude Agent Is Drifting?

Claude Opus 4 or Sonnet 4.5. Agent debugging requires reasoning about LLM behavior patterns, prompt engineering, and system design simultaneously. Top-tier reasoning matters.

Can I customize the Agent Prompt Debugger — Diagnose Why Your Claude Agent Is Drifting prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: reproduce first and theorize second (get the exact inputs + system state that cause the failure, run it 3x, and confirm consistency before investigating), and read the full trace rather than just the failing turn (agent drift usually starts 4-8 turns before the visible failure, which is often the symptom of earlier bad decisions).
