⚡ Promptolis Original · Coding & Development

🐛 Debugging Hypothesis Generator

Not another 'have you tried restarting it?' This is structured bug diagnosis: it ranks 5-7 specific hypotheses and names the exact test that falsifies each one.

⏱️ 4 min to diagnose 🤖 ~60 seconds in Claude 🗓️ Updated 2026-04-19

Why this is epic

Most debugging is thrashing — trying things in random order. This Original applies the scientific method: enumerate hypotheses, rank them by probability × test cost, and run the most informative test FIRST.

Names the 6 bug categories (off-by-one, null/undefined, race condition, config/env, dependency, data-shape) — each has a different diagnostic signature.

Produces the 'binary search the problem space' plan — how to eliminate half the hypotheses with ONE test, rather than testing each sequentially.

The prompt

Promptolis Original · Copy-ready
<role>
You are a staff+ engineer who has debugged 5,000+ production issues across distributed systems, monoliths, web apps, and infra. You apply the scientific method to debugging — enumerate, rank, test, falsify.
</role>

<principles>
1. Enumerate hypotheses before fixing.
2. 6 bug categories: off-by-one, null/undef, race, config/env, dependency, data-shape.
3. Rank hypotheses by probability × test-cost.
4. Binary-search the problem space. Find tests that eliminate multiple hypotheses.
5. Write down what you've tried.
6. 'Impossible' hypotheses stay on the list until verified absent.
</principles>

<input>
<the-bug>{what's broken, when, how}</the-bug>
<reproduction>{can you reproduce reliably, sometimes, never}</reproduction>
<context>{stack, recent deploys, changes}</context>
<what-youve-tried>{debugging attempts so far}</what-youve-tried>
<logs-or-errors>{relevant log output or stack traces}</logs-or-errors>
<time-pressure>{production issue, blocking work, etc.}</time-pressure>
</input>

<output-format>
# Debugging Plan: [Bug summary]
## The Symptom — clearly stated.
## 5-7 Ranked Hypotheses — each with category, probability, test cost.
## The First Test — the one that eliminates the most hypotheses.
## The Binary-Search Tree — if test A passes → next test; if test A fails → different next test.
## Mitigation (If Production) — temporary fix while diagnosing.
## What to Watch in Logs — specific strings/patterns.
## If Stuck After 60 Min — who to call, what to escalate.
## Key Takeaways — 3 bullets.
</output-format>

<auto-intake>
If incomplete: ask for the bug, reproduction, context, what you've tried, logs, time pressure.
</auto-intake>

Now, diagnose:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<the-bug>API endpoint /orders/create returns 500 error about 3% of the time. Production issue. Started 2 days ago after deploy.</the-bug>
<reproduction>Cannot reproduce in staging. Happens intermittently in prod.</reproduction>
<context>Node.js, Postgres, Redis, AWS. Recent deploy added Redis caching layer for user session lookups.</context>
<what-youve-tried>Checked error logs — stack trace points to 'Cannot read property "id" of undefined' in order-create service. Confirmed user IDs are valid in requests.</what-youve-tried>
<logs-or-errors>
Error: Cannot read property 'id' of undefined
    at OrderService.createOrder (/app/services/orders.js:47)
    at async processCheckout (/app/routes/orders.js:23)
</logs-or-errors>
<time-pressure>Production, high priority. Losing ~30 orders/day to this.</time-pressure>
🤖 Output

Debugging Plan: /orders/create 500 error, 3% rate, post-deploy, Redis cache added

A debugging hypothesis framework enumerates possible causes, ranks them by probability × test cost, and identifies the binary-search test that eliminates the most hypotheses fastest. In practice, intermittent errors that begin immediately after a caching-layer deploy trace to cache-related data-shape or race-condition issues far more often than not. Your stack trace (an undefined property access right after a Redis caching deploy) points to one dominant hypothesis.

The Symptom

3% of /orders/create calls fail with 'Cannot read property id of undefined' at line 47 of OrderService.createOrder. Started 2 days ago post-deploy that added Redis session caching. Cannot reproduce in staging.

5-7 Ranked Hypotheses

1. Cache miss / stale cache returns null user object. Probability: 45%. Category: data-shape + config. Test cost: 5 min.

  • New Redis layer returning null when session key not yet populated, and code path doesn't handle null. 3% rate matches 'small subset of users whose cache hasn't been warmed.'

2. Cache race condition — write-then-read where write hasn't completed. Probability: 20%. Category: race condition. Test cost: 20 min.

  • User session written to Redis, immediately read, but Redis hasn't propagated. Async timing.

3. User session key mismatch (cache key format wrong). Probability: 15%. Category: config/dependency. Test cost: 10 min.

  • New caching code uses a key format that misses some valid user IDs.

4. Null-check regression in post-deploy code. Probability: 10%. Category: null/undefined. Test cost: 5 min.

  • Deploy replaced previously-safe DB lookup with cache lookup; null-check was removed or never added.

5. Specific user data corruption. Probability: 5%. Category: data-shape. Test cost: 30 min.

  • Some user records have unexpected structure, exposed only under the new code path.

6. Environment variable / Redis connection intermittent failure. Probability: 3%. Category: config/env. Test cost: 15 min.

  • Redis connection pool hitting limits, returning undefined sometimes.

7. Downstream dependency (payment service) returning unexpected shape. Probability: 2%. Category: dependency. Test cost: 30 min.

  • Payment service change coincident with deploy; unlikely but possible.
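Hypothesis 1's failure mode can be sketched as follows. This is a hypothetical reconstruction — the real service code is not shown, and every name here is illustrative:

```javascript
// Hypothetical sketch of the suspected bug: a cache lookup that yields
// undefined on a miss, followed by an unguarded .id access.
const cache = new Map(); // stands in for the Redis client

function getUserFromCache(sessionId) {
  // Map.get returns undefined on a miss, much like a Redis GET on a
  // missing key returns null in most Node clients.
  return cache.get(sessionId);
}

function createOrder(sessionId) {
  const user = getUserFromCache(sessionId);
  // The suspected "line 47": throws a TypeError when user is missing.
  return { orderId: 'ord-1', userId: user.id };
}

cache.set('s-1', { id: 'u-1' }); // warmed key: works
const ok = createOrder('s-1');

let failure = null;
try {
  createOrder('s-2'); // cold key: reproduces the intermittent 500
} catch (e) {
  failure = e;
}
console.log(ok.userId, failure instanceof TypeError); // u-1 true
```

The 3% failure rate would then simply be the fraction of requests whose session key has not yet been warmed into the cache.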

The First Test

Add logging at line 47 of the order-create service. Log typeof user, user === null, user === undefined, and the full user value before the .id access.

This ONE test eliminates or confirms hypotheses 1, 2, 3, 4, and 6 in a single pass. It tells you whether user is null, undefined, or the wrong shape — and knowing which narrows the field to one or two hypotheses.

Time: 10 minutes to deploy logging + wait for a failed request.
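The instrumentation itself is small. A sketch of what it could look like — the helper name and log fields are assumptions, not the service's actual code:

```javascript
// Hypothetical one-shot diagnostic: classifies the user variable as
// null, undefined, or wrong-shape in a single structured log line,
// placed immediately before the failing .id access.
function describeUser(user) {
  return {
    typeofUser: typeof user,
    isNull: user === null,
    isUndefined: user === undefined,
    hasId: user != null && 'id' in user,
    raw: JSON.stringify(user), // full value for later inspection
  };
}

// Example log lines for the three failure shapes:
console.log(JSON.stringify(describeUser(undefined)));
console.log(JSON.stringify(describeUser(null)));
console.log(JSON.stringify(describeUser({ name: 'x' }))); // object without .id
```

One failed request through this logging answers the null/undefined/wrong-shape question directly.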

The Binary-Search Tree

Add logging at line 47
  ├─ User is undefined
  │   ├─ Cache returned undefined? → Hypothesis 1 (most likely)
  │   └─ Cache never queried? → Hypothesis 4 (null-check regression)
  ├─ User is null
  │   ├─ Cache returned null for existing user → Hypothesis 1 or 3 (key mismatch)
  │   └─ Check cache key format in next test
  ├─ User is object but missing .id
  │   └─ Hypothesis 5 (data corruption) — dump full user object, diff against schema
  └─ User has .id but error still thrown
      └─ Some other line 47 logic — re-read the code carefully

Mitigation (Production)

Deploy NOW while diagnosing:

1. Add a null-check before line 47. Minimum viable protection:

if (!user || !user.id) {
  // Cache returned null/undefined or a malformed object. Log enough
  // context to diagnose, then fall back to the source of truth.
  logger.error('User missing in createOrder', { sessionId });
  // Fallback: fetch from DB directly, bypassing the cache
  user = await UserService.fetchById(sessionId);
  if (!user) throw new NotFoundError('user not found for session');
}

2. Add metric for cache miss rate on user lookup. Monitor.

This mitigation restores the 30 orders/day while you diagnose the root cause. Do not ship the mitigation as the final fix.
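Mitigation step 2 (the cache-miss metric) can start as a counting wrapper around the lookup. A minimal sketch under assumed names — a real setup would emit to StatsD, CloudWatch, or similar rather than a local object:

```javascript
// Minimal cache-miss counter around the user lookup.
const metrics = { lookups: 0, misses: 0 };

function instrumentedGet(cache, key) {
  metrics.lookups += 1;
  const value = cache.get(key);
  if (value === undefined) metrics.misses += 1; // count the miss
  return value;
}

const sessionCache = new Map([['s-1', { id: 'u-1' }]]);
instrumentedGet(sessionCache, 's-1'); // hit
instrumentedGet(sessionCache, 's-2'); // miss
const missRate = metrics.misses / metrics.lookups;
console.log(missRate); // 0.5
```

If the observed miss rate tracks the 3% error rate, that strongly supports hypothesis 1.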

What to Watch in Logs

  • Count of 'User missing in createOrder' events (from mitigation logging).
  • Redis connection errors / timeouts.
  • Any correlation with user account age (new users may lack cache entries).
  • Any correlation with time-of-day (batch jobs hitting cache eviction).

If Stuck After 60 Min

  • Call the engineer who shipped the Redis caching deploy. They know the invariants.
  • If unavailable, roll back the Redis caching change temporarily. A 3% error rate is not acceptable for orders.
  • Escalate to infra on-call if Redis itself is the suspect.

Key Takeaways

  • Hypothesis 1 (cache miss returns null, code doesn't handle) is 45% likely. Start there.
  • Logging at line 47 is the binary-search first test — eliminates 5 of 7 hypotheses in one pass.
  • Ship the null-check mitigation NOW to stop losing 30 orders/day while you diagnose.

Common use cases

  • Production bug reports you can't reproduce
  • Intermittent test failures (flaky tests)
  • 'It works on my machine' issues
  • Performance regressions after a deploy
  • Integration bugs between systems
  • Subtle data corruption or logic errors
  • Bugs in code you didn't write

Best AI model for this

Claude Opus 4 or Sonnet 4.5. Hypothesis reasoning benefits from a top-tier model.

Pro tips

  • Enumerate hypotheses FIRST. Fixing before enumerating means you might fix the wrong thing.
  • Binary-search tests reduce debugging time by 2-3x. Always ask: 'What test eliminates the most hypotheses?'
  • Write down what you've tried. Tired debuggers retry the same thing twice.
  • The most common bug is the one you didn't think was possible. Keep 'impossible' hypotheses on the list; verify before eliminating.
  • If you're stuck after 2 hours, rubber-duck the problem or call someone. Fresh eyes often cut diagnosis time in half.
  • After finding the bug, note WHY it was hard to find. Feedback loop improves future debugging.

Customization tips

  • Keep a debugging log. Write hypotheses + tests run in a notepad for the duration of the session. Prevents retrying the same test.
  • For intermittent bugs, always try to force reproduction via load testing or chaos engineering. Random reproduction is slow; controlled reproduction is fast.
  • Logs are your friend. When debugging production issues, add temporary logging liberally; remove after fix.
  • After finding the root cause, do a retrospective: which hypothesis was it? Why wasn't it higher in the ranking? Calibrates future debugging.
  • If you can't narrow to 3 hypotheses in 30 minutes, you don't understand the system well enough. Go read the relevant code.

Variants

Production Bug Mode

For live issues. Adds immediate-mitigation steps before diagnosis.

Flaky Test Mode

For intermittent failures. Specific patterns (timing, isolation, state).

Performance Regression Mode

For slowdown issues. Benchmarking + profiling patterns.

Frequently asked questions

How do I use the Debugging Hypothesis Generator prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Debugging Hypothesis Generator?

Claude Opus 4 or Sonnet 4.5. Hypothesis reasoning benefits from a top-tier model.

Can I customize the Debugging Hypothesis Generator prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: enumerate hypotheses first (fixing before enumerating means you might fix the wrong thing), and design binary-search tests by always asking 'What test eliminates the most hypotheses?'

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals