
⚡ Promptolis Original · Coding & Development

🧵 Concurrency Bug Debugger

Diagnoses race conditions, deadlocks, and async ordering bugs from your code + logs — names the specific concurrency failure mode + provides the fix that doesn't just paper over the symptom.

⏱️ 5 min to set up 🤖 ~120 seconds in Claude 🗓️ Updated 2026-04-28

Why this is epic

Concurrency bugs are the worst kind: rarely reproducible, surface only at scale, and 'just retry' makes them worse. This Original diagnoses the specific failure mode (race condition, deadlock, lost update, write skew, async ordering) and provides the structural fix.

Outputs a diagnosis: which of the 8 concurrency failure modes this is, why specifically this one, the test that would have caught it, the fix at the right layer (database isolation level, application locking, message ordering, etc.), and what NOT to do.

Calibrated to 2026 reality: distributed systems with eventual consistency, async/await pitfalls in JS/Python, queue-worker race conditions, the rise of optimistic locking patterns over pessimistic. Honest about which fixes work at scale.

Includes the prevention checklist — the patterns + tests that prevent this class of bug from recurring. 'Just be careful' isn't a fix; structured prevention is.

The prompt

Promptolis Original · Copy-ready
<role>
You are a concurrency debugging specialist with 8+ years investigating race conditions, deadlocks, and async ordering bugs in production systems. You have diagnosed 100+ concurrency incidents across DB-level races, application-layer locking issues, and distributed-system consistency problems. You are direct. You will tell a builder their 'just retry' fix is making the problem worse, that their async/await code has interleaving they don't see, or that their DB isolation level is the bug. You refuse to recommend 'add a mutex' as a generic fix — you'll specify the right concurrency primitive at the right layer.
</role>

<principles>
1. Eight concurrency failure modes. Name the specific one.
2. Reproduce by INCREASING parallelism, not running once.
3. Deadlock logs are diagnostic gold. Always check.
4. Optimistic locking > pessimistic for read-heavy.
5. Idempotency keys solve 'might process twice' universally.
6. Async/await doesn't equal thread-safe. Interleaving at await points.
7. Tests for concurrency need actual concurrent execution.
</principles>

<input>
<symptom>{what's going wrong: duplicates, lost data, deadlocks, ordering issues, etc.}</symptom>
<reproducibility>{intermittent / always / scales with load / specific timing}</reproducibility>
<code-snippet>{paste the relevant code that handles this state}</code-snippet>
<logs>{relevant log entries — especially DB errors, deadlock messages, retry patterns}</logs>
<system-context>{single-instance / multi-instance / distributed / serverless}</system-context>
<storage>{Postgres / MySQL / Redis / DynamoDB / etc. + isolation level if known}</storage>
<concurrency-level>{requests/sec, workers/jobs in parallel, scale signals}</concurrency-level>
<previous-fixes-attempted>{what you've tried that didn't work}</previous-fixes-attempted>
</input>

<output-format>
# Concurrency Bug Diagnosis: [one-line description]
## Failure Mode Identification
Which of the 8 concurrency failure modes this is. Why specifically this one.
## The Specific Race / Deadlock
Step-by-step description of what happens at the timing level.
## Why It's Intermittent
The timing window that has to occur. Probability estimate.
## Diagnostic Evidence
The specific signals in the code/logs that confirm this diagnosis.
## The Right Fix (at the right layer)
DB-level / app-level / queue-level / system-level. Why this layer.
## What NOT to Do
Fixes that look right but make this failure mode worse.
## Prevention Patterns
The structural patterns that prevent this class of bug.
## Test That Would Have Caught This
Specific test code (concurrent execution, not single-shot).
## Verification After Fix
How to confirm the fix worked. Specific tests + production signals.
## If This Pattern Appears Elsewhere
Other parts of the codebase likely to have similar bugs. How to audit.
## Key Takeaways
3-5 bullets — for the team's debugging playbook.
</output-format>

<auto-intake>
If input incomplete: ask for symptom, reproducibility, code snippet, logs, system context, storage, concurrency level, previous fixes.
</auto-intake>

Now, diagnose the concurrency bug:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<symptom>We're seeing occasional duplicate charges in our payment system. Customer pays once but gets charged twice. ~3-5 incidents per month. Stripe shows 2 separate charges with same idempotency key (which shouldn't happen if we used the same key, but our code generates a new key each call). The charges happen within ~50ms of each other.</symptom>
<reproducibility>Intermittent. Cannot reproduce locally. Happens at peak traffic (Friday afternoons specifically). Maybe 1 in 5K successful payments.</reproducibility>
<code-snippet>
// /api/charge.ts
async function chargeCustomer(userId: string, amount: number, productId: string) {
  const idempotencyKey = `${userId}-${productId}-${Date.now()}`;
  
  // Check if already charged in last 5 minutes (dedup attempt)
  const recent = await db.query(
    'SELECT id FROM charges WHERE user_id = $1 AND product_id = $2 AND created_at > NOW() - INTERVAL \'5 minutes\'',
    [userId, productId]
  );
  
  if (recent.rows.length > 0) {
    return { status: 'duplicate', existingChargeId: recent.rows[0].id };
  }
  
  // Process the charge
  const stripeResult = await stripe.paymentIntents.create({
    amount,
    customer: userId,
  }, { idempotencyKey });
  
  // Record it
  await db.query(
    'INSERT INTO charges (user_id, product_id, stripe_id, amount, created_at) VALUES ($1, $2, $3, $4, NOW())',
    [userId, productId, stripeResult.id, amount]
  );
  
  return { status: 'charged', chargeId: stripeResult.id };
}
</code-snippet>
<logs>
No deadlock errors. We see:
  [INFO] charge requested user=user_456 product=prod_789 amount=4999
  [INFO] charge requested user=user_456 product=prod_789 amount=4999  ← same user, ~30ms later
  [INFO] stripe charge created stripe_id=pi_abc123 user=user_456
  [INFO] stripe charge created stripe_id=pi_xyz789 user=user_456  ← second charge
  [INFO] charge inserted db_id=12345 stripe_id=pi_abc123
  [INFO] charge inserted db_id=12346 stripe_id=pi_xyz789
</logs>
<system-context>Multi-instance: we run 4 API pods on Kubernetes behind a load balancer. Postgres single primary.</system-context>
<storage>Postgres 14, default Read Committed isolation. We have an index on (user_id, product_id, created_at).</storage>
<concurrency-level>~200 req/sec average, ~800 peak. Most requests are NOT charges; charges peak ~50/sec.</concurrency-level>
<previous-fixes-attempted>Added the SELECT-recent-charges check 6 months ago. It reduced incidents but didn't eliminate. We thought maybe network retries; we added timeout extensions. Didn't help.</previous-fixes-attempted>
🤖 Output

Concurrency Bug Diagnosis: TOCTOU race in dedup check + duplicate Stripe charges

Failure Mode Identification

This is a classic TOCTOU (Time-Of-Check, Time-Of-Use) race condition with check-then-act semantics in a multi-instance environment.

Not the other 7 failure modes:

  • Not a lost update (data isn't being overwritten)
  • Not a write skew (single-row writes, no cross-row constraints)
  • Not a deadlock (no DB error logs show one)
  • Not async ordering (the bug is database-level, not in async code)
  • Not eventual consistency (single Postgres primary)
  • Not a phantom read (the dedup SELECTs run in separate transactions on separate pods; this is check-then-act across transactions, not a re-read anomaly within one)
  • Not message ordering (this isn't queued)

The Specific Race / Deadlock

Two concurrent HTTP requests from same user trigger same flow:

  Time T+0ms:  Request A arrives at Pod 1. Customer clicked 'Pay' button.
  Time T+5ms:  Request B arrives at Pod 3. Customer's network retried OR they double-clicked.
  Time T+10ms: Pod 1: SELECT recent charges → 0 rows (no charge yet)
  Time T+12ms: Pod 3: SELECT recent charges → 0 rows (still no charge)  ← BOTH pass dedup check
  Time T+15ms: Pod 1: stripe.paymentIntents.create with idempotency_key=user_456-prod_789-1745923215000
  Time T+17ms: Pod 3: stripe.paymentIntents.create with idempotency_key=user_456-prod_789-1745923215005
               ← DIFFERENT keys (Date.now() differs by ms), so Stripe doesn't dedupe
  Time T+250ms: Stripe returns success for Pod 1
  Time T+255ms: Stripe returns success for Pod 3 (separate charge!)
  Time T+260ms: Pod 1: INSERT charge
  Time T+265ms: Pod 3: INSERT charge
  
  Result: 2 Stripe charges, 2 DB rows. Customer charged twice.

The dedup SELECT happens BEFORE either INSERT exists. Both pass the check. Both proceed.

The idempotency key includes Date.now() in milliseconds — even 1ms apart, the keys differ. So Stripe correctly doesn't dedupe.
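The key-generation flaw is easy to demonstrate in isolation. A minimal sketch (the `makeTimestampKey` helper is hypothetical, mirroring the buggy key format):

```typescript
// Why Date.now()-based keys defeat idempotency: two attempts at the
// same logical action, milliseconds apart, produce different keys, so
// any dedup layer keyed on them treats the attempts as distinct.

function makeTimestampKey(userId: string, productId: string): string {
  return `${userId}-${productId}-${Date.now()}`;
}

async function demo(): Promise<[string, string]> {
  const first = makeTimestampKey("user_456", "prod_789");
  // Simulate a retry arriving a few milliseconds later.
  await new Promise((resolve) => setTimeout(resolve, 5));
  const second = makeTimestampKey("user_456", "prod_789");
  return [first, second];
}

demo().then(([a, b]) => {
  console.log(a === b ? "keys match" : "keys differ, dedup defeated");
});
```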

Why It's Intermittent

For the bug to occur, BOTH must be true:

  • Two requests arrive within ~250ms (the time between dedup-check and INSERT-record)
  • Requests land on different pods (so they don't share local state) — happens randomly via load balancer

At 200 req/sec, with 50/sec being charges, the probability of two concurrent charges from the same user-product within 250ms is low but non-zero. At Friday peak, slightly higher. Hence the 1-in-5K rate.

User-side triggers:

  • Double-click on submit button (your frontend should disable but might not)
  • Network retry from mobile/flaky-connection (browser auto-retries)
  • Someone clicking back+forward fast in checkout flow

Diagnostic Evidence

Line-by-line in your logs:

1. Two 'charge requested' lines for the same user+product within 30ms → confirms two concurrent requests reaching the service

2. Both produce different Stripe charge IDs → confirms both requests passed dedup check and both completed Stripe call

3. No deadlock or DB error logs → confirms it's not a DB-level race; it's check-then-act at the application level

4. The idempotency key includes `Date.now()` → confirms keys differ between concurrent requests, so Stripe-side dedup doesn't help

This pattern is the textbook 'check-then-act without locking' anti-pattern in a distributed environment.

The Right Fix (at the right layer)

The fix is at the DATABASE layer using a unique constraint + insert-then-detect-conflict pattern (INSERT and catch the unique violation), NOT at the application layer with locks.

Step 1: Add a deterministic idempotency key

Replace Date.now() with a deterministic value that's the same across retries:

async function chargeCustomer(
  userId: string,
  amount: number,
  productId: string,
  requestId: string  // ← REQUIRED: comes from frontend, must be unique per user-action
) {
  // Frontend generates a UUID per checkout-button-click and includes in request.
  // Network retries from same click reuse the same requestId.
  
  const idempotencyKey = `${userId}:${productId}:${requestId}`;
  // ...
}

Frontend must send a stable requestId. UUID generated when user clicks 'Pay' (NOT regenerated on retry).
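The contrast with the timestamp key is worth seeing side by side. A sketch, assuming the frontend sends one UUID per 'Pay' click (the `makeIdempotencyKey` helper is illustrative):

```typescript
import { randomUUID } from "node:crypto";

// A deterministic idempotency key: stable across retries because the
// requestId is generated once per user action, not once per HTTP attempt.
function makeIdempotencyKey(
  userId: string,
  productId: string,
  requestId: string
): string {
  return `${userId}:${productId}:${requestId}`;
}

// Frontend side: one UUID per 'Pay' click.
const requestId = randomUUID();

// Original attempt and a network retry of the SAME click:
const attempt1 = makeIdempotencyKey("user_456", "prod_789", requestId);
const attempt2 = makeIdempotencyKey("user_456", "prod_789", requestId);

console.log(attempt1 === attempt2); // same key, so Stripe can dedupe server-side
```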

Step 2: Use database unique constraint as the locking primitive

-- Migration
ALTER TABLE charges ADD COLUMN idempotency_key TEXT;
CREATE UNIQUE INDEX charges_idempotency_unique ON charges (idempotency_key);

Step 3: INSERT-then-charge pattern (not check-then-charge)
async function chargeCustomer(userId: string, amount: number, productId: string, requestId: string) {
  const idempotencyKey = `${userId}:${productId}:${requestId}`;
  
  // Step 1: Try to insert a 'pending' charge record. 
  // The unique constraint prevents two concurrent inserts.
  let chargeRow;
  try {
    const result = await db.query(
      `INSERT INTO charges (user_id, product_id, idempotency_key, status, amount, created_at)
       VALUES ($1, $2, $3, 'pending', $4, NOW())
       RETURNING id`,
      [userId, productId, idempotencyKey, amount]
    );
    chargeRow = result.rows[0];
  } catch (e: any) {
    // Unique constraint violation = duplicate request, race won by other
    if (e.code === '23505') {
      const existing = await db.query(
        'SELECT id, stripe_id, status FROM charges WHERE idempotency_key = $1',
        [idempotencyKey]
      );
      // If already completed, return success. If pending, wait or return 'in_progress'.
      const row = existing.rows[0];
      if (row.status === 'completed') {
        return { status: 'duplicate', existingChargeId: row.id, stripeId: row.stripe_id };
      }
      // If pending, the other request is still processing; poll or return in_progress
      return { status: 'in_progress', message: 'Charge already being processed' };
    }
    throw e;
  }
  
  // Step 2: Now we own this idempotency key. Process the Stripe charge.
  const stripeResult = await stripe.paymentIntents.create(
    { amount, customer: userId },
    { idempotencyKey }  // Use SAME key — Stripe dedupes server-side now
  );
  
  // Step 3: Update the row to completed
  await db.query(
    'UPDATE charges SET stripe_id = $1, status = $2 WHERE id = $3',
    [stripeResult.id, 'completed', chargeRow.id]
  );
  
  return { status: 'charged', chargeId: chargeRow.id };
}

Why this works:

  • The DB unique constraint is atomic. Only one INSERT succeeds; the other gets unique-violation error.
  • The losing pod doesn't proceed to Stripe.
  • Even if both pods sent to Stripe, the same idempotency key means Stripe creates only one charge.
  • Defense in depth: 2 layers (DB unique + Stripe idempotency) catch any edge case.
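The claim semantics the constraint provides can be sketched with an in-memory stand-in (a Set playing the role of the charges table; in production the atomicity comes from Postgres, not from application code):

```typescript
// Simulates the INSERT-then-process pattern: exactly one caller per key
// "wins" the claim, the rest take the duplicate path (like error 23505).

const claimed = new Set<string>();

function tryClaim(key: string): boolean {
  if (claimed.has(key)) return false; // mirrors the unique-violation path
  claimed.add(key);
  return true;
}

async function handleRequest(key: string): Promise<"charged" | "duplicate"> {
  if (!tryClaim(key)) return "duplicate";
  // ... only the winner would proceed to call Stripe here ...
  return "charged";
}

async function demo(): Promise<string[]> {
  claimed.clear(); // reset so the demo is repeatable
  // Ten concurrent requests racing on the same idempotency key.
  return Promise.all(
    Array.from({ length: 10 }, () => handleRequest("user_456:prod_789:req_1"))
  );
}

demo().then((results) => {
  const winners = results.filter((r) => r === "charged").length;
  console.log(`winners: ${winners}`); // exactly one request wins the claim
});
```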

What NOT to Do

1. Don't use SELECT FOR UPDATE. It can't lock rows that don't exist yet, so it wouldn't even close this race, and it blocks pessimistically, harming throughput. The unique-constraint pattern is non-blocking + correct.

2. Don't use a Redis lock. Adds complexity + a new failure mode (Redis down). DB constraint is simpler and uses existing infra.

3. Don't extend the dedup window from 5min to 1hr. That just makes the race window larger; doesn't fix the root cause.

4. Don't add a retry on unique-violation. The constraint did its job; treat as 'someone else is handling.'

5. Don't generate idempotency key server-side without frontend coordination. You need frontend to provide the same key on retries; otherwise, retries get a new key + create new charges.

6. Don't trust frontend-provided keys without server-side validation. Validate format + reject if missing.
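For point 6, a minimal server-side validation sketch (the regex, helper name, and error message are illustrative):

```typescript
// Reject anything that isn't a v4-style UUID so clients can't supply
// colliding or attacker-controlled idempotency keys.
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function validateRequestId(requestId: unknown): string {
  if (typeof requestId !== "string" || !UUID_V4.test(requestId)) {
    throw new Error("requestId must be a UUID generated once per user action");
  }
  return requestId;
}

console.log(validateRequestId("3b241101-e2bb-4255-8caf-4136c566a962")); // accepted
```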

Prevention Patterns

1. Database unique constraints for any 'must-be-unique' invariant. This is the universal pattern for preventing duplicates in concurrent contexts.

2. Deterministic idempotency keys for retries. UUIDs from frontend, NOT timestamps.

3. INSERT-then-process, not check-then-process. Atomic INSERT is the lock.

4. Same idempotency key passed to external APIs that support it. Stripe, payment processors, email senders, third-party services. Defense in depth.

5. Frontend disable-on-submit. Submit button disables on click; requires explicit re-enable. Reduces but doesn't eliminate the race.
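Pattern 5 can go beyond disabling the button: a single-flight wrapper makes rapid double-clicks share one in-flight request instead of firing two. A sketch (the `singleFlight` helper is hypothetical):

```typescript
// While one submission is pending, further calls return the same promise.
function singleFlight<T>(fn: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight;
    const started = fn().finally(() => {
      inFlight = null; // allow a fresh submit after this one settles
    });
    inFlight = started;
    return started;
  };
}

let calls = 0;
const submit = singleFlight(async () => {
  calls += 1;
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated request
  return "charged";
});

// A rapid double-click: both clicks share the same underlying request.
Promise.all([submit(), submit()]).then(() => {
  console.log(`requests sent: ${calls}`); // 1, not 2
});
```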

Test That Would Have Caught This

// /tests/integration/concurrent-charge.test.ts
it('does not create duplicate charges when 10 concurrent requests arrive', async () => {
  const userId = 'test_user';
  const productId = 'test_product';
  const requestId = 'test_request_uuid';
  
  // Fire 10 concurrent requests with same idempotency key
  const results = await Promise.all(
    Array.from({ length: 10 }, () => chargeCustomer(userId, 4999, productId, requestId))
  );
  
  // Exactly ONE should succeed; others should report duplicate or in-progress
  const succeeded = results.filter(r => r.status === 'charged');
  expect(succeeded.length).toBe(1);
  
  // Verify ONE charge in DB
  const charges = await db.query(
    'SELECT * FROM charges WHERE user_id = $1 AND product_id = $2',
    [userId, productId]
  );
  expect(charges.rows.length).toBe(1);
  
  // Verify ONE Stripe charge (mock should track calls)
  expect(stripeMock.paymentIntents.create).toHaveBeenCalledTimes(1);
});

Critical: the test uses Promise.all to execute concurrently. A sequential for loop wouldn't catch the race.

For distributed-system testing (across pods), use chaos-test patterns: replicate database, run multiple service instances, fire concurrent requests, assert invariants.

Verification After Fix

Pre-deploy:

1. Run the integration test above. Should pass.

2. Test in staging: open 2 browsers as same user, click 'Pay' simultaneously. Should produce 1 charge.

3. Test with rapid double-click on same button. Should produce 1 charge.

Post-deploy:

1. Watch the duplicate-charge metric in production. Target: 0 duplicates over 30 days (current: 3-5/month).

2. Watch for unique-violation errors in logs. They're EXPECTED (= the constraint working). Should track ~5-10/day at your scale.

3. Watch Stripe dashboard for any duplicate paymentIntents with same idempotency_key. Should be 0.

4. Customer support tickets about double-charging. Target: 0 over 90 days.

If This Pattern Appears Elsewhere

Audit your codebase for the check-then-act pattern. Search for:

  • SELECT ... WHERE ... created_at > NOW() - INTERVAL followed by an INSERT
  • findOne(...) followed by create(...) in your ORM
  • Any 'is X already done' check followed by 'do X'
  • Key generation using Date.now() or Math.random() for what should be deterministic

Likely concurrency bugs in your codebase:

1. Subscription creation — same race as payments. Double-click → 2 subscriptions.

2. Email send dedup — if you check 'email sent in last 1hr' then send, same race exists.

3. User signup — race between signup and email-existence-check could produce 2 users with same email.

4. Comment posting — if frontend allows submit-then-retry, you might create duplicate comments.

5. Webhook handlers — if your webhook handler calls 'check if processed' then 'process,' same race.

For each: replace check-then-act with insert-then-process using a unique constraint.
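The audit itself can be roughly scripted. This sketch flags a read-style check followed by a write in the same source string; the regexes are illustrative and will produce false positives, so treat hits as review candidates, not confirmed bugs:

```typescript
// Heuristic check-then-act detector: a findOne/SELECT ... WHERE
// followed later by a create()/INSERT INTO in the same snippet.
const CHECK = /\b(findOne|SELECT\b[\s\S]{0,200}?\bWHERE)/;
const ACT = /\b(create\(|INSERT\s+INTO)/;

function flagCheckThenAct(source: string): boolean {
  const checkMatch = CHECK.exec(source);
  if (!checkMatch) return false;
  // Only flag when the write appears AFTER the check.
  return ACT.test(source.slice(checkMatch.index + checkMatch[0].length));
}

// Example: the original buggy handler would be flagged.
const snippet = `
  const recent = await db.query('SELECT id FROM charges WHERE user_id = $1');
  if (recent.rows.length === 0) {
    await db.query('INSERT INTO charges (user_id) VALUES ($1)');
  }
`;
console.log(flagCheckThenAct(snippet)); // flags the check-then-act pair
```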

Key Takeaways

  • Your bug is TOCTOU race in distributed environment. Both pods pass the dedup check before either insert exists.
  • The fix is at the DB layer (unique constraint) not application layer (locks). Atomic, non-blocking, simple.
  • Use deterministic idempotency keys from the frontend. Date.now() regenerates per call, defeating Stripe's dedup.
  • Defense in depth: DB unique + Stripe idempotency. Both layers catch any edge case.
  • Audit other check-then-act patterns. Subscriptions, emails, signups, webhook handlers likely have the same anti-pattern.

Common use cases

  • Engineer hitting an intermittent bug in production that's hard to reproduce locally
  • Tech lead investigating 'mystery duplicates' or 'lost data' incidents
  • Backend engineer designing a feature that has obvious concurrency complexity (counters, balances, queues)
  • DBA investigating deadlocks in production logs
  • Engineer reviewing a PR where they suspect concurrency bugs but can't articulate the issue
  • Team running into 'works in dev, breaks in prod' patterns where prod has higher concurrency

Best AI model for this

Claude Opus 4. Concurrency debugging requires reasoning across timing, ordering, and distributed state — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Pro tips

  • Always paste the actual code + logs. Concurrency bugs need precision; abstract descriptions miss the bug.
  • If you can't reproduce, try increasing parallelism. 100 concurrent workers > 1 sequential worker for surfacing race conditions.
  • Deadlocks in Postgres show in logs. Look for 'deadlock detected' or 'process X waits for Y'. Don't guess.
  • Optimistic concurrency (version columns) usually beats pessimistic locking (SELECT FOR UPDATE) for read-heavy workloads.
  • Idempotency keys are the universal solution to 'might process twice.' Easier than perfect dedup logic.
  • Async/await in JS doesn't make code thread-safe. Two async functions can interleave at any await point.
  • Tests for concurrency: use test.concurrent in Jest or Hypothesis's stateful testing in Python, not 'just run it twice.'
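The async/await interleaving point is easy to see in plain single-threaded Node. A minimal sketch of a lost update at an await point:

```typescript
// Two withdrawals both read the balance, yield at the await, then each
// writes back a stale value. One withdrawal silently disappears,
// with no threads and no locks involved.

let balance = 100;

async function withdraw(amount: number): Promise<void> {
  const current = balance; // read
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated I/O
  balance = current - amount; // write back a stale value
}

async function demo(): Promise<number> {
  balance = 100; // reset so the demo is repeatable
  await Promise.all([withdraw(30), withdraw(30)]);
  return balance;
}

demo().then((final) => {
  console.log(`balance: ${final}`); // 70, not 40: one withdrawal was lost
});
```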

Customization tips

  • Always paste the actual code that handles the concurrent state. Concurrency bugs are precise; abstract descriptions miss the bug.
  • Include logs, especially around the time of the bug. Patterns in logs (two requests within Xms) confirm the race window.
  • Specify your system context. Single-instance bugs differ from multi-instance differ from distributed.
  • Note your DB isolation level if known. Default Read Committed exposes different bug classes than Serializable.
  • Be specific about reproducibility. 'Sometimes' vs 'every Friday' vs 'at peak load' shapes the diagnosis.
  • Use the Database Race Mode variant if logs show 'deadlock detected' or 'process X waits for Y' — different diagnostic patterns apply.

Variants

Database Race Mode

For DB-level race conditions — emphasizes isolation levels, locks, and the specific Postgres/MySQL patterns.

Async/Await Mode

For JS/Python async code — emphasizes interleaving at await points and event-loop reasoning.

Distributed Systems Mode

For multi-node systems — emphasizes message ordering, eventual consistency, and consensus patterns.

Queue Worker Mode

For background-job systems — emphasizes message-delivery guarantees, idempotency, and partitioning.

Frequently asked questions

How do I use the Concurrency Bug Debugger prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Concurrency Bug Debugger?

Claude Opus 4. Concurrency debugging requires reasoning across timing, ordering, and distributed state — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Can I customize the Concurrency Bug Debugger prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: paste the actual code and logs (concurrency bugs need precision; abstract descriptions miss the bug), and increase parallelism if you can't reproduce (100 concurrent workers surface races that a single sequential run never will).

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals