
⚡ Promptolis Original · Coding & Development

📡 Webhook Handler Architect

Designs your webhook receiver: signature verification, idempotency, ordering, retries from senders, and the queue-it-then-process pattern that doesn't drop events at 3am.

⏱️ 5 min to set up 🤖 ~110 seconds in Claude 🗓️ Updated 2026-04-28

Why this is epic

Most webhook handlers process synchronously, ack slowly, and drop events when the sender retries. Production failures: missed payment events, missed user signups, duplicate processing. This Original designs the durable receiver pattern.

Outputs the complete architecture: signature verification (Stripe / Twilio / GitHub style), 200-OK-fast pattern, idempotency keys, queue-then-process, retry handling, ordering guarantees (or explicit acknowledgment that ordering isn't guaranteed), DLQ.

Covers the 7 failure modes most webhook handlers hit: signature bypass attempts, replay attacks, slow processing causing sender retry storm, out-of-order events, duplicate processing, missing events, sender authentication issues.

Calibrated to 2026 webhook reality: Stripe's high-volume webhooks, GitHub's batch deliveries, Slack's slash command patterns, AI service callbacks. Picks the right idempotency + ordering strategy per sender.

The prompt

Promptolis Original · Copy-ready
<role> You are a webhook architecture engineer with 7+ years building durable webhook receivers for Stripe, GitHub, Slack, Twilio, AI services, and custom internal events. You have shipped webhook handlers processing 100M+ events/year. You know which failure modes drop events + the patterns that don't. You are direct. You will tell a builder their synchronous handler will time out under retry pressure, that signature verification skipping is the security bug, or that they're trying to enforce ordering that the sender doesn't guarantee. You refuse to recommend 'just retry' as a generic answer. </role> <principles> 1. Signature verification first. Reject before any other work. 2. 200 OK in <1s. Queue then process. 3. Idempotency by event ID. 4. Replay attack defense: timestamp validation. 5. Order isn't guaranteed. Design for any-order arrival. 6. DLQ permanent failures + alert. 7. Worker retries with exponential backoff. Independent of sender retry. </principles> <input> <senders>{Stripe / GitHub / Slack / Twilio / custom / multiple}</senders> <event-types>{the events you receive — payment.succeeded, push, message.posted, etc.}</event-types> <volume>{events/day, peak/day, peak burst pattern}</volume> <processing-needs>{what each event triggers in your system}</processing-needs> <idempotency-needs>{which events MUST not be processed twice — payment yes, ping probably no}</idempotency-needs> <ordering-needs>{any events that MUST be processed in order}</ordering-needs> <latency-tolerance>{how fast events must be processed: instant / minutes / hours OK}</latency-tolerance> <infrastructure>{your stack: framework, queue infra, monitoring}</infrastructure> <existing-state>{nothing / synchronous handler that works mostly / mature but messy}</existing-state> </input> <output-format> # Webhook Architecture: [senders] ## Receiver Design 200-OK-fast pattern. Endpoint structure. Per-sender receiver routes. 
## Signature Verification Per-sender verification: secret, algorithm, header location, code pattern. ## Idempotency Key strategy. TTL. Storage (Redis vs DB). ## Replay Attack Defense Timestamp validation. Tolerance window. ## Queue + Worker Pipeline Queue choice. Worker pattern. Per-event-type handlers. ## Ordering Strategy What IS guaranteed (or not). How to handle out-of-order events. ## Retry & DLQ Worker retry policy. DLQ on max retries. Alert thresholds. ## Per-Sender Configuration The specific quirks per sender. ## Observability Metrics, logs, alerts specific to webhooks. ## Implementation Skeleton File structure + key code patterns. ## Testing Replay specific events. Mock signatures. Out-of-order scenarios. ## What This Architecture Won't Solve Honest limits. ## Maintenance Cadence Rotate secrets, audit, etc. ## Key Takeaways 4-6 bullets — for the team's playbook. </output-format> <auto-intake> If input incomplete: ask for senders, event types, volume, processing needs, idempotency, ordering, latency, infrastructure, existing state. </auto-intake> Now, design the webhook architecture:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<senders>Stripe (primary). GitHub (for our CI integration). Slack (for our app). Plan to add Twilio in 6 months.</senders>
<event-types>Stripe: payment_intent.succeeded, customer.subscription.updated, invoice.payment_failed, customer.subscription.deleted, charge.dispute.created, ~12 total. GitHub: push, pull_request, workflow_run. Slack: events_api (message.channels, app_mention).</event-types>
<volume>Stripe: ~5000 events/day, peak ~20K on first-of-month (renewals). GitHub: ~500/day. Slack: ~1000/day. Total: ~6500/day, peak ~22K/day.</volume>
<processing-needs>Stripe: update DB customer status, send emails, log to analytics. GitHub: trigger CI workflows. Slack: respond to mentions, log activity.</processing-needs>
<idempotency-needs>Stripe: yes, MUST not double-process payments. GitHub push: yes (don't double-build). Slack: somewhat OK to dedupe.</idempotency-needs>
<ordering-needs>Stripe subscription events: ideally in order (subscription.created before subscription.updated). But Stripe doesn't guarantee, so we should handle out-of-order.</ordering-needs>
<latency-tolerance>Stripe events: customer-facing, ideal <30s. GitHub: <2 min. Slack: <5s for mentions (user waiting), other events lenient.</latency-tolerance>
<infrastructure>Node.js 20 + Fastify + Postgres + Redis + BullMQ. Vercel (with serverless functions) — but considering moving webhooks to dedicated long-running infrastructure.</infrastructure>
<existing-state>Synchronous handler currently. Stripe occasionally retries because we sometimes take >5s. We've had 2 incidents where we processed an event twice (bad bug for payments).</existing-state>
🤖 Output

Webhook Architecture: Stripe + GitHub + Slack Multi-Sender Receiver

Receiver Design

Move webhooks OFF Vercel functions to dedicated infrastructure. Vercel functions have a 10s timeout (Hobby) / 60s (Pro), but they're serverless — cold starts hit you randomly. Webhook receivers benefit from always-warm dedicated handlers.

Recommendation: dedicate a long-running Node service (e.g., Fly.io, Railway, AWS ECS) for webhooks. Single small instance ($10-20/month) handles your 22K/day peak comfortably. Or: move to Cloudflare Workers (no cold start, fast).

Endpoint structure:

POST /webhooks/stripe       (verifies + queues Stripe events)
POST /webhooks/github       (verifies + queues GitHub events)
POST /webhooks/slack        (verifies + queues + responds Slack)
POST /webhooks/twilio       (future)

200-OK-fast pattern:

app.post('/webhooks/stripe', async (req, reply) => {
  // 1. Verify signature (synchronous, fast).
  // Note: req.rawBody requires the fastify-raw-body plugin — Fastify parses
  // JSON by default and discards the raw bytes the signature is computed over.
  const isValid = verifyStripeSignature(req.headers, req.rawBody);
  if (!isValid) return reply.code(401).send('Invalid signature');
  
  // 2. Parse event
  const event = JSON.parse(req.rawBody);
  
  // 3. Idempotency check (Redis, fast)
  const alreadyProcessed = await redis.get(`webhook:stripe:${event.id}`);
  if (alreadyProcessed) return reply.code(200).send('duplicate'); // Already seen
  
  // 4. Mark as received (Redis, fast)
  await redis.setEx(`webhook:stripe:${event.id}`, 86400 * 7, 'received'); // 7-day TTL
  
  // 5. Enqueue for processing
  await stripeQueue.add('process-stripe-event', { event }, {
    jobId: `stripe-${event.id}`, // BullMQ dedupes jobs with the same jobId
    attempts: 5,
    backoff: { type: 'exponential', delay: 5000 },
    removeOnComplete: { age: 86400 }, // keep completed jobs 1 day
    removeOnFail: false, // keep failures forever for DLQ inspection
  });
  
  // 6. Respond 200 quickly
  return reply.code(200).send('ok');
});

Receiver completes in <100ms typically. Worker processes async.

Slack exception: for app_mention (user-waiting), respond synchronously with quick ACK or use Slack's 'I'll get back to you' pattern via their response_url.

Signature Verification

Stripe
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const STRIPE_WEBHOOK_SECRET = process.env.STRIPE_WEBHOOK_SECRET!; // whsec_...

function verifyStripeSignature(headers: any, rawBody: string): boolean {
  try {
    stripe.webhooks.constructEvent(rawBody, headers['stripe-signature'], STRIPE_WEBHOOK_SECRET);
    return true;
  } catch (e) {
    return false;
  }
}

Critical: verify against rawBody (string before parse), not the parsed JSON. Stripe's signature is over the literal request bytes.

GitHub
import crypto from 'crypto';

function verifyGitHubSignature(headers: any, rawBody: string): boolean {
  const signature = headers['x-hub-signature-256'];
  if (!signature) return false;
  
  const expected = 'sha256=' + crypto
    .createHmac('sha256', process.env.GITHUB_WEBHOOK_SECRET!)
    .update(rawBody)
    .digest('hex');
  
  const sigBuf = Buffer.from(signature);
  const expBuf = Buffer.from(expected);
  // timingSafeEqual throws if the buffers differ in length — guard first
  if (sigBuf.length !== expBuf.length) return false;
  return crypto.timingSafeEqual(sigBuf, expBuf);
}

Use timingSafeEqual (constant-time comparison). Never === for signatures (timing-attack vulnerable).
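The same comparison pitfall applies to every sender, so it is worth factoring into one helper. A small sketch (the function name is ours, not from any library): `crypto.timingSafeEqual` throws when the buffers differ in length, so a length guard returning false is needed before calling it.

```typescript
import crypto from 'crypto';

// Constant-time string comparison that tolerates length mismatches.
// crypto.timingSafeEqual throws if the two buffers differ in length,
// so we check length first and return false instead of throwing.
export function safeCompare(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  if (bufA.length !== bufB.length) return false;
  return crypto.timingSafeEqual(bufA, bufB);
}
```

Returning false on a length mismatch leaks only the length, which the attacker already knows (the expected digest length is fixed per algorithm).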

Slack
function verifySlackSignature(headers: any, rawBody: string): boolean {
  const timestamp = headers['x-slack-request-timestamp'];
  const signature = headers['x-slack-signature'];
  if (!timestamp || !signature) return false;
  
  // Replay defense: reject if timestamp >5min old
  const now = Math.floor(Date.now() / 1000);
  if (Math.abs(now - parseInt(timestamp, 10)) > 300) return false;
  
  const baseString = `v0:${timestamp}:${rawBody}`;
  const expected = 'v0=' + crypto
    .createHmac('sha256', process.env.SLACK_SIGNING_SECRET!)
    .update(baseString)
    .digest('hex');
  
  const sigBuf = Buffer.from(signature);
  const expBuf = Buffer.from(expected);
  if (sigBuf.length !== expBuf.length) return false; // timingSafeEqual throws on length mismatch
  return crypto.timingSafeEqual(sigBuf, expBuf);
}

Idempotency

Strategy: event ID as idempotency key. Stored in Redis with 7-day TTL.

// Check at receiver
const alreadyProcessed = await redis.get(`webhook:${sender}:${eventId}`);
if (alreadyProcessed) {
  // Sender retried. Already processed. Return 200 immediately.
  return reply.code(200).send('duplicate');
}

// Mark as received BEFORE enqueueing
await redis.setEx(`webhook:${sender}:${eventId}`, 86400 * 7, 'received');

// Then enqueue
await queue.add('process', { event }, { jobId: `${sender}-${eventId}` });

Why 7 days: most webhook senders retry for 7 days max. After that, your idempotency cache can drop the entry.

Defense in depth:

  • Receiver: Redis check
  • Worker: BullMQ jobId (also dedupes within queue)
  • Database operations: idempotent SQL (UPSERTs with unique constraints)

3 layers because a single failure shouldn't process twice. Redis can fail; jobId might not work; DB constraints catch the last edge case.
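One sharpening note on the Redis layer: the get-then-set pair above has a small race window when a sender retry lands while the first delivery is still between the two calls. Redis `SET` with `NX` and `EX` claims the key atomically in a single round trip. A sketch parameterized over a minimal client interface (the interface and function names are ours; node-redis v4's `redis.set(key, value, { NX: true, EX: ttl })` matches this shape):

```typescript
// Minimal slice of a Redis-like client: SET key value NX EX ttl
// returns 'OK' when the key was newly set, null when it already existed.
interface IdempotencyStore {
  set(key: string, value: string, opts: { NX: true; EX: number }): Promise<'OK' | null>;
}

// Returns true if this delivery claimed the event (first arrival),
// false if another delivery already claimed it (duplicate).
export async function claimEvent(
  store: IdempotencyStore,
  sender: string,
  eventId: string,
  ttlSeconds = 86400 * 7, // match the sender's max retry window
): Promise<boolean> {
  const result = await store.set(
    `webhook:${sender}:${eventId}`,
    'received',
    { NX: true, EX: ttlSeconds },
  );
  return result === 'OK';
}
```

Duplicates then short-circuit to a 200 without ever touching the queue, and there is no window in which two deliveries both see "not yet processed."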

Replay Attack Defense

For senders that include timestamps (Slack, custom):

  • Reject events older than 5 minutes
  • Prevents an attacker from capturing a valid request + replaying later

Stripe: Stripe's signed payload includes timestamp; signature check via constructEvent validates it within 5-min window automatically.

GitHub: doesn't include a timestamp in the signed payload. Mitigations:

  • Dedupe on the X-GitHub-Delivery header (a per-delivery GUID) through the idempotency layer
  • Restrict the endpoint to GitHub's published hook IP ranges (from api.github.com/meta), as defense in depth
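For the senders that do timestamp their deliveries, the 5-minute window is worth keeping in one place rather than re-deriving it per verifier. A tiny helper (names and defaults ours; 300s matches Slack's documented window and Stripe's default tolerance):

```typescript
// Returns true if a sender-supplied Unix timestamp (seconds) is within
// toleranceSeconds of our clock, in either direction (clocks skew both ways).
export function isFreshTimestamp(
  senderTimestamp: number,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300, // 5 minutes
): boolean {
  if (!Number.isFinite(senderTimestamp)) return false; // reject missing/garbled headers
  return Math.abs(nowSeconds - senderTimestamp) <= toleranceSeconds;
}
```

Rejecting non-finite values also covers the case where the header is absent and `parseInt` yields `NaN`.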

Queue + Worker Pipeline

Per-sender queues (BullMQ):

  • stripe-events (priority on payment events)
  • github-events
  • slack-events (high priority for user-waiting)

Per-event-type handlers:

// /workers/stripe-worker.ts
// BullMQ takes the processor in the Worker constructor
// (legacy Bull's queue.process() does not exist in BullMQ).
const stripeWorker = new Worker('stripe-events', async (job) => {
  const { event } = job.data;
  
  switch (event.type) {
    case 'payment_intent.succeeded':
      return handlePaymentSucceeded(event);
    case 'customer.subscription.updated':
      return handleSubscriptionUpdated(event);
    case 'customer.subscription.deleted':
      return handleSubscriptionDeleted(event);
    case 'invoice.payment_failed':
      return handlePaymentFailed(event);
    case 'charge.dispute.created':
      return handleDispute(event);
    default:
      logger.warn({ eventType: event.type }, 'Unknown Stripe event type — ignored');
      return { status: 'ignored' };
  }
});

async function handlePaymentSucceeded(event: Stripe.PaymentIntentSucceededEvent) {
  const intent = event.data.object;
  await db.transaction(async (tx) => {
    await tx.payments.upsert({ stripeId: intent.id, ...mapPaymentData(intent) });
    await tx.users.markPaid(intent.customer);
  });
  await emailService.sendReceipt(intent.customer);
}

Critical: all handlers idempotent at DB level (UPSERTs, unique constraints). Even if processed twice somehow, DB state is correct.

Ordering Strategy

Stripe explicitly does not guarantee order. Design for out-of-order arrival:

async function handleSubscriptionUpdated(event: Stripe.CustomerSubscriptionUpdatedEvent) {
  const sub = event.data.object;
  
  // Critical: use Stripe's status + items as source of truth, NOT what we infer
  await db.subscriptions.upsert({
    stripeId: sub.id,
    status: sub.status,
    currentPeriodEnd: new Date(sub.current_period_end * 1000),
    // ... use ALL fields from event, don't merge with previous state
  });
}

Don't try to enforce 'subscription.created must process before subscription.updated.' Instead:

  • Both handlers UPSERT (idempotent)
  • Each event contains full subscription state
  • Latest state wins (use event timestamp + DB constraint)

If subscription.updated arrives before subscription.created, the UPSERT creates the row with the updated state — correct.

Edge case: out-of-order with stale data. Use event timestamp:

UPDATE subscriptions 
SET status = $1, last_event_timestamp = $2
WHERE stripe_id = $3 AND last_event_timestamp < $2;  -- only update if newer

Prevents older event from overwriting newer state.
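The same guard can be demonstrated in application code. A minimal in-memory sketch (types and names ours, standing in for the Postgres table) showing that a stale event never overwrites newer state:

```typescript
interface SubscriptionRow {
  stripeId: string;
  status: string;
  lastEventTimestamp: number; // Unix seconds, from event.created
}

// UPSERT with a last-write-wins guard: only apply the event if it is
// newer than what is already stored for this subscription.
export function applySubscriptionEvent(
  table: Map<string, SubscriptionRow>,
  event: { stripeId: string; status: string; created: number },
): boolean {
  const existing = table.get(event.stripeId);
  if (existing && existing.lastEventTimestamp >= event.created) {
    return false; // stale event — keep the newer state
  }
  table.set(event.stripeId, {
    stripeId: event.stripeId,
    status: event.status,
    lastEventTimestamp: event.created,
  });
  return true; // applied (insert, or newer update)
}
```

If `subscription.updated` (created=200) arrives first and a delayed `subscription.created` (created=100) arrives second, the second call is a no-op and the row keeps the newer state.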

Retry & DLQ

Worker retry policy (BullMQ):

  • Max 5 attempts
  • Exponential backoff from a 5s base: roughly 5s, 10s, 20s, 40s between retries (BullMQ doubles the delay each attempt)
  • After all 5 attempts fail: job stays in the failed state

DLQ:

worker.on('failed', async (job, err) => {
  if (job.attemptsMade >= job.opts.attempts) {
    // Moved to permanent-failure state
    await db.failed_webhooks.create({
      sender: 'stripe',
      eventId: job.data.event.id,
      eventType: job.data.event.type,
      lastError: serializeError(err),
      payload: job.data.event,
      failedAt: new Date(),
    });
    
    // Alert if event type is critical
    if (CRITICAL_EVENT_TYPES.includes(job.data.event.type)) {
      await pagerduty.alert(`Webhook handler failed permanently: ${job.data.event.id}`);
    }
  }
});

const CRITICAL_EVENT_TYPES = [
  'payment_intent.succeeded',
  'customer.subscription.deleted',
  'charge.dispute.created',
];

DLQ inspection tool:

  • Admin endpoint to list failed webhooks by date / sender / event type
  • Re-enqueue button (after fixing root cause)
  • Bulk re-enqueue (after fixing root cause for many)
  • Delete (after confirming not actionable)
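The re-enqueue path can stay simple. A sketch parameterized over a minimal queue interface (interfaces and names ours; the rows come from the failed_webhooks table above). One subtlety: BullMQ dedupes on jobId, so replaying under the original id can be silently skipped while the failed job record still exists, hence the fresh suffix.

```typescript
interface FailedWebhookRow {
  sender: string;
  eventId: string;
  eventType: string;
  payload: unknown;
}

// Minimal slice of a BullMQ-style queue: add(name, data, opts)
interface ReplayQueue {
  add(name: string, data: unknown, opts: { jobId: string; attempts: number }): Promise<void>;
}

// Re-enqueue one DLQ row under a fresh jobId so BullMQ's jobId
// deduplication doesn't swallow the replay. Returns the new jobId.
export async function replayFailedWebhook(
  row: FailedWebhookRow,
  queue: ReplayQueue,
): Promise<string> {
  const jobId = `${row.sender}-${row.eventId}-replay-${Date.now()}`;
  await queue.add('process', { event: row.payload }, { jobId, attempts: 5 });
  return jobId;
}
```

Bulk re-enqueue is then a loop over the selected rows, after the root cause is fixed.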

Per-Sender Configuration

Stripe
  • 7-day retry window (their max)
  • 5-min timestamp tolerance
  • Webhook secret rotation: schedule annually + during incidents
GitHub
  • 8-hour retry window
  • IP allowlist available (defense in depth): https://api.github.com/meta returns IPs
  • Secret token rotation: when secret leaks or annually
Slack
  • Synchronous-friendly: response_url for delayed responses
  • 3-second response timeout from Slack's side (must respond fast)
  • Signing secret + verification token both available; signing secret preferred
Twilio (future)
  • Different signature scheme (HMAC-SHA1 with URL + params)
  • Auto-disables webhook URL after repeated failures (be careful)
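When Twilio lands, its scheme differs enough to trip people up: HMAC-SHA1 over the full webhook URL with the sorted POST parameters appended, base64-encoded, compared against the X-Twilio-Signature header. A hedged sketch of the computation (verify against Twilio's current security docs before shipping; the function name is ours):

```typescript
import crypto from 'crypto';

// Twilio signs: url + each POST param's key and value concatenated,
// keys sorted alphabetically, HMAC-SHA1 with your auth token, base64.
export function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac('sha1', authToken).update(data).digest('base64');
}
```

Compare the result against the X-Twilio-Signature header with a constant-time check, never `===`.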

Observability

Metrics per sender:

  • webhook.received (counter, by sender + event type)
  • webhook.received.duration (histogram, ms)
  • webhook.signature.invalid (counter — alert on spike)
  • webhook.duplicate (counter — informational)
  • webhook.queue.depth (per-sender queue size)
  • webhook.worker.duration (per-event-type processing time)
  • webhook.worker.failed (counter)
  • webhook.dlq.size (gauge)

Alerts:

  • Signature failures >10 in 1 min → page (likely attack or secret rotation issue)
  • Queue depth >1000 → Slack-only alert (likely worker capacity issue)
  • DLQ size >0 for >1 hour → Slack alert
  • DLQ size >5 critical events → page
  • 200-OK p95 latency >2s → page (sender will start retrying)

Logs structured per request:

{
  "sender": "stripe",
  "event_id": "evt_abc123",
  "event_type": "payment_intent.succeeded",
  "duration_ms": 45,
  "status": "queued",
  "correlation_id": "req_xyz789"
}

Implementation Skeleton

/services/webhook-receiver/
  package.json (Node, Fastify, BullMQ, Stripe, GitHub SDK)
  src/
    server.ts                (Fastify setup)
    routes/
      stripe.ts             (POST /webhooks/stripe)
      github.ts             (POST /webhooks/github)
      slack.ts              (POST /webhooks/slack)
    lib/
      verify-signature.ts   (per-sender verification)
      idempotency.ts        (Redis-backed)
      queues.ts             (BullMQ queue setup)
  Dockerfile
  fly.toml (or other deployment config)

/services/webhook-worker/
  src/
    workers/
      stripe-worker.ts
      github-worker.ts
      slack-worker.ts
    handlers/
      stripe/
        payment-succeeded.ts
        subscription-updated.ts
        ... (one per event type)
      github/
        push.ts
        pull-request.ts
      slack/
        message.ts
        app-mention.ts
    dlq/
      replay.ts             (admin tool)

Why two services: receiver is hot-path latency-sensitive; worker is throughput-sensitive. Separate scaling, separate deployment.

Testing

Unit tests
  • Signature verification per sender (positive + negative cases)
  • Idempotency cache hit/miss
  • Each event-type handler with sample event data
Integration tests
  • Full receive-queue-process flow with test BullMQ + Redis
  • Sample real payloads (anonymized) per event type
Stripe-specific
  • Stripe CLI stripe listen --forward-to localhost:3000/webhooks/stripe for local testing
  • stripe trigger payment_intent.succeeded for synthetic events
Replay testing
  • Replay 30 days of historical events through staging worker
  • Verify DB end state is correct
  • Verify no double-processing
Load testing
  • Burst 1000 events/sec to staging receiver
  • Verify <2s p95 200-OK
  • Verify worker keeps up (queue depth doesn't grow unboundedly)
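For unit tests without the Stripe CLI, valid test signatures can be forged locally, since Stripe's scheme is HMAC-SHA256 over `${timestamp}.${rawBody}` with the webhook secret, delivered as `t=<timestamp>,v1=<hex>`. A sketch of that documented scheme (helper names ours; stripe-node also ships `stripe.webhooks.generateTestHeaderString` for the same purpose):

```typescript
import crypto from 'crypto';

// Build a Stripe-style signature header for a test payload.
export function mockStripeSignatureHeader(rawBody: string, secret: string, timestamp: number): string {
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

// Matching verifier for tests (mirrors what stripe.webhooks.constructEvent
// checks; === is fine in tests, use a constant-time compare in production).
export function verifyMockStripeSignature(rawBody: string, header: string, secret: string): boolean {
  const parts = Object.fromEntries(header.split(',').map((kv) => kv.split('=') as [string, string]));
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${parts.t}.${rawBody}`)
    .digest('hex');
  return expected === parts.v1;
}
```

This lets signature-rejection paths (wrong secret, tampered body) be exercised in plain unit tests with no network.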

What This Architecture Won't Solve

  • Won't catch sender-side bugs. If Stripe sends a malformed event, your handler may crash on it. Defensive parsing required.
  • Won't handle senders that exhaust their retry window. If Stripe gives up after 7 days, the event is gone. Critical events should have an audit reconciliation job.
  • Won't replace audit log. Webhook events are notifications; truth lives at the sender. For Stripe, your DB should occasionally reconcile against Stripe's API.
  • Won't prevent processing of malicious events. Signature verification + replay defense help, but a leaked secret bypasses both. Rotate secrets.
  • Won't handle eventual consistency lag from senders. Stripe's API state may not reflect a webhook event for a few seconds.

Maintenance Cadence

Quarterly:

  • Review DLQ. Patterns? Root cause?
  • Rotate webhook secrets (especially after team member departures)
  • Audit signature verification code (no shortcuts crept in)

Annually:

  • Reconcile Stripe DB state against Stripe API state (find any missed events)
  • Update SDK versions
  • Review event types — any new Stripe events you should handle?
  • Test signature verification against latest sender docs

Key Takeaways

  • Move webhooks off Vercel functions to dedicated long-running infrastructure. Cold starts hurt webhook reliability.
  • 200-OK in <1s. Queue then process. Otherwise senders retry; you double-process.
  • Signature verify FIRST. Reject before any other work. Use timing-safe comparison.
  • 3-layer idempotency: Redis cache + BullMQ jobId + DB constraints. Defense in depth.
  • Handlers are UPSERTs, not INSERT-or-UPDATE branches. Out-of-order events arrive; UPSERTs handle gracefully.
  • DLQ + alerts on critical events. Don't let permanent failures sit silent for days.

Common use cases

  • Engineer building a Stripe webhook handler for the first time
  • Backend lead consolidating multiple ad-hoc webhook receivers into a unified pattern
  • Solo founder hitting webhook reliability bugs in production
  • Team adding GitHub webhooks for CI/CD integration
  • Engineer designing webhook receiver for Slack app
  • Architect designing receiver for high-volume events (1000s/sec)

Best AI model for this

Claude Opus 4. Webhook handler design needs reasoning about delivery guarantees, idempotency, and timing — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Pro tips

  • Verify signatures BEFORE doing anything else. Reject unsigned/invalid before logging.
  • 200 OK in <1s. Sender will retry if you take longer; you'll process events twice.
  • Queue then process. Receive endpoint enqueues + acks; worker processes async.
  • Idempotency by event ID, not by payload hash. Senders include event IDs for this reason.
  • Replay attacks: include timestamp validation (reject events >5 min old).
  • Don't trust event ordering. Most senders explicitly do not guarantee order.
  • DLQ for permanent failures. After N retries, move to DLQ + alert.

Customization tips

  • List ALL senders precisely. Each has different signature schemes, retry policies, and event taxonomies — the architecture calibrates per sender.
  • Specify event types you'll handle. The handler-per-event-type pattern depends on knowing the inventory.
  • Be realistic about volume. 100 events/day vs 100K events/day need different infrastructure (vercel function vs dedicated service).
  • Be explicit about idempotency requirements. Payment events must not double-process; ping events probably can. Different rigor.
  • Mention ordering needs honestly. Most senders don't guarantee order; trying to enforce it usually means design issues elsewhere.
  • Use the High-Volume Mode variant if you handle 1000s+/sec — it adds backpressure, partition routing, and dedicated scaling patterns.

Variants

Stripe Webhook Mode

For Stripe specifically — emphasizes signature verification quirks, event-type taxonomy, idempotency patterns.

GitHub Webhook Mode

For GitHub — emphasizes batch deliveries, secret token validation, IP allowlist.

Multi-Sender Mode

For receivers handling multiple senders (Stripe + Slack + Twilio + custom) — emphasizes per-sender configuration + unified processing pipeline.

High-Volume Mode

For 1000s/sec webhook volume — emphasizes scaling, backpressure, dedicated infrastructure.

Frequently asked questions

How do I use the Webhook Handler Architect prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Webhook Handler Architect?

Claude Opus 4. Webhook handler design needs reasoning about delivery guarantees, idempotency, and timing — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Can I customize the Webhook Handler Architect prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: verify signatures before doing anything else (reject unsigned or invalid requests before logging), and return 200 OK in under 1s, because senders retry when you take longer and you'll process events twice.

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals