⚡ Promptolis Original · Coding & Development
📡 Webhook Handler Architect
Designs your webhook receiver: signature verification, idempotency, ordering, retries from senders, and the queue-it-then-process pattern that doesn't drop events at 3am.
Why this is epic
Most webhook handlers process synchronously, ack slowly, and drop events when the sender retries. Production failures: missed payment events, missed user signups, duplicate processing. This Original designs the durable receiver pattern.
Outputs the complete architecture: signature verification (Stripe / Twilio / GitHub style), 200-OK-fast pattern, idempotency keys, queue-then-process, retry handling, ordering guarantees (or explicit acknowledgment that ordering isn't guaranteed), DLQ.
Covers the 7 failure modes most webhook handlers hit: signature bypass attempts, replay attacks, slow processing causing sender retry storm, out-of-order events, duplicate processing, missing events, sender authentication issues.
Calibrated to 2026 webhook reality: Stripe's high-volume webhooks, GitHub's batch deliveries, Slack's slash command patterns, AI service callbacks. Picks the right idempotency + ordering strategy per sender.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
<senders>Stripe (primary). GitHub (for our CI integration). Slack (for our app). Plan to add Twilio in 6 months.</senders>
<event-types>Stripe: payment_intent.succeeded, customer.subscription.updated, invoice.payment_failed, customer.subscription.deleted, charge.dispute.created, ~12 total. GitHub: push, pull_request, workflow_run. Slack: events_api (message.channels, app_mention).</event-types>
<volume>Stripe: ~5000 events/day, peak ~20K on first-of-month (renewals). GitHub: ~500/day. Slack: ~1000/day. Total: ~6500/day, peak ~22K/day.</volume>
<processing-needs>Stripe: update DB customer status, send emails, log to analytics. GitHub: trigger CI workflows. Slack: respond to mentions, log activity.</processing-needs>
<idempotency-needs>Stripe: yes, MUST not double-process payments. GitHub push: yes (don't double-build). Slack: somewhat OK to dedupe.</idempotency-needs>
<ordering-needs>Stripe subscription events: ideally in order (subscription.created before subscription.updated). But Stripe doesn't guarantee, so we should handle out-of-order.</ordering-needs>
<latency-tolerance>Stripe events: customer-facing, ideal <30s. GitHub: <2 min. Slack: <5s for mentions (user waiting), other events lenient.</latency-tolerance>
<infrastructure>Node.js 20 + Fastify + Postgres + Redis + BullMQ. Vercel (with serverless functions) — but considering moving webhooks to dedicated long-running infrastructure.</infrastructure>
<existing-state>Synchronous handler currently. Stripe occasionally retries because we sometimes take >5s. We've had 2 incidents where we processed an event twice (bad bug for payments).</existing-state>
Webhook Architecture: Stripe + GitHub + Slack Multi-Sender Receiver
Receiver Design
Move webhooks OFF Vercel functions to dedicated infrastructure. Vercel functions have a 10s timeout (Hobby) / 60s (Pro), but they're serverless — cold starts hit you randomly. Webhook receivers benefit from always-warm dedicated handlers.
Recommendation: dedicate a long-running Node service (e.g., Fly.io, Railway, AWS ECS) for webhooks. Single small instance ($10-20/month) handles your 22K/day peak comfortably. Or: move to Cloudflare Workers (no cold start, fast).
Endpoint structure:
POST /webhooks/stripe (verifies + queues Stripe events)
POST /webhooks/github (verifies + queues GitHub events)
POST /webhooks/slack (verifies + queues + responds Slack)
POST /webhooks/twilio (future)
200-OK-fast pattern:
app.post('/webhooks/stripe', async (req, reply) => {
  // 1. Verify signature (synchronous, fast).
  // Note: req.rawBody requires preserving the raw request body, e.g. via the
  // fastify-raw-body plugin; Fastify's default JSON parser discards it.
  const isValid = verifyStripeSignature(req.headers, req.rawBody);
  if (!isValid) return reply.code(401).send('Invalid signature');
  // 2. Parse event
  const event = JSON.parse(req.rawBody);
  // 3. Atomic idempotency check-and-mark (Redis, fast).
  // SET with NX avoids the race of a separate GET followed by SETEX
  // when the sender retries concurrently.
  const first = await redis.set(`webhook:stripe:${event.id}`, 'received', { NX: true, EX: 86400 * 7 }); // 7-day TTL
  if (first !== 'OK') return reply.code(200).send('duplicate'); // Already seen
  // 4. Enqueue for processing
  await stripeQueue.add('process-stripe-event', { event }, {
    jobId: `stripe-${event.id}`, // BullMQ idempotency key
    attempts: 5,
    backoff: { type: 'exponential', delay: 5000 },
    removeOnComplete: { age: 86400 }, // keep 1 day
    removeOnFail: false, // keep forever for DLQ
  });
  // 5. Respond 200 quickly
  return reply.code(200).send('ok');
});
Receiver completes in <100ms typically. Worker processes async.
Slack exception: for app_mention (user-waiting), respond synchronously with quick ACK or use Slack's 'I'll get back to you' pattern via their response_url.
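The split between the immediate ack and the deferred answer can be sketched as a pure helper. Names here are illustrative, not Slack's SDK; the real handler would enqueue the deferred work and later POST the answer to response_url:

```typescript
// Sketch: decide what to ack immediately vs. defer to a worker.
type SlackEvent = { type: string; response_url?: string };

function splitSlackWork(event: SlackEvent): { ack: object; deferred: boolean } {
  if (event.type === 'app_mention') {
    // User is waiting: ack inside Slack's 3-second window, answer later
    // via response_url from the worker.
    return { ack: { text: 'Working on it...' }, deferred: true };
  }
  // Background events: empty 200 ack, process whenever the queue gets to it.
  return { ack: {}, deferred: false };
}
```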
Signature Verification
Stripe
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const STRIPE_WEBHOOK_SECRET = process.env.STRIPE_WEBHOOK_SECRET!; // whsec_...
function verifyStripeSignature(headers: any, rawBody: string): boolean {
try {
stripe.webhooks.constructEvent(rawBody, headers['stripe-signature'], STRIPE_WEBHOOK_SECRET);
return true;
} catch (e) {
return false;
}
}
Critical: verify against rawBody (string before parse), not the parsed JSON. Stripe's signature is over the literal request bytes.
GitHub
import crypto from 'crypto';
function verifyGitHubSignature(headers: any, rawBody: string): boolean {
const signature = headers['x-hub-signature-256'];
if (!signature) return false;
const expected = 'sha256=' + crypto
.createHmac('sha256', process.env.GITHUB_WEBHOOK_SECRET!)
.update(rawBody)
.digest('hex');
  const sigBuf = Buffer.from(signature);
  const expBuf = Buffer.from(expected);
  // timingSafeEqual throws if buffer lengths differ, so guard first
  if (sigBuf.length !== expBuf.length) return false;
  return crypto.timingSafeEqual(sigBuf, expBuf);
}
Use timingSafeEqual (constant-time comparison). Never === for signatures (timing-attack vulnerable).
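A reusable constant-time comparison helper is worth factoring out and sharing across senders. A minimal sketch, noting that `crypto.timingSafeEqual` throws when the buffers differ in length:

```typescript
import { timingSafeEqual } from 'crypto';

// Constant-time string comparison for signatures.
// Returning false on a length mismatch leaks only the length,
// which the attacker already controls.
function safeEqual(a: string, b: string): boolean {
  const ab = Buffer.from(a);
  const bb = Buffer.from(b);
  if (ab.length !== bb.length) return false;
  return timingSafeEqual(ab, bb);
}
```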
Slack
function verifySlackSignature(headers: any, rawBody: string): boolean {
  const timestamp = headers['x-slack-request-timestamp'];
  const signature = headers['x-slack-signature'];
  if (!timestamp || !signature) return false;
  // Replay defense: reject if timestamp >5 min old
  const now = Math.floor(Date.now() / 1000);
  if (Math.abs(now - parseInt(timestamp, 10)) > 300) return false;
  const baseString = `v0:${timestamp}:${rawBody}`;
  const expected = 'v0=' + crypto
    .createHmac('sha256', process.env.SLACK_SIGNING_SECRET!)
    .update(baseString)
    .digest('hex');
  const sigBuf = Buffer.from(signature);
  const expBuf = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so guard first
  if (sigBuf.length !== expBuf.length) return false;
  return crypto.timingSafeEqual(sigBuf, expBuf);
}
Idempotency
Strategy: event ID as idempotency key. Stored in Redis with 7-day TTL.
// Atomic check-and-mark at the receiver: SET with NX returns null when the
// key already exists, so there is no race between a separate GET and SETEX
// when the sender retries concurrently
const first = await redis.set(`webhook:${sender}:${eventId}`, 'received', { NX: true, EX: 86400 * 7 });
if (first !== 'OK') {
  // Sender retried. Already processed (or in flight). Return 200 immediately.
  return reply.code(200).send('duplicate');
}
// Then enqueue
await queue.add('process', { event }, { jobId: `${sender}-${eventId}` });
Why 7 days: it comfortably exceeds sender retry windows (Stripe, for example, retries deliveries for up to ~3 days). After that, your idempotency cache can safely drop the entry.
Defense in depth:
- Receiver: Redis check
- Worker: BullMQ jobId (also dedupes within queue)
- Database operations: idempotent SQL (UPSERTs with unique constraints)
3 layers because a single failure shouldn't process twice. Redis can fail; jobId might not work; DB constraints catch the last edge case.
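The Redis layer reduces to set-if-absent with a TTL. An in-memory sketch of that contract (the production version is a single Redis SET with NX and EX, not this Map):

```typescript
// In-memory sketch of the idempotency-cache contract.
// Returns true exactly once per key within the TTL window.
class IdempotencyCache {
  private seen = new Map<string, number>(); // key -> expiry (ms epoch)

  constructor(private ttlMs: number) {}

  // True if this is the first sighting of the key (caller should process).
  markIfNew(key: string, now = Date.now()): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > now) return false; // duplicate
    this.seen.set(key, now + this.ttlMs);
    return true;
  }
}
```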
Replay Attack Defense
For senders that include timestamps (Slack, custom):
- Reject events older than 5 minutes
- Prevents an attacker from capturing a valid request + replaying later
Stripe: Stripe's signed payload includes timestamp; signature check via constructEvent validates it within 5-min window automatically.
GitHub: doesn't include timestamp in signature. Mitigations:
- Use a received_at header if present
- Track the most recent event timestamp per delivery_id; reject deliveries significantly older than the latest you've seen
Queue + Worker Pipeline
Per-sender queues (BullMQ):
- stripe-events (priority on payment events)
- github-events
- slack-events (high priority for user-waiting)
Per-event-type handlers:
// /workers/stripe-worker.ts
// BullMQ's Worker API (queue.process is the legacy Bull API)
const stripeWorker = new Worker('stripe-events', async (job) => {
  const { event } = job.data;
  switch (event.type) {
    case 'payment_intent.succeeded':
      return handlePaymentSucceeded(event);
    case 'customer.subscription.updated':
      return handleSubscriptionUpdated(event);
    case 'customer.subscription.deleted':
      return handleSubscriptionDeleted(event);
    case 'invoice.payment_failed':
      return handlePaymentFailed(event);
    case 'charge.dispute.created':
      return handleDispute(event);
    default:
      logger.warn({ eventType: event.type }, 'Unknown Stripe event type, ignored');
      return { status: 'ignored' };
  }
}, { connection: redisConnection });
async function handlePaymentSucceeded(event: Stripe.PaymentIntentSucceededEvent) {
const intent = event.data.object;
await db.transaction(async (tx) => {
await tx.payments.upsert({ stripeId: intent.id, ...mapPaymentData(intent) });
await tx.users.markPaid(intent.customer);
});
await emailService.sendReceipt(intent.customer);
}
Critical: all handlers idempotent at DB level (UPSERTs, unique constraints). Even if processed twice somehow, DB state is correct.
Ordering Strategy
Stripe explicitly does not guarantee order. Design for out-of-order arrival:
async function handleSubscriptionUpdated(event: Stripe.SubscriptionUpdatedEvent) {
const sub = event.data.object;
// Critical: use Stripe's status + items as source of truth, NOT what we infer
await db.subscriptions.upsert({
stripeId: sub.id,
status: sub.status,
currentPeriodEnd: new Date(sub.current_period_end * 1000),
// ... use ALL fields from event, don't merge with previous state
});
}
Don't try to enforce 'subscription.created must process before subscription.updated.' Instead:
- Both handlers UPSERT (idempotent)
- Each event contains full subscription state
- Latest state wins (use event timestamp + DB constraint)
If subscription.updated arrives before subscription.created, the UPSERT creates the row with the updated state — correct.
Edge case: out-of-order with stale data. Use event timestamp:
UPDATE subscriptions
SET status = $1, last_event_timestamp = $2
WHERE stripe_id = $3 AND last_event_timestamp < $2; -- only update if newer
Prevents older event from overwriting newer state.
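The same newer-wins rule as a pure guard the handler can apply before writing; field names are illustrative, mirroring the SQL WHERE clause:

```typescript
// Newer-wins guard: apply an event only if its timestamp is strictly newer
// than the last one recorded for the row.
function shouldApply(lastEventTs: number | null, incomingTs: number): boolean {
  if (lastEventTs === null) return true; // no row state yet
  return lastEventTs < incomingTs;
}
```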
Retry & DLQ
Worker retry policy (BullMQ):
- Max 5 attempts (1 initial try + 4 retries)
- Exponential backoff from a 5s base: 5s, 10s, 20s, 40s (BullMQ doubles the delay each retry)
- After the final failure the job stays in the failed state
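BullMQ's exponential strategy computes delay * 2^(retry - 1). A small calculator makes the retry spacing explicit for the attempts: 5, delay: 5000 configuration above:

```typescript
// BullMQ exponential backoff: delay * 2^(retry - 1) for each retry.
// With attempts: N there is 1 initial try + (N - 1) retries.
function backoffSchedule(baseDelayMs: number, attempts: number): number[] {
  const retries = attempts - 1;
  return Array.from({ length: retries }, (_, i) => baseDelayMs * 2 ** i);
}
```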
DLQ:
worker.on('failed', async (job, err) => {
if (job.attemptsMade >= job.opts.attempts) {
// Moved to permanent-failure state
await db.failed_webhooks.create({
sender: 'stripe',
eventId: job.data.event.id,
eventType: job.data.event.type,
lastError: serializeError(err),
payload: job.data.event,
failedAt: new Date(),
});
// Alert if event type is critical
if (CRITICAL_EVENT_TYPES.includes(job.data.event.type)) {
await pagerduty.alert(`Webhook handler failed permanently: ${job.data.event.id}`);
}
}
});
const CRITICAL_EVENT_TYPES = [
'payment_intent.succeeded',
'customer.subscription.deleted',
'charge.dispute.created',
];
DLQ inspection tool:
- Admin endpoint to list failed webhooks by date / sender / event type
- Re-enqueue button (after fixing root cause)
- Bulk re-enqueue (after fixing root cause for many)
- Delete (after confirming not actionable)
Per-Sender Configuration
Stripe
- Retry window: Stripe retries failed deliveries for up to ~3 days with exponential backoff
- 5-min timestamp tolerance
- Webhook secret rotation: schedule annually + during incidents
GitHub
- No automatic retries: failed deliveries must be redelivered manually (UI or API)
- IP allowlist available (defense in depth): https://api.github.com/meta returns IPs
- Secret token rotation: when secret leaks or annually
Slack
- Synchronous-friendly: response_url for delayed responses
- 3-second response timeout from Slack's side (must respond fast)
- Signing secret + verification token both available; signing secret preferred
Twilio (future)
- Different signature scheme (HMAC-SHA1 with URL + params)
- Auto-disables webhook URL after repeated failures (be careful)
Observability
Metrics per sender:
- webhook.received (counter, by sender + event type)
- webhook.received.duration (histogram, ms)
- webhook.signature.invalid (counter; alert on spike)
- webhook.duplicate (counter; informational)
- webhook.queue.depth (per-sender queue size)
- webhook.worker.duration (per-event-type processing time)
- webhook.worker.failed (counter)
- webhook.dlq.size (gauge)
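A minimal labeled-counter sketch showing the shape of these metrics; production would use prom-client or StatsD, not an in-memory Map:

```typescript
// Minimal labeled counters keyed as metric{label}.
class Counters {
  private counts = new Map<string, number>();

  // e.g. inc('webhook.received', 'stripe:payment_intent.succeeded')
  inc(metric: string, label: string, by = 1): void {
    const key = `${metric}{${label}}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + by);
  }

  get(metric: string, label: string): number {
    return this.counts.get(`${metric}{${label}}`) ?? 0;
  }
}
```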
Alerts:
- Signature failures >10 in 1 min → page (likely attack or secret rotation issue)
- Queue depth >1000 → Slack-only alert (likely worker capacity issue)
- DLQ size >0 for >1 hour → Slack alert
- DLQ size >5 critical events → page
- 200-OK p95 latency >2s → page (sender will start retrying)
Logs structured per request:
{
"sender": "stripe",
"event_id": "evt_abc123",
"event_type": "payment_intent.succeeded",
"duration_ms": 45,
"status": "queued",
"correlation_id": "req_xyz789"
}
Implementation Skeleton
/services/webhook-receiver/
package.json (Node, Fastify, BullMQ, Stripe, GitHub SDK)
src/
server.ts (Fastify setup)
routes/
stripe.ts (POST /webhooks/stripe)
github.ts (POST /webhooks/github)
slack.ts (POST /webhooks/slack)
lib/
verify-signature.ts (per-sender verification)
idempotency.ts (Redis-backed)
queues.ts (BullMQ queue setup)
Dockerfile
fly.toml (or other deployment config)
/services/webhook-worker/
src/
workers/
stripe-worker.ts
github-worker.ts
slack-worker.ts
handlers/
stripe/
payment-succeeded.ts
subscription-updated.ts
... (one per event type)
github/
push.ts
pull-request.ts
slack/
message.ts
app-mention.ts
dlq/
replay.ts (admin tool)
Why two services: receiver is hot-path latency-sensitive; worker is throughput-sensitive. Separate scaling, separate deployment.
Testing
Unit tests
- Signature verification per sender (positive + negative cases)
- Idempotency cache hit/miss
- Each event-type handler with sample event data
Integration tests
- Full receive-queue-process flow with test BullMQ + Redis
- Sample real payloads (anonymized) per event type
Stripe-specific
- Stripe CLI: stripe listen --forward-to localhost:3000/webhooks/stripe for local forwarding
- stripe trigger payment_intent.succeeded for synthetic events
Replay testing
- Replay 30 days of historical events through staging worker
- Verify DB end state is correct
- Verify no double-processing
Load testing
- Burst 1000 events/sec to staging receiver
- Verify <2s p95 200-OK
- Verify worker keeps up (queue depth doesn't grow unboundedly)
What This Architecture Won't Solve
- Won't catch sender-side bugs. If Stripe sends a malformed event, your handler may crash on it. Defensive parsing required.
- Won't handle senders that exhaust their retry window. If Stripe gives up after 7 days, the event is gone. Critical events should have an audit reconciliation job.
- Won't replace audit log. Webhook events are notifications; truth lives at the sender. For Stripe, your DB should occasionally reconcile against Stripe's API.
- Won't prevent processing of malicious events. Signature verification + replay defense help, but a leaked secret bypasses both. Rotate secrets.
- Won't handle eventual consistency lag from senders. Stripe's API state may not reflect a webhook event for a few seconds.
Maintenance Cadence
Quarterly:
- Review DLQ. Patterns? Root cause?
- Rotate webhook secrets (especially after team member departures)
- Audit signature verification code (no shortcuts crept in)
Annually:
- Reconcile Stripe DB state against Stripe API state (find any missed events)
- Update SDK versions
- Review event types — any new Stripe events you should handle?
- Test signature verification against latest sender docs
Key Takeaways
- Move webhooks off Vercel functions to dedicated long-running infrastructure. Cold starts hurt webhook reliability.
- 200-OK in <1s. Queue then process. Otherwise senders retry; you double-process.
- Signature verify FIRST. Reject before any other work. Use timing-safe comparison.
- 3-layer idempotency: Redis cache + BullMQ jobId + DB constraints. Defense in depth.
- Handlers are UPSERTs, not INSERT-or-UPDATE branches. Out-of-order events arrive; UPSERTs handle gracefully.
- DLQ + alerts on critical events. Don't let permanent failures sit silent for days.
Common use cases
- Engineer building a Stripe webhook handler for the first time
- Backend lead consolidating multiple ad-hoc webhook receivers into a unified pattern
- Solo founder hitting webhook reliability bugs in production
- Team adding GitHub webhooks for CI/CD integration
- Engineer designing webhook receiver for Slack app
- Architect designing receiver for high-volume events (1000s/sec)
Best AI model for this
Claude Opus 4. Webhook handler design needs reasoning about delivery guarantees, idempotency, and timing — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Pro tips
- Verify signatures BEFORE doing anything else. Reject unsigned/invalid before logging.
- 200 OK in <1s. Sender will retry if you take longer; you'll process events twice.
- Queue then process. Receive endpoint enqueues + acks; worker processes async.
- Idempotency by event ID, not by payload hash. Senders include event IDs for this reason.
- Replay attacks: include timestamp validation (reject events >5 min old).
- Don't trust event ordering. Most senders explicitly do not guarantee order.
- DLQ for permanent failures. After N retries, move to DLQ + alert.
Customization tips
- List ALL senders precisely. Each has different signature schemes, retry policies, and event taxonomies — the architecture calibrates per sender.
- Specify event types you'll handle. The handler-per-event-type pattern depends on knowing the inventory.
- Be realistic about volume. 100 events/day vs 100K events/day need different infrastructure (Vercel function vs dedicated service).
- Be explicit about idempotency requirements. Payment events must not double-process; ping events probably can. Different rigor.
- Mention ordering needs honestly. Most senders don't guarantee order; trying to enforce it usually means design issues elsewhere.
- Use the High-Volume Mode variant if you handle 1000s+/sec — it adds backpressure, partition routing, and dedicated scaling patterns.
Variants
Stripe Webhook Mode
For Stripe specifically — emphasizes signature verification quirks, event-type taxonomy, idempotency patterns.
GitHub Webhook Mode
For GitHub — emphasizes batch deliveries, secret token validation, IP allowlist.
Multi-Sender Mode
For receivers handling multiple senders (Stripe + Slack + Twilio + custom) — emphasizes per-sender configuration + unified processing pipeline.
High-Volume Mode
For 1000s/sec webhook volume — emphasizes scaling, backpressure, dedicated infrastructure.
Frequently asked questions
How do I use the Webhook Handler Architect prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with Webhook Handler Architect?
Claude Opus 4. Webhook handler design needs reasoning about delivery guarantees, idempotency, and timing — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Can I customize the Webhook Handler Architect prompt for my use case?
Yes, every Promptolis Original is designed to be customized. Key levers: verify signatures before doing anything else (reject unsigned or invalid requests before logging), and return 200 OK in under 1s (take longer and the sender retries, so you process events twice).
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals