
⚡ Promptolis Original · Coding & Development

⚠️ Error Handling Strategy

Designs your error-handling system: which errors to retry, which to escalate, which to log-and-continue, and the explicit error-class hierarchy that makes 'silent failures in production' a memory.

⏱️ 4 min to set up 🤖 ~95 seconds in Claude 🗓️ Updated 2026-04-28

Why this is epic

Most codebases handle errors with 'try / catch (e) { console.log(e) }' and pray. Then production has silent failures, retries that hammer downstream services, and Sentry alerts nobody investigates. This Original designs the actual policy.

Outputs the complete strategy: error classification (transient / permanent / business-logic), per-class handling rules, retry policy with backoff + circuit breaker, what to log vs alert, customer-facing error messages, and the recovery procedures.

Includes the 4 questions every error-handling design must answer: (1) is this error retriable? (2) does the user need to know? (3) should this alert on-call? (4) what's the fallback if it persists? Answers shape the implementation.

Calibrated to 2026 reality: distributed systems with cascading failures, serverless cold starts that look like errors, AI/LLM API errors with 'sometimes works on retry' patterns, and the rate-limit-cascade anti-pattern.

The prompt

Promptolis Original · Copy-ready
<role>
You are an error-handling architect with 7+ years designing resilient systems for distributed services, queue workers, frontend apps, and AI/LLM integrations. You have audited 50+ codebases for error-handling pathologies. You know which patterns prevent silent failures vs which add complexity without value. You are direct. You will tell a builder their try-catch swallowing is the bug, that retrying validation errors is amplifying load, or that their Sentry inbox is unactioned because the alerts aren't actionable. You refuse to recommend 'log more' as a generic answer — log the right things, not everything.
</role>

<principles>
1. Error-class hierarchy. instanceof checks beat string matching.
2. Three classes: transient, permanent, business-logic. Different handling.
3. Retry only transient. Retrying permanent = rate-limit cascade.
4. Exponential backoff with jitter. Base 100ms, max 30s.
5. Circuit breaker after 5 consecutive failures. 60s break.
6. Customer messages: no stack traces, no internal IDs except correlation ID.
7. Structured logging. Not string interpolation.
</principles>

<input>
<service-type>{API service / queue worker / frontend / full-stack / AI integration / 'recommend'}</service-type>
<stack>{language, framework}</stack>
<external-dependencies>{services you call: Stripe, Twilio, LLM APIs, internal services, etc.}</external-dependencies>
<current-state>{nothing / try-catch-everywhere / partial / mature but messy}</current-state>
<biggest-pain>{silent failures / Sentry noise / cascading retries / customer-facing 500s / specific failure patterns}</biggest-pain>
<scale>{requests/sec, jobs/sec, users}</scale>
<error-budget>{availability target if known: 99.9%, 99.99%, etc.}</error-budget>
<observability>{Sentry / Datadog / custom / nothing}</observability>
<team-size>{engineers}</team-size>
</input>

<output-format>
# Error Handling Strategy: [service name]

## Diagnosis
What's broken. The 1-2 highest-leverage fixes.

## Error Class Hierarchy
The class structure: BaseError → RetriableError, PermanentError, BusinessLogicError, etc. Specific to your stack.

## Per-Class Handling Rules
For each class: retry / log / alert / surface to user / fail-loudly. With reasoning.

## Retry Policy
Backoff strategy, max attempts, jitter, circuit breaker thresholds. Specific values.

## Customer-Facing Error Messages
The error envelope shape. Examples for common scenarios. What's logged vs shown.

## Logging Strategy
Structured fields to include. What to log at info/warn/error. What to NEVER log (PII, secrets).

## Alert Strategy
What triggers a page. What goes to Slack-only. What's daily-digest. Threshold tuning.

## External API Patterns
For each external service: retry policy, fallback, circuit breaker, timeout.

## Dead Letter Queue (if applicable)
For async jobs: when to DLQ, how to inspect, how to replay.

## Idempotency Keys
For write operations that retry: idempotency key strategy.

## Implementation Skeleton
File structure + key code patterns for the chosen stack.

## Migration from Current State
If existing code: how to introduce structured handling without breaking everything.

## What This Strategy Won't Solve
Honest limits.

## Maintenance Cadence
Monthly + quarterly reviews.

## Key Takeaways
4-6 bullets — for the team's playbook.
</output-format>

<auto-intake>
If input incomplete: ask for service type, stack, external dependencies, current state, biggest pain, scale, error budget, observability, team size.
</auto-intake>

Now, design the error-handling strategy:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<service-type>API service (REST + some background queue jobs)</service-type>
<stack>Node.js + TypeScript + Fastify + Postgres + Redis (BullMQ for queue)</stack>
<external-dependencies>Stripe, Twilio (SMS), OpenAI (we call GPT-4 for content generation), our internal user-service, Postmark (email)</external-dependencies>
<current-state>Try-catch-everywhere. Most catches log to Sentry + return 500. Some have retry but no backoff. Some have retry with no max attempts (caught one infinite retry loop last month).</current-state>
<biggest-pain>(1) Sentry has 12K open errors. Most are noise (e.g., user disconnects mid-request → 500 logged). (2) When OpenAI rate-limits us, our retry hammers them, makes it worse. (3) Our Twilio SMS sometimes fails permanently (invalid number) but we retry forever. (4) Customer-facing 500s show stack traces in dev mode that occasionally leak to prod.</biggest-pain>
<scale>~200 req/sec average, peak ~800 req/sec. ~10K background jobs/day.</scale>
<error-budget>99.9% uptime SLA</error-budget>
<observability>Sentry for errors, Datadog for metrics + APM, structured logging via Pino</observability>
<team-size>4 backend engineers</team-size>
🤖 Output

Error Handling Strategy: Node API + BullMQ Service

Diagnosis

Three concrete issues are driving your Sentry noise and production cascades:

1. No error classification → every error treated the same → retry-everything anti-pattern → 12K Sentry items

2. OpenAI rate-limit cascade → retry with no backoff → makes throttling worse → bills increase + service degrades

3. Twilio infinite retry on permanent errors → invalid phone numbers retry forever → you discovered this only after one infinite loop

Highest-leverage fixes (in order):

1. Build the error class hierarchy. ~2 days. Eliminates string-match error handling. Foundation for everything else.

2. Implement classified retry policy. Retry only `RetriableError`. Permanent errors fail fast. Cuts Sentry noise 60%+ AND prevents the cascades.

3. Add circuit breakers on external APIs. Stops the OpenAI/Twilio cascades.

4. Tune Sentry: filter user-disconnects + bot traffic. Cuts another 30% of noise.

Estimated outcome: Sentry inbox drops from 12K to <500 actionable items. Cascades to OpenAI/Twilio stop. Real customer-impacting bugs become visible because they're not buried.

Error Class Hierarchy

// /lib/errors/index.ts
import { randomUUID } from 'node:crypto';

// Minimal correlation-ID helper; swap in your own generator if you already have one
const generateCorrelationId = () => `err_${randomUUID().slice(0, 12)}`;

export class BaseError extends Error {
  public readonly correlationId: string;
  public readonly timestamp: Date;
  public readonly context: Record<string, unknown>;

  constructor(message: string, context: Record<string, unknown> = {}) {
    super(message);
    this.name = this.constructor.name;
    this.correlationId = generateCorrelationId();
    this.timestamp = new Date();
    this.context = context;
    Error.captureStackTrace?.(this, this.constructor);
  }
}

// Transient: retry might succeed (network, rate limits, transient DB errors)
export class RetriableError extends BaseError {
  public readonly retryAfter?: number; // ms; honored by the retry wrapper below

  constructor(message: string, context: Record<string, unknown> = {}) {
    super(message, context);
    // Allow call sites to pass retryAfter via context, e.g. parsed from a Retry-After header
    if (typeof context.retryAfter === 'number') this.retryAfter = context.retryAfter;
  }
}
export class NetworkError extends RetriableError {}
export class RateLimitError extends RetriableError {}
export class TransientDatabaseError extends RetriableError {}
export class ExternalServiceUnavailable extends RetriableError {}

// Permanent: retry will not succeed
export class PermanentError extends BaseError {}
export class ValidationError extends PermanentError { /* 400 */ }
export class NotFoundError extends PermanentError { /* 404 */ }
export class UnauthorizedError extends PermanentError { /* 401 */ }
export class ForbiddenError extends PermanentError { /* 403 */ }
export class ConflictError extends PermanentError { /* 409 */ }

// Business logic: user violated a rule (often expected)
export class BusinessLogicError extends BaseError {
  public readonly userMessage: string; // safe to show user

  constructor(message: string, context: Record<string, unknown> = {}) {
    super(message, context);
    // Prefer an explicit userMessage from context; fall back to the internal message
    this.userMessage = typeof context.userMessage === 'string' ? context.userMessage : message;
  }
}
export class InsufficientFundsError extends BusinessLogicError {}
export class QuotaExceededError extends BusinessLogicError {}
export class FeatureNotAvailableError extends BusinessLogicError {}

// Critical: needs immediate human attention
export class CriticalError extends BaseError {} 
export class DataIntegrityError extends CriticalError {}
export class SecurityError extends CriticalError {}

Why this hierarchy:

  • instanceof RetriableError is your retry-or-not decision
  • instanceof BusinessLogicError is your user-message-safe decision
  • instanceof CriticalError is your page-on-call decision
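
A small illustration of how those checks drive a call site. The function name and import paths are hypothetical (they assume the Implementation Skeleton further down); only the instanceof decisions come from the strategy itself:

// Hypothetical call site: the three instanceof checks are the whole decision tree
import { BusinessLogicError, CriticalError, RetriableError } from './lib/errors';
import { withRetry } from './lib/retry/with-retry';
import { logger } from './observability/logger';

async function handleWork(doWork: () => Promise<string>): Promise<string> {
  try {
    return await doWork();
  } catch (e) {
    if (e instanceof RetriableError) {
      return withRetry(doWork);                              // transient: retry with backoff
    }
    if (e instanceof BusinessLogicError) {
      logger.info({ err: e, userMessage: e.userMessage });   // expected; userMessage is safe to surface
      throw e;                                               // map to 422, never retry
    }
    if (e instanceof CriticalError) {
      logger.error({ err: e, correlationId: e.correlationId });
      // page on-call here
    }
    throw e;                                                 // permanent / unknown: fail loudly, no retry
  }
}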

Per-Class Handling Rules

| Error Class | Retry? | Log Level | Alert? | User Sees? |
|---|---|---|---|---|
| RetriableError | Yes (with backoff) | warn | Only if recovery fails | After max retries: 500 |
| NetworkError | Yes (3 attempts) | warn | If sustained | After max retries: 503 |
| RateLimitError | Yes (after retryAfter) | info | Track rate, alert if >5%/min | 429 with Retry-After |
| ValidationError | No | info | No (expected) | 400 with details |
| NotFoundError | No | info | No | 404 |
| UnauthorizedError | No | info | If volume spikes | 401 |
| BusinessLogicError | No | info | No | 422 with userMessage |
| QuotaExceededError | No | info | If conversion-blocking | 429 with upgrade prompt |
| CriticalError | No | error | YES, page on-call | 500 with correlation ID |
| DataIntegrityError | No | error | YES, page immediately | 500 |
| SecurityError | No | error | YES, page security on-call | 401/403 generic |

Retry Policy

For RetriableError:

// /lib/retry/with-retry.ts
// Import paths assume the file layout in the Implementation Skeleton below
import { BaseError, ExternalServiceUnavailable, RetriableError } from '../errors';
import { circuitBreaker } from './circuit-breaker';
import { withTimeout } from './with-timeout';
import { logger } from '../../observability/logger';

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
export async function withRetry<T>(
  fn: () => Promise<T>,
  opts: {
    maxAttempts?: number;       // default 3
    baseDelayMs?: number;       // default 100
    maxDelayMs?: number;        // default 30000
    jitterFactor?: number;      // default 0.25
    timeoutMs?: number;         // default 30000
    circuitBreakerKey?: string; // for cross-call circuit breaker
  } = {}
): Promise<T> {
  const { maxAttempts = 3, baseDelayMs = 100, maxDelayMs = 30000, jitterFactor = 0.25 } = opts;
  
  // Check circuit breaker
  if (opts.circuitBreakerKey && circuitBreaker.isOpen(opts.circuitBreakerKey)) {
    throw new ExternalServiceUnavailable('Circuit breaker open');
  }
  
  let lastError: Error | undefined;
  
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await withTimeout(fn(), opts.timeoutMs ?? 30000);
      if (opts.circuitBreakerKey) circuitBreaker.onSuccess(opts.circuitBreakerKey);
      return result;
    } catch (e) {
      lastError = e as Error;
      
      // Don't retry permanent errors
      if (!(e instanceof RetriableError)) throw e;
      
      // Track for circuit breaker
      if (opts.circuitBreakerKey) circuitBreaker.onFailure(opts.circuitBreakerKey);
      
      // Last attempt: throw
      if (attempt === maxAttempts) throw e;
      
      // Calculate delay with jitter
      const delayMs = Math.min(baseDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
      const jitter = delayMs * jitterFactor * (Math.random() * 2 - 1);
      const finalDelay = (e as RetriableError).retryAfter ?? Math.max(0, delayMs + jitter);
      
      logger.warn('retry.attempt', {
        error: e.message,
        attempt,
        nextDelayMs: finalDelay,
        correlationId: (e as BaseError).correlationId,
      });
      
      await sleep(finalDelay);
    }
  }
  
  throw lastError;
}

Defaults:

  • Max attempts: 3 (most cases). 5 for critical operations like payment.
  • Base delay: 100ms
  • Max delay: 30s
  • Jitter: ±25%
  • Timeout per attempt: 30s
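
The per-attempt timeout comes from a withTimeout wrapper (the with-timeout.ts file in the skeleton below). A minimal sketch, assuming a timed-out attempt should be classified as a retriable NetworkError:

// /lib/retry/with-timeout.ts (sketch; treating a timeout as retriable is an assumption)
import { NetworkError } from '../errors';

export function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new NetworkError(`Operation timed out after ${timeoutMs}ms`, { timeoutMs })),
      timeoutMs
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}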

Circuit breaker:

  • Opens after 5 consecutive failures within 60s
  • Stays open for 60s
  • Half-open: 1 test request after 60s; success closes, failure re-opens
  • Tracked per circuitBreakerKey (e.g., 'openai-api', 'twilio-sms', 'internal-user-service')
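
A minimal in-memory sketch of that state machine (the circuit-breaker.ts file in the skeleton); the thresholds mirror the values above. This is single-process only; if state must be shared across instances you would back it with Redis instead:

// /lib/retry/circuit-breaker.ts (sketch, not a definitive implementation)
type BreakerState = { failures: number; openedAt?: number };

const FAILURE_THRESHOLD = 5; // open after 5 consecutive failures
const OPEN_MS = 60_000;      // stay open for 60s, then allow a half-open test call

const breakers = new Map<string, BreakerState>();

export const circuitBreaker = {
  isOpen(key: string): boolean {
    const b = breakers.get(key);
    if (!b?.openedAt) return false;
    // After 60s we report "closed" so a test call goes through (half-open);
    // its success or failure below decides whether the breaker closes or re-opens.
    return Date.now() - b.openedAt < OPEN_MS;
  },
  onSuccess(key: string): void {
    breakers.set(key, { failures: 0 });
  },
  onFailure(key: string): void {
    const b = breakers.get(key) ?? { failures: 0 };
    b.failures += 1;
    if (b.failures >= FAILURE_THRESHOLD) b.openedAt = Date.now();
    breakers.set(key, b);
  },
};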

Customer-Facing Error Messages

Response envelope:

// Error response shape (HTTP)
{
  "error": {
    "code": "validation_failed",      // machine-readable
    "message": "Email is required",   // user-friendly (BusinessLogicError + ValidationError only)
    "correlationId": "err_abc123xyz", // for support requests
    "details": {                       // optional, for ValidationError
      "field": "email",
      "constraint": "required"
    }
  }
}

Error code → HTTP status mapping:

| Error Code | HTTP | Example |
|---|---|---|
| validation_failed | 400 | bad input |
| unauthorized | 401 | not logged in |
| forbidden | 403 | logged in but not allowed |
| not_found | 404 | resource doesn't exist |
| conflict | 409 | duplicate/state conflict |
| business_rule_violation | 422 | quota exceeded, plan upgrade needed |
| rate_limited | 429 | too many requests + Retry-After header |
| internal_error | 500 | RetriableError exhausted retries |
| service_unavailable | 503 | circuit breaker open |

For non-business errors: show ONLY the correlation ID. NEVER stack trace, NEVER internal field names.

// Wrong
{ error: { message: 'PostgresError: column "email" does not exist' } }

// Right
{ error: { code: 'internal_error', message: 'Something went wrong. Please try again or contact support with reference err_abc123xyz.', correlationId: 'err_abc123xyz' } }

Logging Strategy

Structured logging via Pino. Every log entry includes:

{
  timestamp: '2026-04-28T14:30:00Z',
  level: 'error',
  service: 'api',
  correlationId: 'err_abc123',
  userId: 'user_123',           // or null; never PII like email here, just the ID
  requestId: 'req_xyz789',
  errorClass: 'RateLimitError',
  errorMessage: '...',
  context: { /* class-specific context */ }
}

Log levels:

  • error: CriticalError, unexpected non-classified throws
  • warn: RetriableError attempts, retry failures, circuit breaker state changes
  • info: ValidationError, BusinessLogicError, expected operational events
  • debug: not in production by default

Never log:

  • Passwords, API keys, tokens
  • Full credit card numbers (last 4 only)
  • Email addresses (use user_id instead)
  • PII unless legally required and audited

Always log:

  • Correlation ID (for tracing)
  • User ID (when applicable)
  • Request ID (for tracing across logs)
  • Error class name (so log queries can filter)
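
A sketch of the Pino setup (the logger.ts file in the skeleton) that bakes in the base fields and redacts the never-log list. The exact field names and redact paths here are assumptions to adapt to your request shapes:

// /observability/logger.ts (sketch; redact paths are illustrative)
import pino from 'pino';

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  base: { service: 'api' },            // attached to every entry
  redact: {
    paths: [
      'req.headers.authorization',
      'req.headers.cookie',
      'password',
      'context.password',
      'context.apiKey',
      'context.token',
      'context.email',                 // log userId instead of email
    ],
    censor: '[REDACTED]',
  },
});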

Alert Strategy

Page on-call (PagerDuty):

  • Any CriticalError
  • Error rate >2% sustained for 5 min
  • p99 latency >5s sustained for 5 min
  • Circuit breaker open for >2 min on critical services
  • Database connection pool exhausted

Slack #alerts (no page):

  • Error rate >0.5% sustained for 5 min
  • New error type appearing (haven't seen this BaseError subclass before)
  • Circuit breaker opens (any service)
  • DLQ depth >100 jobs

Daily digest (Slack):

  • Top 5 error classes by volume
  • Top 3 endpoints by error rate
  • Sentry trends week-over-week

Sentry tuning to drop noise:

// /observability/sentry-config.ts: drop these known-noise patterns
import * as Sentry from '@sentry/node';

Sentry.init({
  ignoreErrors: [
    /^AbortError/,               // user disconnect
    /^ECONNRESET/,               // network blip
    /^Request failed.*timeout/,  // already retried, don't double-log
    /^ValidationError.*email/,   // expected user input errors
  ],
  beforeSend(event) {
    // Don't send 4xx errors to Sentry: they're not bugs
    if (String(event.tags?.statusCode ?? '').startsWith('4')) return null;
    // Drop bot traffic
    if (isBotUserAgent(event.request?.headers?.['user-agent'])) return null;
    return event;
  },
});

External API Patterns

OpenAI
async function callOpenAI(prompt: string) {
  return withRetry(
    async () => {
      try {
        return await openai.chat.completions.create({ /* ... */ });
      } catch (e: any) {
        if (e.status === 429) {
          throw new RateLimitError('OpenAI rate limit', { retryAfter: parseRetryAfter(e) });
        }
        if (e.status >= 500) {
          throw new ExternalServiceUnavailable('OpenAI 5xx', { originalError: e });
        }
        if (e.status === 400) {
          throw new ValidationError('OpenAI bad request', { originalError: e });
        }
        throw e;
      }
    },
    { maxAttempts: 3, baseDelayMs: 1000, circuitBreakerKey: 'openai-api' }
  );
}
Twilio SMS
async function sendSMS(to: string, body: string) {
  return withRetry(
    async () => {
      try {
        return await twilio.messages.create({ to, body });
      } catch (e: any) {
        // Twilio error codes — see https://www.twilio.com/docs/api/errors
        if (e.code === 21211 || e.code === 21408) {
          throw new ValidationError('Invalid phone number', { code: e.code });
        }
        if (e.code === 20429) {
          throw new RateLimitError('Twilio rate limit');
        }
        if (e.status >= 500) {
          throw new ExternalServiceUnavailable('Twilio 5xx');
        }
        throw e;
      }
    },
    { maxAttempts: 3, circuitBreakerKey: 'twilio-sms' }
  );
}

Critical: Twilio's invalid-phone errors (21211, 21408) are PERMANENT. Throwing them as ValidationError stops the infinite retry you experienced.

Stripe (idempotency required)
async function createPayment(amount: number, customerId: string, idempotencyKey: string) {
  return withRetry(
    async () => {
      try {
        return await stripe.paymentIntents.create({ /* ... */ }, { idempotencyKey });
      } catch (e: any) {
        if (e.type === 'StripeRateLimitError') throw new RateLimitError('Stripe rate limit');
        if (e.type === 'StripeConnectionError') throw new NetworkError('Stripe connection');
        if (e.type === 'StripeCardError') throw new BusinessLogicError(e.message, { userMessage: e.message });
        throw e;
      }
    },
    { maxAttempts: 5, baseDelayMs: 200, circuitBreakerKey: 'stripe-api' }
  );
}

Idempotency key: SHA256(userId + orderId + timestamp_minute). Retries within the same minute reuse the key, so Stripe deduplicates the request. Critical for payment retries.
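
A sketch of that key derivation; the helper name and field choices are illustrative:

import { createHash } from 'node:crypto';

// Hypothetical helper: same userId + orderId within the same minute → same key → Stripe dedupes
function paymentIdempotencyKey(userId: string, orderId: string): string {
  const minuteBucket = Math.floor(Date.now() / 60_000);
  return createHash('sha256')
    .update(`${userId}:${orderId}:${minuteBucket}`)
    .digest('hex');
}

// Usage with the createPayment wrapper above:
// await createPayment(amount, customerId, paymentIdempotencyKey(userId, orderId));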

Dead Letter Queue (BullMQ)

// BullMQ job processor
worker.on('failed', async (job, err) => {
  if (!job) return; // job can be undefined if it was removed or its lock was lost
  if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
    // Move to DLQ
    await deadLetterQueue.add('failed-job', {
      jobName: job.name,
      data: job.data,
      lastError: serializeError(err),
      attempts: job.attemptsMade,
      failedAt: new Date(),
    });
    
    if (err instanceof CriticalError) {
      // Page on-call
      await pagerduty.alert(...)
    }
  }
});

DLQ inspection tool (admin endpoint):

  • List failed jobs by date / job-type / error class
  • View job data
  • Re-enqueue specific job
  • Bulk re-enqueue (after fixing root cause)
  • Delete (after confirming not needed)
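
A rough sketch of the bulk re-enqueue path (the dlq-replay.ts admin tool in the skeleton). The queue names, connection details, and the assumption that DLQ entries carry the original jobName + data, as written above, are illustrative:

// /workers/dlq-replay.ts (sketch)
import { Queue } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const deadLetterQueue = new Queue('dead-letter', { connection });
const mainQueue = new Queue('jobs', { connection });

export async function replayDeadLetterJobs(limit = 50): Promise<number> {
  // DLQ entries sit in 'waiting' because nothing consumes the dead-letter queue
  const entries = await deadLetterQueue.getJobs(['waiting'], 0, limit - 1);
  let replayed = 0;
  for (const entry of entries) {
    const { jobName, data } = entry.data;
    await mainQueue.add(jobName, data);  // re-enqueue on the original queue
    await entry.remove();                // drop the DLQ copy once re-queued
    replayed += 1;
  }
  return replayed;
}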

Idempotency Keys

For any write operation that might retry:

// Store idempotency keys in Redis with 24h TTL
// Key: hash(operation_type + user_id + business_key)
// Value: { result: ..., cachedAt: Date }

async function withIdempotency<T>(
  key: string,
  fn: () => Promise<T>
): Promise<T> {
  const cached = await redis.get(`idem:${key}`);
  if (cached) {
    return JSON.parse(cached).result;
  }
  
  const result = await fn();
  
  await redis.setEx(`idem:${key}`, 86400, JSON.stringify({ result, cachedAt: new Date() }));
  
  return result;
}

Use for: payments, email sends, external API calls that mutate state, webhook processing.
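
For example, a hypothetical webhook handler could key on the provider's event ID so redelivered webhooks return the cached result instead of re-running side effects (the handler body here is a placeholder):

// Hypothetical usage: dedupe Stripe webhook processing on the provider event ID
import Stripe from 'stripe';
import { withIdempotency } from '../lib/idempotency/with-idempotency';

export async function onStripeWebhook(event: Stripe.Event): Promise<void> {
  await withIdempotency(`webhook:stripe:${event.id}`, async () => {
    // ... handle the event (update subscription state, send receipt, etc.)
    return { handled: event.type };
  });
}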

Implementation Skeleton

/lib/errors/
  index.ts           (BaseError + class hierarchy)
  middleware.ts      (Fastify error handler)
  serialize.ts       (error → response envelope)

/lib/retry/
  with-retry.ts      (retry wrapper)
  circuit-breaker.ts (circuit breaker state machine)
  with-timeout.ts    (timeout wrapper)

/lib/idempotency/
  with-idempotency.ts

/lib/external/
  openai.ts          (typed OpenAI calls with error mapping)
  twilio.ts
  stripe.ts
  postmark.ts
  internal-user-service.ts

/workers/
  base-worker.ts     (BullMQ worker with error handling baked in)
  dlq-replay.ts      (admin tool)

/observability/
  sentry-config.ts   (filter rules)
  logger.ts          (Pino setup)
  metrics.ts         (Datadog StatsD client)

Fastify error handler middleware:

app.setErrorHandler((error, request, reply) => {
  const correlationId = (error as any).correlationId ?? generateCorrelationId();
  
  // Log
  if (error instanceof CriticalError) {
    logger.error({ error, correlationId, request: requestContext(request) });
  } else if (error instanceof RetriableError || error instanceof PermanentError) {
    logger.warn({ error, correlationId });
  } else {
    // Unexpected error — log as error + alert
    logger.error({ error, correlationId, type: 'unexpected' });
  }
  
  // Map to HTTP
  const { statusCode, body } = mapErrorToResponse(error, correlationId);
  reply.code(statusCode).send(body);
});
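
One possible shape for mapErrorToResponse (the serialize.ts piece in the skeleton). The mapping mirrors the status table above; it is a sketch, not the only way to cut it:

// /lib/errors/serialize.ts (sketch mapping error classes to the response envelope)
import {
  BusinessLogicError, ConflictError, ForbiddenError, NotFoundError,
  RateLimitError, UnauthorizedError, ValidationError,
} from './index';

export function mapErrorToResponse(error: Error, correlationId: string) {
  const envelope = (statusCode: number, code: string, message: string, details?: unknown) => ({
    statusCode,
    body: { error: { code, message, correlationId, ...(details ? { details } : {}) } },
  });

  if (error instanceof ValidationError)    return envelope(400, 'validation_failed', error.message, error.context);
  if (error instanceof UnauthorizedError)  return envelope(401, 'unauthorized', 'Authentication required');
  if (error instanceof ForbiddenError)     return envelope(403, 'forbidden', 'You do not have access to this resource');
  if (error instanceof NotFoundError)      return envelope(404, 'not_found', 'Resource not found');
  if (error instanceof ConflictError)      return envelope(409, 'conflict', 'Request conflicts with current state');
  if (error instanceof BusinessLogicError) return envelope(422, 'business_rule_violation', error.userMessage);
  if (error instanceof RateLimitError)     return envelope(429, 'rate_limited', 'Too many requests');

  // RetriableError that exhausted retries, CriticalError, and anything unclassified
  return envelope(500, 'internal_error',
    `Something went wrong. Please try again or contact support with reference ${correlationId}.`);
}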

Migration from Current State

Week 1: Foundation

  • Build error class hierarchy
  • Build withRetry + circuit breaker
  • Add Fastify error middleware
  • Update Sentry config to filter known noise

Week 2: Replace high-traffic paths

  • Refactor OpenAI call sites to use the new pattern (highest cascade risk)
  • Refactor Twilio (stops infinite retries)
  • Refactor Stripe with idempotency

Week 3: Replace remaining external calls

  • Postmark, internal-user-service, etc.
  • Update queue workers to use the base-worker pattern with DLQ

Week 4: Tune

  • Review Sentry: close or ignore remaining noise issues
  • Verify alert thresholds match team capacity
  • Document patterns for new code

Ongoing

  • New code uses the patterns by default
  • Old try-catches replaced opportunistically
  • Audit quarterly

What This Strategy Won't Solve

  • Won't fix poor input validation upstream. ValidationErrors are the right output of bad input; preventing bad input is a frontend/API design issue.
  • Won't catch logical bugs. A function that returns wrong-but-valid data isn't an error from this framework's perspective. Tests catch logic bugs.
  • Won't eliminate Sentry items entirely. New error classes will appear; the goal is signal-to-noise, not zero.
  • Won't compensate for undersized infrastructure. If your database is too small, you'll see RetriableError storms during traffic spikes. The error handling makes failures graceful but doesn't prevent them.
  • Won't replace runbooks for incident response. When circuit breakers open, on-call still needs to know what to do.

Maintenance Cadence

Per-PR (engineer):

  • New external API call sites use withRetry + appropriate circuit breaker key
  • New error throws use specific error classes, not generic Error

Weekly:

  • Review Sentry top errors. New patterns? Drop noise rules where possible.
  • Check circuit breaker state in Datadog dashboard. Anything stuck open?

Monthly:

  • Audit alert tuning: any pages that aren't actionable? Adjust thresholds.
  • DLQ review: any patterns? Root-cause + replay or delete.
  • Error budget burn rate: are we trending toward SLA breach?

Quarterly:

  • Add new error classes if patterns emerge
  • Review external service tooling: better way to call OpenAI / Twilio?
  • Postmortem on any production incident that escaped the framework

Key Takeaways

  • Three error classes (Retriable, Permanent, BusinessLogic) drive all decisions. Hierarchy is the foundation.
  • Retry only Retriable. Twilio's invalid-number is Permanent — that's why your infinite retry happened.
  • Circuit breaker on external APIs prevents cascades. OpenAI rate-limit storms stop being self-amplifying.
  • Customer-facing errors: code + safe message + correlation ID. Never stack traces.
  • Sentry is for actionable bugs. Filter user-disconnects + ValidationErrors. Tune to <500 actionable items, not 12K noise.
  • Migration is 4 weeks. Most pain disappears in week 2 after OpenAI + Twilio refactor.

Common use cases

  • Engineer building a new API service + wanting error handling designed upfront
  • Tech lead auditing existing error handling in a codebase that has 'silent failures' problem
  • Solo dev whose Sentry inbox has 5K errors + most are noise, real bugs are buried
  • Backend lead designing error contracts for a multi-service architecture
  • Engineer integrating with flaky 3rd-party APIs (Stripe, Twilio, AI APIs) + needs retry strategy
  • Team migrating from 'try-catch-everywhere' to structured error handling

Best AI model for this

Claude Opus 4. Error-handling design needs reasoning about failure modes, blast radius, and operational tradeoffs — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Pro tips

  • An error-class hierarchy beats string-matching on error messages. `RetriableError` extends `BaseError`, so callers can check with `instanceof`.
  • Distinguish transient (network blip), permanent (validation failed), business-logic (user violated rule). Each has different handling.
  • Retry only transient errors. Retrying validation errors is the rate-limit-cascade pattern.
  • Exponential backoff with jitter. Not retry-immediately. Industry default: base 100ms, max 30s, jitter ±25%.
  • Circuit breaker after 5 consecutive failures. Stop retrying for 60s. Prevents thundering-herd retries.
  • Customer-facing error messages should never show stack traces or internal IDs. 'Something went wrong, error ref: abc123' is the right pattern.
  • Log structurally, not stringly. `logger.error({ user_id, error_type, ... })` beats `logger.error('User 123 had error: ...')`.

Customization tips

  • Be specific about your stack. Error patterns in Node.js + Fastify differ from Rails or Django; the implementation skeleton calibrates to yours.
  • List ALL external dependencies with details. Stripe error patterns differ from Twilio differ from OpenAI; each needs class-specific mapping.
  • Specify your biggest pain concretely. 'Sentry noise' vs 'cascading retries' vs 'silent failures' need different first-step fixes.
  • Mention your error budget / SLA. Retry policies and circuit breaker thresholds calibrate to availability targets.
  • Be honest about scale. 800 req/sec needs different patterns than 80 req/sec — circuit breakers matter more at higher scale.
  • Use the AI/LLM API Mode variant if you're heavily integrating with LLMs — adds the unique LLM error patterns (rate limits, context overflow, hallucination as soft failure).

Variants

API Service Mode

For HTTP/RPC services — emphasizes status code mapping, error envelopes, idempotency.

Background Job Mode

For queue workers + async jobs — emphasizes retry policies, dead-letter queues, partial failure.

Frontend Mode

For React/Vue/Svelte apps — emphasizes UI error states, error boundaries, retry UX.

AI/LLM API Mode

For services calling LLMs — emphasizes the unique LLM error patterns (rate limits, context overflow, model unavailable, hallucination as 'soft failure').

Frequently asked questions

How do I use the Error Handling Strategy prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Error Handling Strategy?

Claude Opus 4. Error-handling design needs reasoning about failure modes, blast radius, and operational tradeoffs — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Can I customize the Error Handling Strategy prompt for my use case?

Yes, every Promptolis Original is designed to be customized. Key levers: use an error-class hierarchy instead of string-matching on error messages (`RetriableError` extends `BaseError`, so you can check with `instanceof`), and distinguish transient (network blip), permanent (validation failed), and business-logic (user violated a rule) errors, since each needs different handling.

← All Promptolis Originals