⚡ Promptolis Original · Coding & Development

⚡ Caching Strategy Architect

Q: How do I use the Caching Strategy Architect prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Q: Can I customize the Caching Strategy Architect prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: Don't cache by default. Cache the slow + hot. Cold data + fast queries don't benefit.; Stale-while-revalidate beats hard expiry. Serve stale data while fetching fresh; user never sees lag.

Designs your caching: which layer (CDN, app-cache, Redis, DB), what TTL, what invalidation strategy — without the 'cache and pray' bugs that make cached data silently wrong.

⏱️ 5 min to set up 🤖 ~110 seconds in Claude 🗓️ Updated 2026-04-28

Why this is epic

Caching is one of the two hard problems in computer science (along with naming). Most teams cache aggressively then fight invalidation bugs for years. This Original picks the right layer for each data type + designs the invalidation strategy upfront.

Outputs the complete strategy: cache layer per data type (CDN for static, Redis for session, app-memory for hot config, DB query cache for slow reports), TTL recommendations, invalidation triggers, and the specific anti-patterns to avoid.

Includes the 6 cache failure modes: stampede, thundering herd, stale-while-revalidate gone wrong, cache key collision, partition skew, and hot-key. Each has a specific structural fix.

Calibrated to 2026 caching reality: edge caching (Cloudflare Workers KV, Vercel), Redis Cluster, app-level LRU, database query caching, CDN at scale. Picks the right tool per data shape.

The prompt

Promptolis Original · Copy-ready

<role> You are a caching architect with 7+ years designing cache layers for SaaS apps, content sites, and high-traffic APIs. You have shipped caching strategies for 30+ teams. You know exactly which layer fits which data and where stampedes + invalidation bugs hide. You are direct. You will tell a builder their TTL is wrong, that they need stale-while-revalidate not hard expiry, or that their cache stampede is making things slower than no caching. You refuse to recommend 'add Redis' as a generic answer — different data types need different layers. </role> <principles> 1. Cache the slow + hot. Skip cold data + fast queries. 2. Stale-while-revalidate > hard expiry. 3. Invalidation = write pattern. Atomic write+invalidate. 4. TTL based on freshness needs, not 'still valid.' 5. Singleton-fetch on cache miss. Stops stampede. 6. Hot keys destroy Redis. Distribute or replicate. 7. App-memory > Redis for tiny hot data. </principles> <input> <service-context>{what's slow / what needs caching}</service-context> <data-types>{the categories of data — user profiles, product listings, analytics, etc.}</data-types> <read-write-ratio>{rough % reads vs writes per data type}</read-write-ratio> <staleness-tolerance>{how stale is OK per data type — instant / seconds / minutes / hours}</staleness-tolerance> <scale>{requests/sec, data volume}</scale> <existing-caching>{nothing / partial / mature with bugs}</existing-caching> <infrastructure>{available: Redis, CDN, edge workers, in-memory limits}</infrastructure> <observed-bugs>{stampede, stale data, hot keys, etc.}</observed-bugs> </input> <output-format> # Caching Strategy: [service] ## Suitability Check What to cache, what NOT to cache. Honest. ## Per-Data-Type Layer Mapping For each data type: which cache layer (CDN / Redis / app-memory / DB query cache / none), why this layer, TTL. ## Invalidation Strategy For each cached data: how invalidation works on write. Atomic patterns. ## Cache Key Strategy Naming convention. Versioning. Avoiding collisions. ## Cache Stampede Prevention The singleton-fetch pattern. Implementation. ## Hot Key Mitigation If any data has hot keys: detection + mitigation. ## Cache Warming Which caches to pre-warm + how. NOT cold-cache failure modes. ## Observability Metrics: hit rate, miss rate, eviction rate, stampede events. Per layer. ## Implementation Skeleton File structure + key code patterns for the chosen stack. ## What This Strategy Won't Solve Honest limits. ## Migration from Current State If existing caching: how to introduce structured layer without breaking existing. ## Maintenance Cadence Monthly + quarterly reviews. ## Key Takeaways 4-6 bullets — for the team's caching playbook. </output-format> <auto-intake> If input incomplete: ask for service context, data types, read-write ratio, staleness tolerance, scale, existing caching, infrastructure, observed bugs. </auto-intake> Now, design the caching strategy:

0 copies

🚀 Open in ChatGPT ✨ Open in Claude 💎 Open in Gemini

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input

<service-context>B2B SaaS for project management. Frontend is Next.js 14, backend is tRPC API. We have 5K paying teams, ~80K active users. Some endpoints are slow:
  - /api/dashboard (loads project list + recent activity + member presence — takes 1.2s p95)
  - /api/project/{id}/timeline (loads full project history — takes 2-3s p95)
  - /api/integrations/list (dropdown of available integrations — fast but called 10x per page load)</service-context>
<data-types>(1) User profiles + auth state, (2) Project lists per user, (3) Project timeline data, (4) Integrations directory (rarely changes), (5) Real-time presence (who's online), (6) Search results, (7) Static content (help articles, blog).</data-types>
<read-write-ratio>User profiles: 99/1. Projects: 95/5. Timeline: 99/1. Integrations: 99.99/0.01. Presence: 50/50 (writes are heartbeats). Search: 100/0. Static: 99.99/0.01.</read-write-ratio>
<staleness-tolerance>User profiles: 5 min. Projects: 1 min. Timeline: 5 min. Integrations: 1 hour. Presence: 5 sec. Search: 1 hour. Static: 24 hours.</staleness-tolerance>
<scale>~200 req/sec average API. ~800 peak. ~10K simultaneous users at peak.</scale>
<existing-caching>Nothing currently. We've been told 'just add Redis' for 6 months but no one designed it.</existing-caching>
<infrastructure>Available: Redis (Upstash). Cloudflare CDN (in front of Vercel). Vercel edge functions. ~256MB app memory per Lambda function.</infrastructure>
<observed-bugs>None caching-specific yet (since no caching). Performance bugs: dashboard loads slow, timeline data slow.</observed-bugs>

🤖 Output

Caching Strategy: B2B Project Management SaaS

Suitability Check

Cache the slow + hot: dashboard data, timeline data, integrations directory, search results, static content.

Don't cache (skip the cache for these):

Real-time presence — already 5-second staleness, just hit Redis directly as a presence store, not a cache
Search results — your scale doesn't justify; native DB indexes should handle
Anything 1-time queried (one-off admin operations)

Highest-leverage: caching the dashboard + timeline endpoints alone will cut p95 latency 50-70%.

Per-Data-Type Layer Mapping

Data Type	Cache Layer	TTL	Why
User profiles + auth	App-memory LRU + Redis fallback	5 min app, 15 min Redis	Hot data, accessed every request, tiny per-user (~1KB), 99/1 read/write
Project list per user	Redis	60s	Per-user data, moderate size, 1-min staleness OK
Timeline data per project	Redis	5 min	Larger per-project (5-50KB), 99/1 read/write, 5-min staleness OK
Integrations directory	App-memory + Redis	1 hour app, 24 hour Redis	Rarely changes, tiny (~5KB total), every page load
Real-time presence	Redis directly (NOT a cache)	Sliding 30s	This isn't cached data; it's authoritative data IN Redis. Don't treat as cache.
Search results	NONE — use DB indexes	n/a	200 req/sec doesn't justify cache complexity
Static content (help, blog)	Cloudflare CDN	1 hour edge, 24 hour origin	Public, large, heavy traffic
Frontend assets (JS, CSS, images)	Cloudflare CDN	1 year (immutable URLs)	Standard frontend caching

Layer rationale

App-memory LRU (in Node process):

Best for tiny, hot, semi-static data: integrations directory, current user's profile during their session
LRU eviction at 200 entries (avoids unbounded growth)
Per-Lambda; doesn't share across instances (acceptable — 5K users / 4 instances = ~1.2K cache entries each, low hit-rate but cheap at this scale)
Library: lru-cache npm package

Redis (Upstash):

Best for shared per-user data, session-like data, larger-than-app-memory items
TTL-based expiry
Pub/sub for invalidation (if needed)

Cloudflare CDN (edge):

Best for public static content + GET-only data
Set Cache-Control headers; CF respects

Invalidation Strategy

User profile (Redis cache)

// Write pattern
async function updateUserProfile(userId: string, updates: ProfileUpdates) {
  // Atomic: write to DB + invalidate cache in transaction
  await db.transaction(async (tx) => {
    await tx.users.update({ id: userId }, updates);
    await redis.del(`user:${userId}`);  // invalidate Redis
  });
  // Invalidate app-memory caches via pub/sub
  await redis.publish('cache.invalidate.user', userId);
}

// Read pattern with stale-while-revalidate
async function getUserProfile(userId: string): Promise<Profile> {
  // Try app-memory first
  const memCached = appCache.get(`user:${userId}`);
  if (memCached) return memCached;
  
  // Try Redis
  const cached = await redis.get(`user:${userId}`);
  if (cached) {
    const profile = JSON.parse(cached);
    appCache.set(`user:${userId}`, profile, { ttl: 5 * 60 * 1000 });
    return profile;
  }
  
  // Fetch from DB (singleton-fetch — see Stampede section)
  const profile = await singletonFetch(`user:${userId}`, async () => {
    return await db.users.findById(userId);
  });
  
  // Populate caches
  await redis.set(`user:${userId}`, JSON.stringify(profile), { EX: 15 * 60 });
  appCache.set(`user:${userId}`, profile, { ttl: 5 * 60 * 1000 });
  return profile;
}

Project list (Redis cache)

// Write pattern
async function createProject(userId: string, projectData: ProjectCreate) {
  await db.transaction(async (tx) => {
    await tx.projects.create({ ...projectData, userId });
    await redis.del(`projects:${userId}`);  // invalidate user's project list
  });
}

async function deleteProject(userId: string, projectId: string) {
  await db.transaction(async (tx) => {
    await tx.projects.delete({ id: projectId });
    await redis.del(`projects:${userId}`);
    await redis.del(`timeline:${projectId}`);  // invalidate timeline too
  });
}

Integrations directory (rarely-invalidated)

Updated only when admin adds new integration
Manually trigger cache invalidation: redis.del('integrations:directory')
Or auto-invalidate via DB trigger on integrations table writes

Stale-while-revalidate pattern (advanced, for high-traffic)

async function getProjectsWithSWR(userId: string) {
  const key = `projects:${userId}`;
  const cached = await redis.get(key);
  
  if (cached) {
    const { data, fetchedAt } = JSON.parse(cached);
    const ageMs = Date.now() - fetchedAt;
    
    if (ageMs < 60 * 1000) {
      // Fresh — return immediately
      return data;
    } else {
      // Stale — return data + trigger background revalidation
      revalidateInBackground(userId);
      return data;
    }
  }
  
  // Cold cache
  const data = await fetchProjects(userId);
  await redis.set(key, JSON.stringify({ data, fetchedAt: Date.now() }), { EX: 5 * 60 });
  return data;
}

Benefit: user never waits for cache fetch. Cache stays "sufficiently fresh" via background revalidation.

Cache Key Strategy

Format: <namespace>:<id>:<version>

Examples:
  user:user_456:v2
  projects:user_456:v1
  timeline:proj_789:v1
  integrations:directory:v1

Version bump strategy:
  - Schema changes (e.g., add field to User) → bump v2 → v3
  - During migration: write to new key, eventual transition
  - Old keys naturally expire via TTL

Why versioned keys: when you change the cached object's shape, you don't have to invalidate the cache cluster manually. The new version uses a new key; old version expires naturally.

Avoid: non-prefixed keys (456), reused keys across services (456 collides between user-service and other-service).

Cache Stampede Prevention

// Singleton fetch: when N concurrent requests miss the same key,
// only ONE fetches; others wait for that result.

const inflight = new Map<string, Promise<any>>();

async function singletonFetch<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  if (inflight.has(key)) {
    return inflight.get(key)!;
  }
  
  const promise = (async () => {
    try {
      return await fetcher();
    } finally {
      inflight.delete(key);  // Cleanup whether success or fail
    }
  })();
  
  inflight.set(key, promise);
  return promise;
}

This is per-Lambda. Across instances, you'd use Redis-based locking (SETNX cache:lock:user_456 1 EX 30). But for your scale (Lambda, 4-8 instances), per-instance is sufficient — at most 8 stampede attempts simultaneously instead of 200.

For higher-scale fix: Redis-based singleton with SET NX + waiting clients poll briefly.

Hot Key Mitigation

Detection: Redis SLOWLOG + Datadog metrics show hot keys.

For your scale + data shapes, hot keys unlikely. The integrations:directory key would be hot if it weren't tiny + cached at app-memory layer too.

Mitigation if needed (future):

1. Replicate hot key across N variants (`integrations:directory:0`, `integrations:directory:1`, ...). Pick one randomly per request.

2. Move hot key to app-memory only (skip Redis).

3. Use Redis Cluster with consistent hashing if you outgrow single Redis.

Cache Warming

For scheduled rebuilds (e.g., after a deploy):

// Run on deploy or scheduled
async function warmIntegrationsCache() {
  const directory = await db.integrations.findAll();
  await redis.set('integrations:directory:v1', JSON.stringify(directory), { EX: 24 * 60 * 60 });
}

Warm: integrations directory after admin updates.

Don't warm: per-user data (you'd warm 80K user profiles after every deploy — expensive).

Cold cache failure modes: without warming, the first request after deploy is slow. Use stale-while-revalidate to mask this — old cached value (if Redis didn't restart) served, refresh in background.

Observability

Track per cache layer:

Hit rate: target >80% for hot data, >50% for medium-warm. Below 50% means cache isn't earning its keep.
Miss rate: every miss adds DB load. Watch trend.
Eviction rate: if Redis is evicting > 1% of entries before TTL, you need more memory or shorter retention.
Stampede events: when singleton-fetch is invoked. Track via metric.
Latency: cache fetch latency p95. Should be <5ms for Redis, <0.5ms for app-memory.

Datadog dashboard panel:

App-memory cache: hit rate, size, eviction count
Redis: hit rate per key namespace, evictions, memory used
CDN: hit rate (Cloudflare provides natively)

Alerts:

Hit rate drops >20% over 1 hour → likely a deployment introduced cache key change
Eviction rate >5% → memory under-provisioned
Redis memory >80% used → scale up

Implementation Skeleton

/lib/cache/
  app-memory.ts        (LRU cache wrapper)
  redis.ts             (Redis client + helpers)
  index.ts             (high-level get/set with all layers)
  invalidate.ts        (pub/sub-based cross-instance invalidation)
  singleton-fetch.ts   (stampede prevention)

/lib/cache/strategies/
  user-profile.ts      (per-data-type cache logic)
  projects.ts
  timeline.ts
  integrations.ts

/lib/cache/observability.ts (metric instrumentation)

Pattern: per-data-type cache modules.

// /lib/cache/strategies/user-profile.ts
import { appCache, redis, singletonFetch } from '../index';

export const UserProfileCache = {
  KEY_PREFIX: 'user',
  VERSION: 'v2',
  TTL_REDIS: 15 * 60, // 15 min
  TTL_MEMORY: 5 * 60 * 1000, // 5 min
  
  key(userId: string) {
    return `${this.KEY_PREFIX}:${userId}:${this.VERSION}`;
  },
  
  async get(userId: string): Promise<Profile | null> {
    const memCached = appCache.get(this.key(userId));
    if (memCached) {
      metrics.increment('cache.hit', { layer: 'memory', type: 'user' });
      return memCached;
    }
    
    const redisCached = await redis.get(this.key(userId));
    if (redisCached) {
      const profile = JSON.parse(redisCached);
      appCache.set(this.key(userId), profile, { ttl: this.TTL_MEMORY });
      metrics.increment('cache.hit', { layer: 'redis', type: 'user' });
      return profile;
    }
    
    metrics.increment('cache.miss', { type: 'user' });
    return null;
  },
  
  async set(userId: string, profile: Profile) {
    appCache.set(this.key(userId), profile, { ttl: this.TTL_MEMORY });
    await redis.set(this.key(userId), JSON.stringify(profile), { EX: this.TTL_REDIS });
  },
  
  async invalidate(userId: string) {
    appCache.delete(this.key(userId));
    await redis.del(this.key(userId));
    await redis.publish('cache.invalidate', JSON.stringify({ type: 'user', id: userId }));
  },
};

Use like:

async function getUser(userId: string) {
  const cached = await UserProfileCache.get(userId);
  if (cached) return cached;
  
  const profile = await singletonFetch(
    UserProfileCache.key(userId),
    () => db.users.findById(userId)
  );
  await UserProfileCache.set(userId, profile);
  return profile;
}

What This Strategy Won't Solve

Won't replace fast queries. If your DB query is 50ms, caching adds 5ms overhead — minimal benefit.
Won't handle truly real-time data. Presence at 5-second staleness should NOT be cached; treat Redis as the authoritative store.
Won't fix architectural slowness. If your dashboard does 25 sequential queries, caching helps but the real fix is parallelizing queries.
Won't compress beyond data shape. Caching 50-100KB timeline data is fine; caching 5MB is questionable.
Won't auto-handle TTL drift. If your invalidation logic has bugs, stale data will be served. Tests + monitoring required.

Migration from Current State

Week 1: Foundation

Add app-memory cache wrapper + LRU library
Set up Redis connection
Build the per-data-type cache modules (UserProfileCache, ProjectsCache, etc.)
Add metric instrumentation

Week 2: Cache the slow endpoints

Wrap getUserProfile with UserProfileCache
Wrap getProjectList with ProjectsCache
Wrap getProjectTimeline with TimelineCache
Test hit rates in staging

Week 3: Invalidation correctness

Wire up write-pattern invalidation for each cached data type
Test scenarios: update profile → cache invalidated → next read returns fresh
Spot-check cross-instance invalidation via pub/sub

Week 4: Optimize

CDN setup for static content (Cloudflare cache headers)
Stale-while-revalidate pattern for high-traffic endpoints
Cache warming for integrations directory

Week 5: Monitor + tune

Verify hit rates >80% on intended caches
Tune TTLs based on observed usage patterns
Address any cache bugs (stampede events, hot keys)

Maintenance Cadence

Weekly (eng team):

Review hit rate dashboard. Anything trending down?
Check Redis memory usage. Approaching limits?

Monthly:

Review TTL settings. Any data type where TTL is too long (causing stale bugs) or too short (causing high miss rate)?
Audit cache key versions. Any unused old versions?

Quarterly:

Cache strategy review: still hitting the right slow endpoints?
Cost review: Redis memory + CDN spend
Test invalidation correctness end-to-end with synthetic test

Key Takeaways

Cache the slow + hot. Dashboard, timeline, user profiles. Skip cold + fast queries.
3-layer strategy: app-memory (fastest, smallest), Redis (shared, medium), CDN (edge, public). Each data type fits one layer best.
Invalidation is a write pattern. Atomic write+invalidate prevents staleness windows.
Singleton-fetch on cache miss prevents stampede. Per-Lambda is sufficient at your scale.
Stale-while-revalidate for high-traffic endpoints. User never waits; background revalidation keeps data fresh.
Versioned cache keys enable schema evolution without manual cache flushes. New version = new key; old version expires naturally.

Common use cases

Engineer adding caching to a slow endpoint + worried about stale-data bugs
Tech lead designing caching for a new SaaS launching with growth ahead
DBA replacing expensive DB queries with cached results
Backend engineer migrating from no-cache to cached + needing invalidation strategy
DevOps consolidating fragmented cache layers (multiple Redis instances, ad-hoc app caches)
Engineer hitting cache stampede / thundering herd bugs in production

Best AI model for this

Claude Opus 4. Caching design needs reasoning about consistency vs performance tradeoffs, blast radius, and invalidation patterns — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Pro tips

Don't cache by default. Cache the slow + hot. Cold data + fast queries don't benefit.
Stale-while-revalidate beats hard expiry. Serve stale data while fetching fresh; user never sees lag.
Cache invalidation is a write-pattern problem. If you write the source AND invalidate the cache atomically, no staleness windows.
TTL based on data freshness needs, not 'how long does it stay valid.' User profile: 5min staleness OK. Pricing: <30s.
Cache stampede happens when 100 requests hit a missed key simultaneously and all fetch the source. Use a singleton-fetch pattern.
Hot keys (1 key getting 50% of traffic) destroy Redis performance. Distribute via consistent hashing or replicate hot keys.
App-memory cache (Node Map, Python LRU) is faster than Redis for tiny hot data sets. Don't always reach for Redis.

Customization tips

List ALL data types your service handles, not just the slow ones. The strategy decides 'cache or not' per type — needs the full inventory.
Specify staleness tolerance per data type. 'OK if 5 minutes stale' vs 'must be real-time' fundamentally shapes the cache layer choice.
Be specific about scale. At 200 req/sec, app-memory caching is fine; at 20K req/sec, you need distributed Redis Cluster.
List your infrastructure. Redis vs no-Redis vs CDN vs edge workers determines what's available.
If you have observed cache bugs, describe specifically. Stampede patterns differ from hot-key patterns; the diagnosis affects the design.
Use the Solving Cache Bugs Mode variant if you have specific bugs (stale data, stampedes, hot keys) — different diagnostic patterns apply.

Variants

Read-Heavy API Mode

For API services with high read volume — emphasizes Redis caching + CDN at edge.

Database Query Cache Mode

For expensive DB queries — emphasizes materialized views, Redis-cached query results, and invalidation on write.

CDN-First Mode

For content sites + assets — emphasizes Cloudflare/Vercel edge caching, cache-control headers, purge strategies.

Solving Cache Bugs Mode

For teams hitting specific cache bugs (stampede, stale data, hot keys) — diagnoses + provides the structural fix.

Frequently asked questions

How do I use the Caching Strategy Architect prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Caching Strategy Architect?

Claude Opus 4. Caching design needs reasoning about consistency vs performance tradeoffs, blast radius, and invalidation patterns — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Can I customize the Caching Strategy Architect prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: Don't cache by default. Cache the slow + hot. Cold data + fast queries don't benefit.; Stale-while-revalidate beats hard expiry. Serve stale data while fetching fresh; user never sees lag.

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals

Curated by

Promptolis Editorial

Every Promptolis Original is hand-crafted and reviewed before publishing — built from scratch for 2026-grade LLMs.

Last reviewed on 2026-04-28 · About Promptolis