⚡ Promptolis Original · Coding & Development
⚡ Caching Strategy Architect
Designs your caching: which layer (CDN, app-cache, Redis, DB), what TTL, what invalidation strategy — without the 'cache and pray' bugs that make cached data silently wrong.
Why this is epic
Cache invalidation is one of the two hard problems in computer science (along with naming things). Most teams cache aggressively, then fight invalidation bugs for years. This Original picks the right layer for each data type + designs the invalidation strategy upfront.
Outputs the complete strategy: cache layer per data type (CDN for static, Redis for session, app-memory for hot config, DB query cache for slow reports), TTL recommendations, invalidation triggers, and the specific anti-patterns to avoid.
Includes the 6 cache failure modes: stampede, thundering herd, stale-while-revalidate gone wrong, cache key collision, partition skew, and hot-key. Each has a specific structural fix.
Calibrated to 2026 caching reality: edge caching (Cloudflare Workers KV, Vercel), Redis Cluster, app-level LRU, database query caching, CDN at scale. Picks the right tool per data shape.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
<service-context>B2B SaaS for project management. Frontend is Next.js 14, backend is tRPC API. We have 5K paying teams, ~80K active users. Some endpoints are slow:
- /api/dashboard (loads project list + recent activity + member presence — takes 1.2s p95)
- /api/project/{id}/timeline (loads full project history — takes 2-3s p95)
- /api/integrations/list (dropdown of available integrations — fast but called 10x per page load)</service-context>
<data-types>(1) User profiles + auth state, (2) Project lists per user, (3) Project timeline data, (4) Integrations directory (rarely changes), (5) Real-time presence (who's online), (6) Search results, (7) Static content (help articles, blog).</data-types>
<read-write-ratio>User profiles: 99/1. Projects: 95/5. Timeline: 99/1. Integrations: 99.99/0.01. Presence: 50/50 (writes are heartbeats). Search: 100/0. Static: 99.99/0.01.</read-write-ratio>
<staleness-tolerance>User profiles: 5 min. Projects: 1 min. Timeline: 5 min. Integrations: 1 hour. Presence: 5 sec. Search: 1 hour. Static: 24 hours.</staleness-tolerance>
<scale>~200 req/sec average API. ~800 peak. ~10K simultaneous users at peak.</scale>
<existing-caching>Nothing currently. We've been told 'just add Redis' for 6 months but no one designed it.</existing-caching>
<infrastructure>Available: Redis (Upstash). Cloudflare CDN (in front of Vercel). Vercel edge functions. ~256MB app memory per Lambda function.</infrastructure>
<observed-bugs>None caching-specific yet (since no caching). Performance bugs: dashboard loads slow, timeline data slow.</observed-bugs>
Caching Strategy: B2B Project Management SaaS
Suitability Check
Cache the slow + hot: dashboard data, timeline data, integrations directory, static content.
Don't cache (skip the cache for these):
- Real-time presence — already 5-second staleness, just hit Redis directly as a presence store, not a cache
- Search results — your scale doesn't justify; native DB indexes should handle
- Anything queried only once (one-off admin operations)
Highest-leverage: caching the dashboard + timeline endpoints alone should cut p95 latency by 50-70%.
Per-Data-Type Layer Mapping
| Data Type | Cache Layer | TTL | Why |
|---|---|---|---|
| User profiles + auth | App-memory LRU + Redis fallback | 5 min app, 15 min Redis | Hot data, accessed every request, tiny per-user (~1KB), 99/1 read/write |
| Project list per user | Redis | 60s | Per-user data, moderate size, 1-min staleness OK |
| Timeline data per project | Redis | 5 min | Larger per-project (5-50KB), 99/1 read/write, 5-min staleness OK |
| Integrations directory | App-memory + Redis | 1 hour app, 24 hour Redis | Rarely changes, tiny (~5KB total), every page load |
| Real-time presence | Redis directly (NOT a cache) | Sliding 30s | This isn't cached data; it's authoritative data IN Redis. Don't treat as cache. |
| Search results | NONE — use DB indexes | n/a | 200 req/sec doesn't justify cache complexity |
| Static content (help, blog) | Cloudflare CDN | 1 hour edge, 24 hour origin | Public, large, heavy traffic |
| Frontend assets (JS, CSS, images) | Cloudflare CDN | 1 year (immutable URLs) | Standard frontend caching |
Layer rationale
App-memory LRU (in Node process):
- Best for tiny, hot, semi-static data: integrations directory, current user's profile during their session
- LRU eviction at 200 entries (avoids unbounded growth)
- Per-Lambda; doesn't share across instances (acceptable: each instance warms its own copy, so hit rates are lower, but misses just fall through to Redis, which is cheap at this scale)
- Library: the `lru-cache` npm package (sketch below)
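A minimal sketch of that wrapper, assuming lru-cache v10+ (the module path and export name match the implementation skeleton later in this strategy):

// /lib/cache/app-memory.ts (sketch)
import { LRUCache } from 'lru-cache';

// Bounded at 200 entries with a 5-minute default TTL; callers can
// override the TTL per entry at set() time.
export const appCache = new LRUCache<string, object>({
  max: 200,
  ttl: 5 * 60 * 1000, // ms
});

// Usage:
// appCache.set('integrations:directory:v1', directory, { ttl: 60 * 60 * 1000 });
// const hit = appCache.get('integrations:directory:v1');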
Redis (Upstash):
- Best for shared per-user data, session-like data, larger-than-app-memory items
- TTL-based expiry
- Pub/sub for invalidation (if needed)
Cloudflare CDN (edge):
- Best for public static content + GET-only data
- Set Cache-Control headers; Cloudflare respects them. For example:
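A sketch of the header values for a help-article route, assuming a Next.js 14 route handler (loadArticle is a hypothetical loader; tune the numbers to your purge strategy):

// app/help/[slug]/route.ts (sketch)
export async function GET(request: Request) {
  const article = await loadArticle(request); // hypothetical loader
  return new Response(article.html, {
    headers: {
      'Content-Type': 'text/html',
      // s-maxage: 1 hour at the Cloudflare edge; max-age: short in the browser
      'Cache-Control': 'public, max-age=300, s-maxage=3600',
    },
  });
}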
Invalidation Strategy
User profile (Redis cache)
// Write pattern
async function updateUserProfile(userId: string, updates: ProfileUpdates) {
  // Redis is not part of the DB transaction, so true atomicity across the
  // two stores is impossible. Write to the DB first, then invalidate after
  // commit so a concurrent read can't re-cache pre-commit data.
  await db.transaction(async (tx) => {
    await tx.users.update({ id: userId }, updates);
  });
  await redis.del(`user:${userId}`); // invalidate Redis
  // Invalidate app-memory caches on other instances via pub/sub
  await redis.publish('cache.invalidate.user', userId);
}

// Read pattern: cache-aside with stampede protection
async function getUserProfile(userId: string): Promise<Profile> {
  // Try app-memory first
  const memCached = appCache.get(`user:${userId}`);
  if (memCached) return memCached;

  // Try Redis
  const cached = await redis.get(`user:${userId}`);
  if (cached) {
    const profile = JSON.parse(cached);
    appCache.set(`user:${userId}`, profile, { ttl: 5 * 60 * 1000 });
    return profile;
  }

  // Fetch from DB (singleton-fetch — see Stampede section)
  const profile = await singletonFetch(`user:${userId}`, async () => {
    return await db.users.findById(userId);
  });

  // Populate caches
  await redis.set(`user:${userId}`, JSON.stringify(profile), { EX: 15 * 60 });
  appCache.set(`user:${userId}`, profile, { ttl: 5 * 60 * 1000 });
  return profile;
}
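The publish call above needs a subscriber in every instance to drop the local app-memory copy. A sketch, assuming an ioredis client (a subscribing Redis connection can't issue other commands, so it gets a dedicated connection; the env var name is illustrative):

// /lib/cache/invalidate.ts (sketch)
import Redis from 'ioredis';
import { appCache } from './app-memory';

const sub = new Redis(process.env.REDIS_URL!); // dedicated subscriber connection

sub.subscribe('cache.invalidate.user').catch(console.error);
sub.on('message', (_channel, userId) => {
  // Drop only the local copy; the writer already invalidated Redis
  appCache.delete(`user:${userId}`);
});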
Project list (Redis cache)
// Write pattern
async function createProject(userId: string, projectData: ProjectCreate) {
  await db.transaction(async (tx) => {
    await tx.projects.create({ ...projectData, userId });
  });
  // Invalidate after commit so a concurrent read can't re-cache stale data
  await redis.del(`projects:${userId}`);
}

async function deleteProject(userId: string, projectId: string) {
  await db.transaction(async (tx) => {
    await tx.projects.delete({ id: projectId });
  });
  await redis.del(`projects:${userId}`);
  await redis.del(`timeline:${projectId}`); // invalidate timeline too
}
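Timeline writes follow the same shape; a sketch with a hypothetical addTimelineEvent (the tx.timelineEvents accessor is assumed):

async function addTimelineEvent(projectId: string, event: TimelineEvent) {
  await db.transaction(async (tx) => {
    await tx.timelineEvents.create({ ...event, projectId });
  });
  // Invalidate after commit so a concurrent read can't re-cache stale data
  await redis.del(`timeline:${projectId}`);
}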
Integrations directory (rarely-invalidated)
- Updated only when an admin adds a new integration
- Manually trigger cache invalidation: `redis.del('integrations:directory:v1')`
- Or auto-invalidate via a DB trigger on `integrations` table writes
Stale-while-revalidate pattern (advanced, for high-traffic)
async function getProjectsWithSWR(userId: string) {
  const key = `projects:${userId}`;
  const cached = await redis.get(key);
  if (cached) {
    const { data, fetchedAt } = JSON.parse(cached);
    const ageMs = Date.now() - fetchedAt;
    if (ageMs < 60 * 1000) {
      // Fresh — return immediately
      return data;
    }
    // Stale — return data + trigger background revalidation.
    // Note: fire-and-forget work may be frozen once the response is sent
    // on serverless runtimes; use the platform's waitUntil if available.
    revalidateInBackground(userId);
    return data;
  }
  // Cold cache
  const data = await fetchProjects(userId);
  await redis.set(key, JSON.stringify({ data, fetchedAt: Date.now() }), { EX: 5 * 60 });
  return data;
}
Benefit: once the cache is warm, the user never waits for an origin fetch. The cache stays "sufficiently fresh" via background revalidation; one possible implementation of that revalidation follows.
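revalidateInBackground is left abstract above. One possible shape, reusing singletonFetch so concurrent stale reads trigger only one refresh (fetchProjects is the same hypothetical loader as in the cold path):

async function revalidateInBackground(userId: string): Promise<void> {
  const key = `projects:${userId}`;
  const data = await singletonFetch(key, () => fetchProjects(userId));
  await redis.set(key, JSON.stringify({ data, fetchedAt: Date.now() }), { EX: 5 * 60 });
}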
Cache Key Strategy
Format: `<namespace>:<id>:<version>`
Examples:
- `user:user_456:v2`
- `projects:user_456:v1`
- `timeline:proj_789:v1`
- `integrations:directory:v1`
Version bump strategy:
- Schema changes (e.g., add field to User) → bump v2 → v3
- During migration: write to new key, eventual transition
- Old keys naturally expire via TTL
Why versioned keys: when you change the cached object's shape, you don't have to invalidate the cache cluster manually. The new version uses a new key; old version expires naturally.
Avoid: non-prefixed keys (`456`), keys reused across services (`456` collides between user-service and other-service).
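A tiny helper makes the convention hard to violate (illustrative only; not part of the skeleton below):

const cacheKey = (namespace: string, id: string, version = 'v1') =>
  `${namespace}:${id}:${version}`;

cacheKey('timeline', 'proj_789');    // 'timeline:proj_789:v1'
cacheKey('user', 'user_456', 'v2');  // 'user:user_456:v2'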
Cache Stampede Prevention
// Singleton fetch: when N concurrent requests miss the same key,
// only ONE fetches; others wait for that result.
const inflight = new Map<string, Promise<any>>();

async function singletonFetch<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  if (inflight.has(key)) {
    return inflight.get(key)!;
  }
  const promise = (async () => {
    try {
      return await fetcher();
    } finally {
      inflight.delete(key); // cleanup whether success or fail
    }
  })();
  inflight.set(key, promise);
  return promise;
}
This is per-Lambda; it doesn't coordinate across instances. Cross-instance, you'd use a Redis-based lock (`SET cache:lock:user_456 1 NX EX 30`; plain `SETNX` can't set a TTL). But for your scale (Lambda, 4-8 instances), per-instance is sufficient: at most 8 simultaneous stampede fetches instead of 200.
The higher-scale fix is a Redis-based singleton: the lock winner fetches and writes the cache while waiting clients poll briefly, as sketched below.
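A sketch of that cross-instance variant, using the same node-redis-style client as the rest of this doc (lock key naming and timeouts are illustrative):

async function distributedSingletonFetch<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const lockKey = `cache:lock:${key}`;
  // NX: only one instance wins the lock; EX bounds it if the winner dies
  const acquired = await redis.set(lockKey, '1', { NX: true, EX: 30 });
  if (acquired) {
    try {
      const value = await fetcher();
      await redis.set(key, JSON.stringify(value), { EX: 60 });
      return value;
    } finally {
      await redis.del(lockKey);
    }
  }
  // Lost the race: poll briefly for the winner's result, then fall through
  for (let i = 0; i < 20; i++) {
    await new Promise((resolve) => setTimeout(resolve, 100));
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);
  }
  return fetcher(); // degrade to a direct fetch rather than hang
}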
Hot Key Mitigation
Detection: `redis-cli --hotkeys` (requires an LFU eviction policy) or per-key-namespace metrics in Datadog.
For your scale + data shapes, hot keys are unlikely. The `integrations:directory` key would be hot if it weren't tiny and also cached at the app-memory layer.
Mitigation if needed (future), with a sketch of option 1 after this list:
1. Replicate hot key across N variants (`integrations:directory:0`, `integrations:directory:1`, ...). Pick one randomly per request.
2. Move hot key to app-memory only (skip Redis).
3. Use Redis Cluster with consistent hashing if you outgrow single Redis.
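A minimal sketch of option 1, with a hypothetical replica count:

const HOT_KEY_REPLICAS = 4; // hypothetical; size to your traffic

// Readers pick a random replica, spreading load across keys/shards
function hotKeyVariant(baseKey: string): string {
  return `${baseKey}:${Math.floor(Math.random() * HOT_KEY_REPLICAS)}`;
}

// Read:  await redis.get(hotKeyVariant('integrations:directory:v1'));
// Write: update all replicas, or rely on short TTLs to converge them.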
Cache Warming
For scheduled rebuilds (e.g., after a deploy):
// Run on deploy or scheduled
async function warmIntegrationsCache() {
  const directory = await db.integrations.findAll();
  await redis.set('integrations:directory:v1', JSON.stringify(directory), { EX: 24 * 60 * 60 });
}
Warm: integrations directory after admin updates.
Don't warm: per-user data (you'd warm 80K user profiles after every deploy — expensive).
Cold-cache failure mode: without warming, the first request after a deploy is slow. Stale-while-revalidate masks this: the old cached value is served (if Redis survived the deploy) while the refresh runs in the background.
Observability
Track per cache layer:
- Hit rate: target >80% for hot data, >50% for medium-warm. Below 50% means cache isn't earning its keep.
- Miss rate: every miss adds DB load. Watch trend.
- Eviction rate: if Redis is evicting > 1% of entries before TTL, you need more memory or shorter retention.
- Stampede events: how often concurrent requests coalesce onto one in-flight singleton fetch. Track via metric.
- Latency: cache fetch latency p95. Should be <5ms for Redis, <0.5ms for app-memory.
Datadog dashboard panel:
- App-memory cache: hit rate, size, eviction count
- Redis: hit rate per key namespace, evictions, memory used
- CDN: hit rate (Cloudflare provides natively)
Alerts:
- Hit rate drops >20% over 1 hour → likely a deployment introduced cache key change
- Eviction rate >5% → memory under-provisioned
- Redis memory >80% used → scale up
Implementation Skeleton
/lib/cache/
  app-memory.ts       (LRU cache wrapper)
  redis.ts            (Redis client + helpers)
  index.ts            (high-level get/set with all layers)
  invalidate.ts       (pub/sub-based cross-instance invalidation)
  singleton-fetch.ts  (stampede prevention)
  observability.ts    (metric instrumentation; sketched below)
  strategies/
    user-profile.ts   (per-data-type cache logic)
    projects.ts
    timeline.ts
    integrations.ts
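A sketch of the metrics wrapper the strategy modules assume (shown with hot-shots, a DogStatsD client; substitute your metrics library of choice):

// /lib/cache/observability.ts (sketch)
import { StatsD } from 'hot-shots';

const statsd = new StatsD({ prefix: 'cache.' });

export const metrics = {
  increment(name: string, tags: Record<string, string> = {}) {
    statsd.increment(name, 1, tags);
  },
  timing(name: string, ms: number, tags: Record<string, string> = {}) {
    statsd.timing(name, ms, tags);
  },
};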
Pattern: per-data-type cache modules.
// /lib/cache/strategies/user-profile.ts
import { appCache, redis, singletonFetch } from '../index';
import { metrics } from '../observability'; // metric instrumentation (sketched above)

export const UserProfileCache = {
  KEY_PREFIX: 'user',
  VERSION: 'v2',
  TTL_REDIS: 15 * 60, // 15 min (seconds)
  TTL_MEMORY: 5 * 60 * 1000, // 5 min (ms)

  key(userId: string) {
    return `${this.KEY_PREFIX}:${userId}:${this.VERSION}`;
  },

  async get(userId: string): Promise<Profile | null> {
    const memCached = appCache.get(this.key(userId));
    if (memCached) {
      metrics.increment('cache.hit', { layer: 'memory', type: 'user' });
      return memCached;
    }
    const redisCached = await redis.get(this.key(userId));
    if (redisCached) {
      const profile = JSON.parse(redisCached);
      appCache.set(this.key(userId), profile, { ttl: this.TTL_MEMORY });
      metrics.increment('cache.hit', { layer: 'redis', type: 'user' });
      return profile;
    }
    metrics.increment('cache.miss', { type: 'user' });
    return null;
  },

  async set(userId: string, profile: Profile) {
    appCache.set(this.key(userId), profile, { ttl: this.TTL_MEMORY });
    await redis.set(this.key(userId), JSON.stringify(profile), { EX: this.TTL_REDIS });
  },

  async invalidate(userId: string) {
    appCache.delete(this.key(userId));
    await redis.del(this.key(userId));
    await redis.publish('cache.invalidate', JSON.stringify({ type: 'user', id: userId }));
  },
};
Use like:
async function getUser(userId: string) {
  const cached = await UserProfileCache.get(userId);
  if (cached) return cached;
  const profile = await singletonFetch(
    UserProfileCache.key(userId),
    () => db.users.findById(userId)
  );
  await UserProfileCache.set(userId, profile);
  return profile;
}
What This Strategy Won't Solve
- Won't speed up already-fast queries. A Redis round-trip costs a few milliseconds; caching a query that already runs in a few milliseconds buys you almost nothing.
- Won't handle truly real-time data. Presence at 5-second staleness should NOT be cached; treat Redis as the authoritative store.
- Won't fix architectural slowness. If your dashboard does 25 sequential queries, caching helps but the real fix is parallelizing queries.
- Won't shrink oversized payloads. Caching 50-100KB timeline blobs is fine; caching 5MB blobs strains Redis memory and network regardless of layer.
- Won't catch invalidation bugs for you. If your invalidation logic is wrong, stale data will be served until the TTL expires. Tests + monitoring are required.
Migration from Current State
Week 1: Foundation
- Add app-memory cache wrapper + LRU library
- Set up Redis connection
- Build the per-data-type cache modules (UserProfileCache, ProjectsCache, etc.)
- Add metric instrumentation
Week 2: Cache the slow endpoints
- Wrap `getUserProfile` with UserProfileCache
- Wrap `getProjectList` with ProjectsCache
- Wrap `getProjectTimeline` with TimelineCache
- Test hit rates in staging
Week 3: Invalidation correctness
- Wire up write-pattern invalidation for each cached data type
- Test scenarios: update profile → cache invalidated → next read returns fresh
- Spot-check cross-instance invalidation via pub/sub
Week 4: Optimize
- CDN setup for static content (Cloudflare cache headers)
- Stale-while-revalidate pattern for high-traffic endpoints
- Cache warming for integrations directory
Week 5: Monitor + tune
- Verify hit rates >80% on intended caches
- Tune TTLs based on observed usage patterns
- Address any cache bugs (stampede events, hot keys)
Maintenance Cadence
Weekly (eng team):
- Review hit rate dashboard. Anything trending down?
- Check Redis memory usage. Approaching limits?
Monthly:
- Review TTL settings. Any data type where TTL is too long (causing stale bugs) or too short (causing high miss rate)?
- Audit cache key versions. Any unused old versions?
Quarterly:
- Cache strategy review: still hitting the right slow endpoints?
- Cost review: Redis memory + CDN spend
- Test invalidation correctness end-to-end with a synthetic test (sketch below)
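Such a synthetic test might look like this (assumes vitest plus the getUser/updateUserProfile functions from earlier; the db fixture is illustrative):

import { test, expect } from 'vitest';

test('profile update invalidates every cache layer', async () => {
  const user = await db.users.create({ name: 'Before' });
  await getUser(user.id);                       // populate caches
  await updateUserProfile(user.id, { name: 'After' });
  const fresh = await getUser(user.id);         // must not serve stale data
  expect(fresh.name).toBe('After');
});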
Key Takeaways
- Cache the slow + hot. Dashboard, timeline, user profiles. Skip cold + fast queries.
- 3-layer strategy: app-memory (fastest, smallest), Redis (shared, medium), CDN (edge, public). Each data type fits one layer best.
- Invalidation is a write pattern. Pair every source write with invalidation, immediately after commit, to keep staleness windows near zero.
- Singleton-fetch on cache miss prevents stampede. Per-Lambda is sufficient at your scale.
- Stale-while-revalidate for high-traffic endpoints. User never waits; background revalidation keeps data fresh.
- Versioned cache keys enable schema evolution without manual cache flushes. New version = new key; old version expires naturally.
Common use cases
- Engineer adding caching to a slow endpoint + worried about stale-data bugs
- Tech lead designing caching for a new SaaS launching with growth ahead
- DBA replacing expensive DB queries with cached results
- Backend engineer migrating from no-cache to cached + needing invalidation strategy
- DevOps consolidating fragmented cache layers (multiple Redis instances, ad-hoc app caches)
- Engineer hitting cache stampede / thundering herd bugs in production
Best AI model for this
Claude Opus 4. Caching design needs reasoning about consistency vs performance tradeoffs, blast radius, and invalidation patterns — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Pro tips
- Don't cache by default. Cache the slow + hot. Cold data + fast queries don't benefit.
- Stale-while-revalidate beats hard expiry. Serve stale data while fetching fresh; user never sees lag.
- Cache invalidation is a write-pattern problem. If every write to the source also invalidates the cache right after commit, staleness windows shrink to near zero.
- Set TTLs from staleness tolerance ('how stale is acceptable'), not from 'how long does the data stay valid.' User profile: 5 min staleness OK. Pricing: <30s.
- Cache stampede happens when 100 requests hit a missed key simultaneously and all fetch the source. Use a singleton-fetch pattern.
- Hot keys (1 key getting 50% of traffic) destroy Redis performance. Distribute via consistent hashing or replicate hot keys.
- App-memory cache (Node Map, Python LRU) is faster than Redis for tiny hot data sets. Don't always reach for Redis.
Customization tips
- List ALL data types your service handles, not just the slow ones. The strategy decides 'cache or not' per type — needs the full inventory.
- Specify staleness tolerance per data type. 'OK if 5 minutes stale' vs 'must be real-time' fundamentally shapes the cache layer choice.
- Be specific about scale. At 200 req/sec, app-memory caching is fine; at 20K req/sec, you need distributed Redis Cluster.
- List your infrastructure. Redis vs no-Redis vs CDN vs edge workers determines what's available.
- If you have observed cache bugs, describe specifically. Stampede patterns differ from hot-key patterns; the diagnosis affects the design.
- Use the Solving Cache Bugs Mode variant if you have specific bugs (stale data, stampedes, hot keys) — different diagnostic patterns apply.
Variants
Read-Heavy API Mode
For API services with high read volume — emphasizes Redis caching + CDN at edge.
Database Query Cache Mode
For expensive DB queries — emphasizes materialized views, Redis-cached query results, and invalidation on write.
CDN-First Mode
For content sites + assets — emphasizes Cloudflare/Vercel edge caching, cache-control headers, purge strategies.
Solving Cache Bugs Mode
For teams hitting specific cache bugs (stampede, stale data, hot keys) — diagnoses + provides the structural fix.
Frequently asked questions
How do I use the Caching Strategy Architect prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with Caching Strategy Architect?
Claude Opus 4. Caching design needs reasoning about consistency vs performance tradeoffs, blast radius, and invalidation patterns — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Can I customize the Caching Strategy Architect prompt for my use case?
Yes — every Promptolis Original is designed to be customized. Key levers: don't cache by default (cache the slow + hot; cold data and fast queries don't benefit), and prefer stale-while-revalidate over hard expiry (serve stale while fetching fresh; the user never sees lag).
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals