What Is Prompt Caching?
Prompt caching is a technique that stores the model's processed representation of a prompt prefix so it does not need to be recomputed on subsequent requests that share the same prefix. Anthropic introduced it for the Claude API in 2024; it can reduce costs by up to 90% and latency by 50-80% for prompts with stable prefixes.
Frequently Asked Questions
When does prompt caching help?
Whenever you have a stable prompt prefix (system message, document context, instruction framework) that does not change across many requests. Examples: querying the same large document with different questions; running the same prompt template with different inputs; agent loops with shared context.
How much does prompt caching save?
On the Anthropic API, cached tokens are read back at ~10% of the normal input-token price (writing to the cache carries a one-time ~25% premium). For a workflow that processes a 100K-token document with 50 different queries, caching reduces total input cost by roughly 90%, as the worked example below shows. Latency also drops significantly because the model does not re-encode the cached portion.
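To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python. The multipliers reflect Anthropic's published pricing model (cache reads at ~10% of the base input rate, cache writes at ~125%); the $3-per-million base price is only an illustrative placeholder, not a quote for any specific model.

```python
# Cost comparison for the 100K-token document / 50-query workflow above.
BASE_PER_MTOK = 3.00   # illustrative base input price, $ per million tokens
WRITE_MULT = 1.25      # cache write premium
READ_MULT = 0.10       # cache read discount

def cost(tokens: int, multiplier: float = 1.0) -> float:
    """Input cost in dollars for `tokens` tokens at the given price multiplier."""
    return tokens / 1_000_000 * BASE_PER_MTOK * multiplier

doc_tokens, queries = 100_000, 50

# Without caching, the full document is re-sent and re-billed on every query.
uncached = queries * cost(doc_tokens)

# With caching: one cache write, then discounted cache reads for the rest.
cached = cost(doc_tokens, WRITE_MULT) + (queries - 1) * cost(doc_tokens, READ_MULT)

print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}, "
      f"savings: {1 - cached / uncached:.0%}")
```

With these assumptions the cached workflow costs about $1.85 against $15.00 uncached, an ~88% reduction, which is where the ~90% figure comes from.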
How do I use prompt caching?
On the Anthropic API, mark the end of the stable portion of your prompt with a cache_control parameter. The first request processes and stores that prefix; subsequent requests within the cache TTL (typically 5 minutes) reuse the stored representation. OpenAI offers an analogous feature that is applied automatically to sufficiently long prompt prefixes, with no explicit markers required.
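A minimal sketch using the Anthropic Python SDK, following the pattern in Anthropic's prompt caching documentation; the model ID is illustrative, and exact field names should be checked against the current docs. The large document sits in a system block marked with cache_control, so it forms a stable prefix, and only the short user question changes per request.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(document: str, question: str) -> str:
    """Ask one question about a large document, caching the document prefix."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative; any cache-capable model
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": f"Answer questions about this document:\n\n{document}",
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

# The first call pays the cache-write premium; later calls within the TTL
# read the cached prefix at the discounted rate:
# for q in questions: print(ask(big_document, q))
```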
What is the cache TTL?
On the Anthropic API, the default TTL is 5 minutes, refreshed each time the cached prefix is reused; it can be extended to 1 hour with explicit configuration. Cache entries are evicted on TTL expiry or under load.
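A sketch of requesting the longer TTL. The ttl field follows Anthropic's extended-cache-TTL documentation at the time of writing, and a beta header may also be required, so treat the exact spelling as an assumption and verify against the current docs.

```python
# Assumed field per Anthropic's extended-TTL docs; verify before relying on it.
long_document = "..."  # the stable prefix you want cached for longer

system_blocks = [
    {
        "type": "text",
        "text": long_document,
        "cache_control": {"type": "ephemeral", "ttl": "1h"},  # default is "5m"
    }
]
```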
When does prompt caching NOT help?
When prompts vary too much (no stable prefix), when total tokens are small (overhead exceeds savings), or for one-off requests that are not repeated.