What Is Prompt Caching?
Prompt caching is a technique that stores the model's processed representation of a prompt prefix so it does not need to be recomputed on subsequent requests that share the same prefix. Anthropic introduced it for the Claude API in 2024; it can reduce costs by up to 90% and latency by 50-80% for prompts with stable prefixes.
Frequently Asked Questions
When does prompt caching help?
Whenever you have a stable prompt prefix (system message, document context, instruction framework) that does not change across many requests. Examples: querying the same large document with different questions; running the same prompt template with different inputs; agent loops with shared context.
How much does prompt caching save?
On the Anthropic API, cached tokens are read back at ~10% of the normal input-token price (writing to the cache carries a one-time ~25% premium). For a workflow that processes a 100K-token document with 50 different queries, caching reduces total input cost by roughly 90%, as the worked example below shows. Latency also drops significantly because the model does not re-encode the cached portion.
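To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python. The multipliers reflect Anthropic's published pricing model (cache reads at ~10% of the base input rate, cache writes at ~125%); the $3-per-million base price is only an illustrative placeholder, not a quote for any specific model.

```python
# Cost comparison for the 100K-token document / 50-query workflow above.
BASE_PER_MTOK = 3.00   # illustrative base input price, $ per million tokens
WRITE_MULT = 1.25      # cache write premium
READ_MULT = 0.10       # cache read discount

def cost(tokens: int, multiplier: float = 1.0) -> float:
    """Input cost in dollars for `tokens` tokens at the given price multiplier."""
    return tokens / 1_000_000 * BASE_PER_MTOK * multiplier

doc_tokens, queries = 100_000, 50

# Without caching, the full document is re-sent and re-billed on every query.
uncached = queries * cost(doc_tokens)

# With caching: one cache write, then discounted cache reads for the rest.
cached = cost(doc_tokens, WRITE_MULT) + (queries - 1) * cost(doc_tokens, READ_MULT)

print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}, "
      f"savings: {1 - cached / uncached:.0%}")
```

With these assumptions the cached workflow costs about $1.85 against $15.00 uncached, an ~88% reduction, which is where the ~90% figure comes from.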
How do I use prompt caching?
On the Anthropic API, mark the end of the stable portion of your prompt with a cache_control parameter. The first request processes and stores that prefix; subsequent requests within the cache TTL (typically 5 minutes) reuse the stored representation. OpenAI offers an analogous feature that is applied automatically to sufficiently long prompt prefixes, with no explicit markers required.
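A minimal sketch using the Anthropic Python SDK, following the pattern in Anthropic's prompt caching documentation; the model ID is illustrative, and exact field names should be checked against the current docs. The large document sits in a system block marked with cache_control, so it forms a stable prefix, and only the short user question changes per request.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(document: str, question: str) -> str:
    """Ask one question about a large document, caching the document prefix."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative; any cache-capable model
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": f"Answer questions about this document:\n\n{document}",
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

# The first call pays the cache-write premium; later calls within the TTL
# read the cached prefix at the discounted rate:
# for q in questions: print(ask(big_document, q))
```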
What is the cache TTL?
On the Anthropic API, the default TTL is 5 minutes, refreshed each time the cached prefix is reused; it can be extended to 1 hour with explicit configuration. Cache entries are evicted on TTL expiry or under load.
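A sketch of requesting the longer TTL. The ttl field follows Anthropic's extended-cache-TTL documentation at the time of writing, and a beta header may also be required, so treat the exact spelling as an assumption and verify against the current docs.

```python
# Assumed field per Anthropic's extended-TTL docs; verify before relying on it.
long_document = "..."  # the stable prefix you want cached for longer

system_blocks = [
    {
        "type": "text",
        "text": long_document,
        "cache_control": {"type": "ephemeral", "ttl": "1h"},  # default is "5m"
    }
]
```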
When does prompt caching NOT help?
When prompts vary too much (no stable prefix), when total tokens are small (overhead exceeds savings), or for one-off requests that are not repeated.