Prompt caching, explained with real math
2026-04-22 · Choppy Toast
Every major provider now caches, but their economics are different:
Anthropic: cached input is 10% of the regular price. Opus 4.7 drops from $15 to $1.50 per million. But you pay a 25% *write* surcharge on any call that writes the cache. The cache lives 5 minutes by default (the 1-hour tier costs 2x the base price to write).
OpenAI: automatic caching for identical prompt prefixes. Cached tokens are 25-50% of the regular price (varies by model), with no cache-write premium. The cache lives 5-10 minutes.
Google: context caching is 25% of the input price, plus a per-hour storage fee for the cached tokens. Great for huge system prompts that don't change. (A cost-model sketch follows this list.)
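To make the schemes comparable, here's a minimal cost model. It's a sketch under simplifying assumptions: it ignores Google's storage fee and the TTL differences, applies the same hypothetical $15/M base price to every provider, and treats every cache miss as a full write. The multipliers come straight from the list above.

```python
# Input-token cost of a cached prompt prefix. read_mult and write_mult are
# the provider's cached-read and cache-write prices as multiples of the
# base input price (taken from the pricing list above).
def prefix_cost(tokens_per_call: int, calls: int, base_usd_per_m: float,
                hit_rate: float, read_mult: float, write_mult: float) -> float:
    base = base_usd_per_m / 1_000_000  # dollars per token
    hit_tokens = hit_rate * calls * tokens_per_call
    miss_tokens = (1 - hit_rate) * calls * tokens_per_call
    return hit_tokens * read_mult * base + miss_tokens * write_mult * base

# 4K-token prefix, 10K calls, 90% hit rate, $15/M base for both providers:
print(prefix_cost(4_000, 10_000, 15, 0.9, read_mult=0.10, write_mult=1.25))  # Anthropic: 129.0
print(prefix_cost(4_000, 10_000, 15, 0.9, read_mult=0.50, write_mult=1.00))  # OpenAI at 50%: 330.0
```

Note the shape of the trade: Anthropic's deep read discount beats OpenAI's no-premium writes once the hit rate is high, and the comparison flips as hits get rarer.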
When caching actually pays off:
You need the same prompt prefix to be hit repeatedly within the cache TTL. Coding agents re-reading the same repo context = yes. One-off summarization jobs = no.
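On Anthropic, caching isn't automatic; you opt the prefix in. A minimal sketch, assuming the official `anthropic` Python SDK and a placeholder prompt (the model id is the article's and may not match a real one):

```python
# Marking a stable prefix cacheable on Anthropic's API (pip install anthropic).
import anthropic

STABLE_SYSTEM_PROMPT = "..."  # your large, unchanging system prompt goes here

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",  # hypothetical id, per the article
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STABLE_SYSTEM_PROMPT,
            # Everything up to and including this block is cached; later calls
            # with an identical prefix read it back at the cached rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What's your refund policy?"}],
)
print(response.usage)  # reports cache_creation_input_tokens / cache_read_input_tokens
```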
Example: a customer support chatbot with a 4K-token system prompt hit 10K times/month. On Claude Opus 4.7:
- Without cache: 4,000 tokens × 10,000 calls × $15/M = $600
- With cache (90% hit rate, 25% write premium on the 10% of calls that miss): misses cost 4M tokens × $18.75/M = $75, cached hits cost 36M tokens × $1.50/M = $54, for a total of $129, roughly a 4.7x saving.
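A few lines of Python verify the arithmetic (prices and counts are the article's; this is just math, not any provider's API):

```python
# Checking the chatbot example: 4K-token prefix, 10K calls/month, $15/M base.
PROMPT_TOKENS, CALLS, BASE = 4_000, 10_000, 15 / 1_000_000
HIT_RATE = 0.90

no_cache = PROMPT_TOKENS * CALLS * BASE
hits = HIT_RATE * CALLS * PROMPT_TOKENS * (0.10 * BASE)          # cached reads at 10%
misses = (1 - HIT_RATE) * CALLS * PROMPT_TOKENS * (1.25 * BASE)  # writes at 125%

print(f"without cache: ${no_cache:.0f}")                    # $600
print(f"with cache:    ${hits + misses:.0f}")               # $129
print(f"saving:        {no_cache / (hits + misses):.1f}x")  # 4.7x
```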
The trap:
If your hit rate is below the break-even point (about 22% on Anthropic's 5-minute tier, or 53% on the 1-hour tier with its 2x writes, given the prices above), the write premium makes caching cost *more* than no caching. Always measure your hit rate before celebrating.
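Those thresholds fall out of a one-line model, a sketch that assumes every miss pays the full write price: caching wins when hit_rate × read_mult + (1 − hit_rate) × write_mult < 1, with all prices as multiples of the base input rate.

```python
# Break-even hit rate: solve h*read + (1-h)*write = 1, where read and write
# are multiples of the base input price and every miss pays the write price.
def breakeven_hit_rate(read_mult: float, write_mult: float) -> float:
    return (write_mult - 1) / (write_mult - read_mult)

print(f"{breakeven_hit_rate(0.10, 1.25):.0%}")  # ~22%: Anthropic 5-minute tier
print(f"{breakeven_hit_rate(0.10, 2.00):.0%}")  # ~53%: 1-hour tier at 2x write
```

Measuring the hit rate itself is cheap on Anthropic: each response's usage block reports cache_read_input_tokens and cache_creation_input_tokens, so the hit rate is one division away.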