Prompt caching, explained with real math

2026-04-22 · Choppy Toast

Every major provider now caches, but their economics are different:

Anthropic: cached input is 10% of the regular price. Opus 4.7 drops from $15 to $1.50 per million input tokens. But you pay a 25% *write* surcharge on the first call. The cache lives 5 minutes by default (the 1-hour tier costs 2x on writes).

OpenAI: automatic caching for identical prompt prefixes. Cached tokens cost 25-50% of the regular price (varies by model), with no cache-write premium. The cache lives 5-10 minutes.

Google: context caching is 25% of the input price, plus a per-hour storage fee for the cached tokens. Great for huge system prompts that don't change.
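The three pricing models above can be collapsed into one blended-cost formula. A minimal sketch, using the rates quoted in this article (the function and parameter names are illustrative, not any provider's SDK):

```python
def cost_per_million(hit_rate, base_price, cached_frac, write_premium=0.0):
    """Blended input cost per 1M tokens at a given cache hit rate.

    base_price    -- $/M for uncached input tokens
    cached_frac   -- cached price as a fraction of base (0.10 for Anthropic)
    write_premium -- surcharge on cache writes (0.25 for Anthropic, 0 for OpenAI)
    """
    miss_rate = 1.0 - hit_rate
    write_cost = miss_rate * base_price * (1.0 + write_premium)  # misses (re)write the cache
    read_cost = hit_rate * base_price * cached_frac              # hits read at the discount
    return write_cost + read_cost

# Anthropic-style economics at a 90% hit rate: blended cost drops
# from $15/M to about $3.23/M.
print(cost_per_million(0.9, 15.0, 0.10, 0.25))
```

The same function covers OpenAI-style caching by setting `write_premium=0` and `cached_frac` to 0.25-0.5; Google's per-hour storage fee would need an extra term.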

When caching actually pays off:

Caching pays off when the same prompt prefix is hit repeatedly within the cache TTL. Coding agents re-reading the same repo context = yes. One-off summarization jobs = no.

Example: a customer support chatbot with a 4K-token system prompt, hit 10K times/month, on Claude Opus 4.7:

- Without cache: 4K tokens × 10K calls × $15/M = $600
- With cache (90% hit rate, 25% write premium on the 10% of calls that miss): 4M miss tokens × $18.75/M + 36M hit tokens × $1.50/M = $75 + $54 = $129, roughly a 4.7x saving.
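The chatbot arithmetic above, worked end to end (all rates are the ones quoted in this article):

```python
PROMPT_TOKENS = 4_000
CALLS = 10_000
BASE = 15.0          # $/M for uncached input tokens
CACHED = 1.50        # $/M for cached input tokens (10% of base)
WRITE = BASE * 1.25  # $/M on cache writes (25% surcharge)
HIT_RATE = 0.90

no_cache = PROMPT_TOKENS * CALLS / 1e6 * BASE                 # $600/month

miss = PROMPT_TOKENS * CALLS * (1 - HIT_RATE) / 1e6 * WRITE   # $75 on the 10% of calls that miss
hit = PROMPT_TOKENS * CALLS * HIT_RATE / 1e6 * CACHED         # $54 on the 90% that hit
with_cache = miss + hit                                       # $129/month

print(no_cache, with_cache, no_cache / with_cache)            # ~4.65x saving
```

Note that the write premium only applies to the miss traffic; if the hit rate were lower, the $18.75/M writes would start to dominate, which is exactly the trap discussed below.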

The trap:

If your hit rate is low enough, the write premium on Anthropic can make caching cost *more* than no caching: with the rates above, the break-even hit rate is roughly 22% on the 5-minute tier and roughly 53% on the 1-hour tier (where writes cost 2x base). Always measure hit rate before celebrating.
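The break-even point falls out of the blended-cost formula: caching loses whenever (1 - h)(1 + write_premium) + h × cached_frac > 1. Solving for the hit rate h gives a one-liner (a sketch under this article's rates; the function name is mine):

```python
def break_even_hit_rate(write_premium, cached_frac):
    """Hit rate below which caching costs more than not caching.

    Derived from: (1 - h) * (1 + write_premium) + h * cached_frac = 1
    """
    return write_premium / (1.0 + write_premium - cached_frac)

print(break_even_hit_rate(0.25, 0.10))  # 5-minute tier: ~0.217
print(break_even_hit_rate(1.00, 0.10))  # 1-hour tier (2x write cost): ~0.526
```

With no write premium (OpenAI-style), the threshold is zero: any hit at all is a saving, which is why the trap is specific to providers that charge for cache writes.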