How to estimate an LLM API budget before you build
Four numbers decide your LLM bill. If you don't have estimates for all four, you can't budget.
1. Requests per month. Not "users": actual LLM API calls. Multi-turn chats make this 3-10x higher than it looks.
2. Input tokens per request. System prompt + tool defs + chat history + current user message. Don't forget tool definitions; they're often half the bill.
3. Output tokens per request. Set a hard max-tokens. Trust nothing else.
4. Cache hit ratio. For production apps with a stable system prompt, this is usually 50-80%. For stateless one-shots, 0%.
Multiply: (input tokens × requests × cache-discounted input rate + output tokens × requests × output rate) ÷ 1,000,000. Done.
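That multiplication, plus the retry buffer discussed below, fits in one function. This is a minimal sketch; the rates in the example call are made-up placeholders, not real provider pricing, and the parameter names are my own.

```python
def monthly_cost(
    requests_per_month: int,
    input_tokens: int,        # avg input tokens per request
    output_tokens: int,       # avg output tokens per request
    input_rate: float,        # $ per 1M uncached input tokens (illustrative)
    cached_rate: float,       # $ per 1M cached input tokens (illustrative)
    output_rate: float,       # $ per 1M output tokens (illustrative)
    cache_hit_ratio: float = 0.0,
    retry_multiplier: float = 1.2,  # 1.1-1.3x, per the retry note below
) -> float:
    """Estimate monthly LLM spend in dollars from the four numbers."""
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    per_request = (
        uncached * input_rate
        + cached * cached_rate
        + output_tokens * output_rate
    ) / 1_000_000
    return per_request * requests_per_month * retry_multiplier

# Example with invented rates: 1M requests/mo, 4k input, 500 output tokens,
# $3/M input, $0.30/M cached, $15/M output, 70% cache hits, 1.2x retries.
cost = monthly_cost(1_000_000, 4_000, 500, 3.0, 0.30, 15.0, 0.7, 1.2)
```

With those placeholder numbers the estimate comes out around $14,300/month; swap in your provider's actual rates before trusting any figure.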
Common mistakes:
- Forgetting the multi-turn multiplier. A 10-turn conversation is not 10x the cost of one turn; it's 55x, because each turn replays the full history. Use the arithmetic series.
- Using sticker price instead of the cached rate. If you plan to cache, use the cached input rate for the cached portion.
- Ignoring retries. Budget 1.1-1.3x for retries on JSON failures and tool errors.
- Forgetting images. A single 1024x1024 image is ~1,500 tokens.
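The multi-turn multiplier above is just the sum 1 + 2 + ... + n, since turn k resends the history of all k-1 earlier turns plus its own message. A one-line sketch (the function name is mine):

```python
def multiturn_input_multiplier(turns: int) -> int:
    # Turn k replays turns 1..k-1 plus itself, so total input volume
    # is the arithmetic series 1 + 2 + ... + n = n * (n + 1) / 2.
    return turns * (turns + 1) // 2

multiturn_input_multiplier(10)  # → 55, the 10-turn figure above
```

This is a worst case: prompt caching discounts the replayed prefix, but the tokens are still billed on every turn.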
Sanity-check rule: if your budget number sounds suspiciously low, it's probably missing multi-turn or tool-def inflation. Re-check.
Then build a working prototype and measure real usage for two weeks. Your estimate will be wrong, usually by 30-80%. That's fine. Adjust once you have real data.