Seven hidden costs in LLM API bills
2026-04-22 · Choppy Toast
Your API bill is almost always bigger than the napkin math suggested. Here's why:
1. Tool / function descriptions are input tokens
Every call that includes tool definitions re-sends those schemas as input tokens. A 15-tool agent can add 3-5K tokens to every single call; multiplied by thousands of calls per day, this can dominate the bill.
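A back-of-envelope sketch of that overhead (the token count, call volume, and price below are illustrative assumptions, not real rates):

```python
# Back-of-envelope: daily cost of re-sending tool schemas on every call.
# All numbers here are illustrative assumptions, not real provider rates.

def schema_overhead_cost(schema_tokens: int, calls_per_day: int,
                         usd_per_million_input: float) -> float:
    """Daily USD cost of prepending tool definitions to every call."""
    return schema_tokens * calls_per_day * usd_per_million_input / 1_000_000

# 4K tokens of tool defs, 10K calls/day, $3 per million input tokens:
print(f"${schema_overhead_cost(4_000, 10_000, 3.0):.2f}/day")  # $120.00/day
```

That line item exists before a single user message is processed.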
2. Multi-turn tool loops
One user message → model calls tool → tool returns result → model calls another tool → …. Each round-trip re-sends the whole system prompt + tool defs + prior turns, so a 6-step agent loop consumes at least ~6x the input of a single-turn chat, and more once the accumulated tool results are counted.
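The round-trip arithmetic can be sketched like this (token sizes are assumptions for illustration):

```python
def agent_loop_input_tokens(base_tokens: int, step_tokens: int, steps: int) -> int:
    """Total input tokens across an agent loop where every round-trip
    re-sends the base prompt (system + tool defs) plus all prior steps."""
    total = 0
    history = 0
    for _ in range(steps):
        total += base_tokens + history
        history += step_tokens  # each step appends a tool call + its result
    return total

# 5K-token base prompt, ~500 tokens appended per tool round-trip, 6 steps:
agent_loop_input_tokens(5_000, 500, 6)  # 37_500 -- vs 5_000 for a single turn
```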
3. Retries on schema failures
JSON mode failures, refusal loops, or malformed tool calls trigger retries in your app. Each retry re-bills the full input prompt, so a flaky call costs two or three times a clean one.
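A minimal sketch of a capped retry wrapper; `call_fn` stands in for whatever function performs your API call and returns the raw model text:

```python
import json

def call_with_retries(call_fn, max_attempts: int = 3):
    """Retry on malformed JSON, but cap attempts: every retry
    re-bills the entire input prompt, so the cap is a cost control."""
    last_err = None
    for attempt in range(1, max_attempts + 1):
        raw = call_fn()  # stand-in for your actual API call
        try:
            return json.loads(raw), attempt  # parsed result + attempts used
        except json.JSONDecodeError as err:
            last_err = err
    raise ValueError(f"gave up after {max_attempts} attempts") from last_err
```

Logging `attempt` per request is a cheap way to spot which prompts are silently doubling their own cost.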
4. Image tokens
A single 1024x1024 image is typically 1500-3000 input tokens. A vision chatbot with a few screenshots per turn can easily 10x chat-only usage.
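The multiplier is easy to compute for your own traffic (the numbers below are illustrative):

```python
def vision_turn_multiplier(text_tokens: int, images: int,
                           tokens_per_image: int) -> float:
    """How much larger a vision turn's input is vs. the text alone.
    Per-image token counts vary by provider and resolution."""
    return (text_tokens + images * tokens_per_image) / text_tokens

# ~600 text tokens plus three ~2K-token screenshots per turn:
vision_turn_multiplier(600, 3, 2_000)  # 11.0x a text-only turn
```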
5. Reasoning tokens you can't see
OpenAI's o-series and thinking modes charge for internal reasoning tokens even though the API only returns the final answer. A "short" answer can hide 10K reasoning tokens.
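With assumed prices, the gap between the apparent and the billed output cost is stark:

```python
def effective_output_cost(visible_tokens: int, reasoning_tokens: int,
                          usd_per_million_output: float) -> float:
    """USD billed for one response: hidden reasoning tokens are billed
    as output alongside the visible answer. Price is illustrative."""
    return (visible_tokens + reasoning_tokens) * usd_per_million_output / 1_000_000

# A 200-token visible answer hiding 10K reasoning tokens at $15/M output
# costs ~51x what the visible tokens alone would suggest:
effective_output_cost(200, 10_000, 15.0)
```

Check your provider's usage metadata per response rather than estimating from visible length.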
6. Chat history replay
Every turn in a conversation re-sends the whole history. Turn 20 alone costs ~20x the input of turn one, and the conversation's cumulative input grows quadratically — roughly 200x a single turn in total (without caching).
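The quadratic growth follows from summing 1 through N re-sent turns; a quick sketch with an assumed per-turn size:

```python
def conversation_input_tokens(turn_tokens: int, turns: int) -> int:
    """Total input tokens when every turn re-sends the full history:
    turn k sends k turns' worth, so the total is the sum 1..turns."""
    return turn_tokens * turns * (turns + 1) // 2

# 20 turns of ~300 tokens each:
conversation_input_tokens(300, 20)  # 63_000 -- vs 300 for turn one alone
```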
7. Streaming abandon
User clicks away, but your server keeps streaming to completion. Output tokens are billed anyway. Add cancel-on-disconnect.
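A minimal sketch of cancel-on-disconnect; `is_disconnected` is a stand-in for whatever your framework exposes (a dropped socket, a request-aborted flag), and you should confirm your provider stops generation — and billing — when the stream is closed early:

```python
from typing import Callable, Iterable, Iterator

def stream_with_cancel(token_iter: Iterable[str],
                       is_disconnected: Callable[[], bool]) -> Iterator[str]:
    """Stop pulling tokens from the model stream once the client is gone.
    Closing the upstream stream early is what avoids paying for the rest."""
    for token in token_iter:
        if is_disconnected():
            break  # client gone: stop consuming (and billing) the stream
        yield token
```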
Fix order (biggest impact first):
1. Turn on prompt caching: 5-10x reduction on repeated prefixes.
2. Trim tool schemas aggressively; drop unused tools per request.
3. Compact chat history after N turns (summarize older turns).
4. Set hard max-token limits on output; don't trust the model to stop.
5. Watch streaming cancellation.
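History compaction (step 3) can be as simple as the sketch below; `summarize` is a hypothetical callable (e.g. a cheap-model call) that you would supply:

```python
def compact_history(messages: list[dict], keep_last: int = 6,
                    summarize=None) -> list[dict]:
    """Replace all but the last `keep_last` messages with one summary
    message. `summarize` is a hypothetical hook (e.g. a cheap model call)
    that turns old messages into a short paragraph."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = (summarize(old) if summarize
               else f"[{len(old)} earlier messages summarized]")
    return [{"role": "system", "content": summary}] + recent
```

Note the interaction with step 1: compacting rewrites the prompt prefix, which invalidates the cache for that conversation, so compact at a fixed cadence rather than every turn.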