Glossary
Quick, plain-English definitions for LLM pricing terms.
- Token: The basic unit of text an LLM processes. Roughly 0.75 words in English.
- Context window: The maximum number of input tokens a model can process in one request.
- Input token: A token you send to the model. Usually 3-10x cheaper than output.
- Output token: A token the model generates. The expensive half of the bill.
- Prompt cache: Reusing a prompt prefix across calls for a large input-price discount.
- Reasoning token: A hidden token used for the model's internal thinking. Billed but not shown.
- Tool call: When the model requests that your code run a function with specific arguments.
- Batch API: Submit a large job offline at a 50% discount. Completes within 24 hours.
- Rate limit: The maximum tokens or requests per minute you're allowed, based on your usage tier.
- Cached input: Input tokens already stored in the provider's prompt cache.
- Mixture of Experts (MoE): An architecture where only a subset of model weights activates per token.
- Frontier model: A top-tier flagship from each major lab, e.g. Claude Opus, GPT-5, Gemini Pro.
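
To show how these terms combine on a bill, here is a minimal sketch of a per-request cost calculation. All prices and the 90% cache discount are hypothetical placeholders, not any provider's real rates; only the 50% batch discount comes from the definition above.

```python
# Hedged sketch: combining input, cached-input, output, and batch pricing.
# All dollar figures below are assumed placeholders, not real provider rates.

PRICE_PER_MTOK = {
    "input": 3.00,         # $ per 1M uncached input tokens (assumption)
    "cached_input": 0.30,  # $ per 1M cached input tokens (assumed 90% discount)
    "output": 15.00,       # $ per 1M output tokens (assumption)
}

def request_cost(input_tokens, cached_tokens, output_tokens, batch=False):
    """Cost in dollars; cached_tokens is the cached portion of input_tokens."""
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000
    if batch:
        cost *= 0.5  # Batch API: 50% discount, per the glossary entry
    return cost

# 100k-token prompt with 80k cached, 2k tokens of output:
print(request_cost(100_000, 80_000, 2_000))  # → 0.114
```

Note how the 2,000 output tokens ($0.030) cost half as much as the 20,000 uncached input tokens ($0.060): the "expensive half of the bill" at a fraction of the volume.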