Token

The basic unit of text an LLM processes. Roughly 0.75 words in English.

A token is a chunk of text that a language model processes as a single unit. English text averages about 4 characters per token, so 1,000 tokens is roughly 750 words. Korean, Japanese, and Chinese consume more tokens per character: a single character is often 1-2 tokens. Code tends to be token-efficient with modern tokenizers (common keywords and variable names are often single tokens), while punctuation-heavy or unusual text splits into more, smaller tokens. Every LLM API bills by token count on both input and output. Tokenizers differ by provider (GPT-5 and Claude segment text slightly differently), so a 1K-token prompt on one model may be 1.1K on another.
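
The ~4 characters-per-token rule of thumb above can be turned into a quick back-of-envelope estimator. This is only the heuristic, not a real tokenizer; exact counts require the provider's own tokenizer (e.g. a BPE implementation), and the function name here is illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text using the ~4 chars/token heuristic.

    This is the back-of-envelope rule described above, not a real tokenizer.
    Actual counts vary by provider and by the kind of text (CJK scripts and
    punctuation-heavy input use more tokens per character).
    """
    return max(1, round(len(text) / 4))


prompt = "The quick brown fox jumps over the lazy dog."
print(len(prompt.split()))      # 9 words
print(estimate_tokens(prompt))  # 11 estimated tokens (44 chars / 4)
```

For budgeting prompts against a context window or cost ceiling, an estimate like this is usually close enough; for exact billing, count with the provider's tokenizer.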