Context window
The maximum total number of tokens (input plus output) a model can handle in one request.
The context window is the maximum total tokens (input + output) a model can handle in a single API call. As of April 2026: Claude models offer 200K, GPT-4.1 offers 1M, GPT-5 offers 400K, Gemini 2.5 Pro offers 2M (Enterprise) or 1M standard, and DeepSeek V3.1 offers 128K. Real-world recall often degrades well before the stated limit: on complex queries, most models lose coherence past 50-70% of their advertised window. For long documents, prompt caching with a smaller model often beats reaching for a bigger window.
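Because input and output share one limit, a request must reserve room for the response before sending the prompt. A minimal sketch of that budgeting check, assuming a rough 4-characters-per-token heuristic (real tokenizers vary) and a hypothetical 200K window:

```python
# Sketch: budgeting a request against a shared input + output token limit.
# The 4-chars-per-token estimate and the 200K default are illustrative assumptions,
# not exact tokenizer output or a guarantee about any particular model.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_output_tokens: int,
                 context_window: int = 200_000) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window
```

For example, a prompt estimated at 180K tokens with a 50K output budget would be rejected against a 200K window, even though the prompt alone fits.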