Updated April 2026 · 20 models · 8 providers

LLM API Pricing Calculator

Compare real API costs for Claude Opus 4.7, GPT-5, Gemini 2.5 Pro, Llama, DeepSeek and more. Cache-aware, preset workloads, no signup.

#1 cheapest · 🗺️ Llama 4 Scout · $0.11/$0.34 per 1M
#2 cheapest · 💨 Gemini 2.5 Flash-Lite · $0.10/$0.40 per 1M
#3 cheapest · 💠 GPT-4o mini · $0.15/$0.60 per 1M
💨 Gemini 2.5 Flash-Lite (Cheapest) · Google · $0.10/$0.40 per 1M tokens · 1.0M context
$1.13 monthly · $0.00011 per request · Input:Output 29%:71% · saves $190.27

🗺️ Llama 4 Scout (Open) · Meta (via Groq) · $0.11/$0.34 per 1M tokens · 10.0M context
$1.34 monthly · $0.00013 per request · Input:Output 49%:51% · +19% over cheapest

💠 GPT-4o mini · OpenAI · $0.15/$0.60 per 1M tokens · 128K context
$1.83 monthly · $0.00018 per request · Input:Output 34%:66% · +62% over cheapest

🐋 DeepSeek V3.1 (Open) · DeepSeek · $0.27/$1.10 per 1M tokens · 128K context
$3.10 monthly · $0.00031 per request · Input:Output 29%:71% · +174% over cheapest

GPT-5 mini · OpenAI · $0.25/$2.00 per 1M tokens · 400K context
$4.71 monthly · $0.00047 per request · Input:Output 15%:85% · +317% over cheapest

🦙 Llama 3.3 70B (Open) · Meta (via Groq) · $0.59/$0.79 per 1M tokens · 128K context
$5.12 monthly · $0.00051 per request · Input:Output 69%:31% · +353% over cheapest

Gemini 2.5 Flash · Google · $0.30/$2.50 per 1M tokens · 1.0M context
$5.99 monthly · $0.00060 per request · Input:Output 17%:83% · +430% over cheapest

🀄 Qwen3 72B (Open) · Alibaba · $0.50/$1.50 per 1M tokens · 128K context
$6.00 monthly · $0.00060 per request · Input:Output 50%:50% · +431% over cheapest

🧮 DeepSeek R1 (Open) · DeepSeek · $0.55/$2.19 per 1M tokens · 128K context
$6.20 monthly · $0.00062 per request · Input:Output 29%:71% · +449% over cheapest

🎯 o4-mini · OpenAI · $1.10/$4.40 per 1M tokens · 200K context
$12.43 monthly · $0.00124 per request · Input:Output 29%:71% · +1000% over cheapest

🌸 Claude Haiku 4.5 · Anthropic · $1.00/$5.00 per 1M tokens · 200K context
$12.76 monthly · $0.00128 per request · Input:Output 22%:78% · +1029% over cheapest

🚀 GPT-4.1 · OpenAI · $2.00/$8.00 per 1M tokens · 1.0M context
$22.60 monthly · $0.00226 per request · Input:Output 29%:71% · +1900% over cheapest

🌬️ Mistral Large 2 (Open) · Mistral · $2.00/$6.00 per 1M tokens · 128K context
$24.00 monthly · $0.00240 per request · Input:Output 50%:50% · +2024% over cheapest

🔷 Gemini 2.5 Pro · Google · $1.25/$10.00 per 1M tokens · 2.0M context
$24.12 monthly · $0.00241 per request · Input:Output 17%:83% · +2034% over cheapest

🎨 GPT-4o · OpenAI · $2.50/$10.00 per 1M tokens · 128K context
$30.50 monthly · $0.00305 per request · Input:Output 34%:66% · +2599% over cheapest

🎻 Claude Sonnet 4.6 · Anthropic · $3.00/$15.00 per 1M tokens · 200K context
$38.28 monthly · $0.00383 per request · Input:Output 22%:78% · +3288% over cheapest

🦅 Grok 4 · xAI · $5.00/$15.00 per 1M tokens · 256K context
$46.50 monthly · $0.00465 per request · Input:Output 35%:65% · +4015% over cheapest

🌌 GPT-5 · OpenAI · $10.00/$30.00 per 1M tokens · 400K context
$93.00 monthly · $0.00930 per request · Input:Output 35%:65% · +8130% over cheapest

🧩 o3 · OpenAI · $10.00/$40.00 per 1M tokens · 200K context
$113 monthly · $0.011 per request · Input:Output 29%:71% · +9900% over cheapest

🧠 Claude Opus 4.7 · Anthropic · $15.00/$75.00 per 1M tokens · 200K context
$191 monthly · $0.019 per request · Input:Output 22%:78% · +16838% over cheapest

What does an LLM API cost per token?

As of April 2026, production LLM APIs range from $0.10 to $15 per million input tokens, and $0.40 to $75 per million output tokens. Cached input is typically billed at 10-50% of the regular input rate. A typical customer chatbot with 600 input / 200 output tokens per request can run on Google Gemini 2.5 Flash-Lite for under $2 per 10,000 requests, while the same workload on Claude Opus 4.7 without caching would cost about $240. Caching, batching, and model routing are the three biggest levers for controlling spend.

Cheapest production LLM
💨 Gemini 2.5 Flash-Lite · $0.10 / $0.40
Cheapest frontier
🔷 Gemini 2.5 Pro · $1.25 / $10
Most capable coding
🧠 Claude Opus 4.7 · $15 / $75

Frequently Asked Questions

Which LLM API is cheapest in April 2026?

For usable general-purpose LLMs, Google's Gemini 2.5 Flash-Lite is the cheapest at $0.10 per million input tokens and $0.40 per million output tokens. GPT-4o mini is close behind at $0.15/$0.60. DeepSeek V3.1 is the cheapest in the 'smart enough for production' tier at $0.27/$1.10. Among frontier models, Gemini 2.5 Pro dominates at $1.25/$10, undercutting GPT-5 ($10/$30) and Claude Opus 4.7 ($15/$75) for most workloads.

How do I calculate LLM API cost?

Multiply input tokens per request × request count × input price per million ÷ 1,000,000, then do the same for output, then add them. Always use the cached input rate for any portion you expect to cache. A rough shorthand: English text is ~4 characters per token, so 1 word ≈ 1.3 tokens. Reasoning models (o3, Claude extended thinking) may bill for invisible thinking tokens that multiply output cost by 5-20x.
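The formula above can be sketched in a few lines of Python. The prices used are the Gemini 2.5 Flash-Lite rates quoted on this page; substitute your provider's current rates:

```python
def request_cost(input_tokens, output_tokens, in_price, out_price,
                 cached_tokens=0, cached_price=0.0):
    """Dollar cost of one request; prices are $ per 1M tokens."""
    fresh = input_tokens - cached_tokens
    return (fresh * in_price
            + cached_tokens * cached_price
            + output_tokens * out_price) / 1_000_000

# The chatbot workload from the page: 600 in / 200 out on
# Gemini 2.5 Flash-Lite ($0.10 in / $0.40 out), no caching.
per_request = request_cost(600, 200, 0.10, 0.40)
print(f"${per_request * 10_000:.2f} per 10,000 requests")  # → $1.40 per 10,000 requests
```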

What is the Claude Opus 4.7 API price?

Claude Opus 4.7 costs $15 per million input tokens, $75 per million output tokens, and $1.50 per million cached input tokens. It also carries a 25% write surcharge on the first cache-write call. The 10% cached rate is the most aggressive of any frontier model, which matters for workloads that reuse prompt prefixes.
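A hedged sketch of how those three Opus rates interact for a reused prompt prefix; the 4K-token prefix and 1,000 calls are hypothetical workload numbers, not anything measured:

```python
def opus_prefix_cost(prefix_tokens, calls):
    """Input cost for a reused prompt prefix on Claude Opus 4.7, using
    the rates above: $1.50/M cached reads, plus a 25% write surcharge
    ($18.75/M) on the first cache-write call."""
    first_write = prefix_tokens * 18.75 / 1_000_000
    cached_reads = prefix_tokens * (calls - 1) * 1.50 / 1_000_000
    return first_write + cached_reads

cached = opus_prefix_cost(4_000, 1_000)      # ≈ $6.07
uncached = 4_000 * 1_000 * 15 / 1_000_000    # $60.00 for the same prefix, no cache
```

Even with the write surcharge, the cached path is roughly 90% cheaper once the prefix is hit a few hundred times.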

What does GPT-5 cost per token?

GPT-5 costs $10 per million input tokens, $30 per million output tokens, and $2.50 per million cached input tokens as of April 2026. GPT-5 mini is dramatically cheaper at $0.25 in and $2 out. OpenAI automatically caches identical prompt prefixes, so the cached discount requires no manual configuration, unlike Anthropic's explicit cache_control flag.

Is Gemini 2.5 Pro really cheaper than Claude Sonnet 4.6?

Yes, for most workloads. Gemini 2.5 Pro is $1.25 in / $10 out, while Sonnet 4.6 is $3 in / $15 out. Gemini also offers 2M context vs Sonnet's 200K, and cached input is roughly a wash ($0.31 vs $0.30 per million). Sonnet still wins on tool-use reliability, agentic coding, and instruction following, so for coding agents and complex workflows the premium is often justified.

What is prompt caching and how much does it save?

Prompt caching stores a prompt prefix on the provider's servers so repeated requests reuse it at a steep discount. Anthropic charges 10% of regular input for cached tokens. OpenAI charges 25-50%. Google charges 25% plus a storage fee. For production chatbots with stable 2-5K-token system prompts hit thousands of times, caching typically cuts total input cost by 60-80%.
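The blended input price is a one-line weighted average. The 10% multiplier below is Anthropic's rate from above; the 80%-cached split is a hypothetical workload:

```python
def blended_input_price(base, cached_fraction, cached_multiplier):
    """Effective $/1M input when a fraction of input tokens hit the cache."""
    return base * ((1 - cached_fraction) + cached_fraction * cached_multiplier)

# A 4K cached system prompt out of 5K total input (80% cached)
# on a $3/M-input model with a 10% cached rate:
effective = blended_input_price(3.00, 0.80, 0.10)  # → $0.84/M, a 72% cut
```

That 72% reduction lands squarely in the 60-80% range quoted above.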

What are reasoning tokens and how do they affect cost?

Reasoning tokens (OpenAI o-series) and extended thinking tokens (Anthropic) are output tokens the model generates privately to work through a problem before showing an answer. The API doesn't return them but bills for them as output tokens. On o3, a short 200-word visible answer can hide 5,000-50,000 reasoning tokens, multiplying effective output cost by 5-20x. Always set max_tokens limits.

How do I cut my LLM API bill?

Five highest-impact moves: (1) Turn on prompt caching for your system prompt and tool definitions. (2) Trim tool schemas — drop unused tools per request. (3) Cap max_tokens strictly — don't trust the model to stop. (4) Add a cheap classifier in front to route easy requests to a smaller model. (5) Use Batch API for anything non-user-facing (50% discount on Anthropic, OpenAI, Google).

Is DeepSeek cheaper than OpenAI?

Only sometimes. DeepSeek V3.1 at $0.27/$1.10 actually costs more than GPT-4o mini ($0.15/$0.60) on both input and output. For reasoning workloads, though, DeepSeek R1 ($0.55/$2.19) is roughly half the price of o4-mini ($1.10/$4.40) and a fraction of o3. DeepSeek's open weights also let you self-host, which nothing from OpenAI allows.

Which LLM has the largest context window?

Gemini 2.5 Pro has 2 million tokens (Enterprise tier), with 1 million standard. GPT-4.1 has 1 million. GPT-5 has 400K. Claude models top out at 200K. Llama 4 Scout has 10 million on paper but recall drops off past 1 million in practice. For documents over 200K tokens, Gemini 2.5 Pro or GPT-4.1 are the only serious choices.

Is open-source LLM hosting cheaper than using APIs?

Usually not for cost alone. Hosted open-weights like Llama 3.3 70B on Groq ($0.59/$0.79) don't undercut Gemini Flash-Lite ($0.10/$0.40) or GPT-4o mini ($0.15/$0.60). Self-hosting on an H100 ($2/hr rental) breaks even around 100% GPU utilization, which almost nobody achieves. Open-weights win for data residency, fine-tuning, or ultra-low-latency inference — not pure cost.
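The break-even arithmetic can be sketched as follows. The ~700 output tokens/sec aggregate throughput for a 70B model on one H100 is an assumed round number for illustration, not a benchmark:

```python
def breakeven_utilization(gpu_dollars_per_hr, tokens_per_sec, api_price_per_m):
    """Fraction of each hour the GPU must be saturated for self-hosting
    to match the API price (values near or above 1.0 mean it can't win)."""
    tokens_per_hr = tokens_per_sec * 3600
    api_cost_per_hr = tokens_per_hr * api_price_per_m / 1_000_000
    return gpu_dollars_per_hr / api_cost_per_hr

# H100 rental at $2/hr vs Groq's $0.79/M output rate for Llama 3.3 70B,
# assuming ~700 tokens/sec sustained aggregate throughput:
u = breakeven_utilization(2.00, 700, 0.79)  # ≈ 1.0, i.e. ~100% utilization
```

Under those assumptions the GPU has to be saturated essentially around the clock just to tie the hosted price, which is the "almost nobody achieves" point above.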

What is the Anthropic Batch API discount?

Anthropic Batch API offers 50% off both input and output prices, with results delivered within 24 hours. It's available on all Claude models. OpenAI and Google offer the same 50% batch discount. Batch is perfect for evaluations, offline classification, content generation for blogs, and data enrichment — anything that isn't user-facing.

How does context window affect pricing?

Context window is a ceiling, not a price driver — you pay only for tokens you actually send, not for the unused space. A 2M-context model doesn't cost more for a 1K-token prompt than a 128K-context model. The tradeoff is that bigger windows sometimes come with higher per-token rates, but Gemini 2.5 Pro proves this isn't a rule — it has the biggest window and one of the lowest prices.

Why does my LLM bill keep going up?

The usual suspects: (1) Tool definitions bloating input tokens on every call — a 15-tool agent adds 3-5K tokens per request. (2) Chat history replay on each turn — a 20-turn conversation is ~55x the tokens of the first turn. (3) Unlimited max_tokens letting the model ramble. (4) Retries on JSON-mode failures. (5) Image inputs at 1500-3000 tokens each. Caching fixes most of these.
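The history-replay effect is quadratic, which is why it dominates bills. A sketch with hypothetical message sizes (1,000-token first turn, ~200 tokens added per exchange):

```python
def replayed_input_tokens(turns, first_turn, growth_per_turn):
    """Total input tokens billed across a conversation when every turn
    replays the full history with no caching: growth is quadratic."""
    return sum(first_turn + k * growth_per_turn for k in range(turns))

total = replayed_input_tokens(20, 1_000, 200)
ratio = total / 1_000   # 58x the first turn's input tokens
```

The exact multiple depends on message sizes, but with these assumed numbers a 20-turn chat bills 58x the first turn's input, in line with the ~55x figure above.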

What's the cheapest way to run a chatbot?

Gemini 2.5 Flash-Lite with aggressive prompt caching. For 10,000 conversations per month at 600 input / 200 output per turn with a 4K cached system prompt, you're looking at under $10/month. GPT-4o mini is close behind at ~$15/month. DeepSeek V3.1 is $20-40. Any frontier model starts at 10-30x that cost, which is rarely worth it for simple customer chatbots.
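The Flash-Lite figure above can be reproduced with back-of-envelope arithmetic, assuming one billed turn per conversation and Google's 25% cached-input rate (context-cache storage fees excluded):

```python
CONVS = 10_000                                 # conversations per month
IN_FRESH, OUT, SYS_CACHED = 600, 200, 4_000    # tokens per conversation

monthly = (CONVS * IN_FRESH * 0.10             # fresh input at $0.10/M
           + CONVS * SYS_CACHED * 0.025        # cached prompt at 25% of $0.10
           + CONVS * OUT * 0.40) / 1_000_000   # output at $0.40/M
print(f"${monthly:.2f}/month")                 # → $2.40/month
```

Multi-turn conversations and cache storage fees push the real number higher, which is why the answer above hedges to "under $10/month".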

Pricing data reflects publicly listed API rates as of April 2026 and is provided for comparison only. Always confirm current prices on each provider's pricing page before signing agreements. Claude, GPT, Gemini, Llama, DeepSeek, Grok, Mistral, and Qwen are trademarks of their respective owners. This site is not affiliated with any LLM provider.