o3 vs o4-mini — when does thinking pay?

🧩
o3
OpenAI
$10 / in · $40 / out
per 1M tokens
Context: 200K
Cached input: $2.50/M
🎯
o4-mini
OpenAI
$1.10 / in · $4.40 / out
per 1M tokens
Context: 200K
Cached input: $0.275/M

Both are OpenAI's reasoning-specialized models. o3 is the flagship; o4-mini is the value tier.
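The cached-input rates in the card above are 25% of the full input price for both models, i.e. a 75% discount on repeated prompt prefixes. A minimal sketch of what that means for a prompt-heavy workload (the prompt size and cache-hit fraction are illustrative assumptions, not measured values):

```python
def input_cost(prompt_tokens, cached_fraction, full_price, cached_price):
    """Input cost in dollars when a fraction of the prompt hits the cache.

    Prices are per 1M tokens, matching the card above.
    """
    cached = prompt_tokens * cached_fraction
    fresh = prompt_tokens - cached
    return (fresh * full_price + cached * cached_price) / 1e6

# Illustrative: a 10K-token prompt where 80% is a cached system prefix.
print(f"o3, no cache:  ${input_cost(10_000, 0.0, 10.00, 2.50):.4f}")  # $0.1000
print(f"o3, 80% cache: ${input_cost(10_000, 0.8, 10.00, 2.50):.4f}")  # $0.0400
```

With an 80% cache hit rate, input cost drops by 60% in this sketch; the same arithmetic applies to o4-mini at its lower rates.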

Price. o3 $10/$40. o4-mini $1.10/$4.40. o3 is roughly 9x more expensive on both input and output tokens.
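To make the sticker-price gap concrete, a quick per-request cost sketch using the listed rates (the token counts are illustrative assumptions):

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request; prices are per 1M tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Illustrative workload: 2,000 input tokens, 1,000 output tokens per request.
o3_cost = request_cost(2_000, 1_000, 10.00, 40.00)
o4_mini_cost = request_cost(2_000, 1_000, 1.10, 4.40)

print(f"o3:      ${o3_cost:.4f} per request")       # $0.0600
print(f"o4-mini: ${o4_mini_cost:.4f} per request")  # $0.0066
```

At a million requests on this workload, that is roughly $60,000 for o3 versus $6,600 for o4-mini, before any reasoning-token overhead.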

Hard reasoning benchmarks. On GPQA-hard and FrontierMath, o3 opens a 10-20 point lead over o4-mini. On genuinely hard problems, the gap is real.

Coding. On SWE-bench, o3 leads by ~5-8 points, but both are below Claude Opus 4.7 as of April 2026.

Latency. o3 can take 30-120 seconds for hard problems. o4-mini typically responds in 5-20 seconds. For anything user-facing, o4-mini's latency advantage is decisive.

Reasoning token cost. Both models bill hidden chain-of-thought tokens as output. o3 uses 5-10x more reasoning tokens than o4-mini on matched problems, multiplying the price gap in practice.
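Because hidden reasoning tokens are billed as output, the effective gap can be far wider than the 9x sticker ratio. A sketch under assumed token counts (the 8x reasoning ratio is an illustrative pick from the 5-10x range above, not a benchmark result):

```python
def billed_output_cost(visible_tokens, reasoning_tokens, out_price):
    """Output cost in dollars; hidden reasoning tokens are billed as output."""
    return (visible_tokens + reasoning_tokens) / 1e6 * out_price

# Illustrative: 500 visible answer tokens; assume o3 burns 8,000 reasoning
# tokens where o4-mini uses 1,000 (an 8x ratio, within the 5-10x range).
o3 = billed_output_cost(500, 8_000, 40.00)
o4 = billed_output_cost(500, 1_000, 4.40)
print(f"o3:      ${o3:.4f}")  # $0.3400
print(f"o4-mini: ${o4:.4f}")  # $0.0066
```

Under these assumptions the per-answer output cost gap is roughly 50x, not 9x: the reasoning-token multiplier compounds with the per-token price difference.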

Practical verdict:

- Research, hard math, one-shot analysis where time doesn't matter → o3.
- Production reasoning at scale, coding review, structured extraction → o4-mini.

o4-mini at $1.10/$4.40 is one of the best value picks in the industry. o3 is a specialist tool, not a daily driver.