How to pick an LLM for a coding agent

Coding agents have three cost centers: repo-read input, long reasoning output, and tool-loop re-reads. Different models win at different parts.
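Those three cost centers can be sketched as a back-of-envelope cost model. The prices below ($/M tokens) and the 90% cache discount are placeholder assumptions for illustration, not published rates:

```python
# Rough per-run cost model for a coding agent, split across the three
# cost centers: fresh repo-read input, reasoning output, and cached
# tool-loop re-reads. All prices are hypothetical.

def run_cost(repo_read_tok: int, output_tok: int, reread_tok: int,
             p_in: float = 3.0,        # $/M input tokens (assumed)
             p_out: float = 15.0,      # $/M output tokens (assumed)
             cache_discount: float = 0.9) -> float:
    """Dollar cost of one agent run."""
    fresh = repo_read_tok / 1e6 * p_in
    reasoning = output_tok / 1e6 * p_out
    # Re-reads of already-cached files pay only the discounted rate.
    rereads = reread_tok / 1e6 * p_in * (1 - cache_discount)
    return fresh + reasoning + rereads

# 1M fresh input, 100K output, 5M cached re-reads
print(run_cost(1_000_000, 100_000, 5_000_000))  # → 6.0
```

Note that re-reads often dwarf the fresh read in token count, yet contribute less to the bill once cached, which is the point of the caching advice at the end of this post.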

Current leaders on SWE-bench Verified (April 2026):

- Claude Opus 4.7: ~75%
- GPT-5: ~70%
- Gemini 2.5 Pro: ~63%
- o3: ~71% with heavy thinking, but slow and $40/M output

Practical recommendations:

For IDE autocomplete / inline edits: Haiku 4.5 or Gemini Flash. Latency matters more than a ~5% quality gap.

For agent tasks (read repo, plan, apply edits): Sonnet 4.6 is the cost-sanest default. Tool-use reliability + 200K context + caching make the total bill 3-5x lower than Opus while staying close in quality.
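The 3-5x figure is just per-token arithmetic. With placeholder prices at an assumed 5:1 ratio (the actual rates and ratio will differ):

```python
# Illustrative arithmetic behind the "3-5x cheaper" claim.
# Prices are placeholder assumptions, not published rates.

def agent_bill(input_tok: int, output_tok: int,
               p_in: float, p_out: float) -> float:
    """Dollar cost of an agent run at the given $/M-token prices."""
    return input_tok / 1e6 * p_in + output_tok / 1e6 * p_out

# Same workload (2M input, 200K output), two hypothetical price points:
opus = agent_bill(2_000_000, 200_000, p_in=15.0, p_out=75.0)    # → 45.0
sonnet = agent_bill(2_000_000, 200_000, p_in=3.0, p_out=15.0)   # → 9.0
print(opus / sonnet)  # → 5.0
```

Since agent workloads are input-heavy, the input price ratio dominates the ratio of the totals.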

For hard debugging: Opus 4.7 or o3. You're paying for the last 10% of capability that actually unblocks hard bugs. If it saves an engineer 30 minutes, $1 of Opus is trivial.

For long-context planning (massive codebases): Gemini 2.5 Pro. Its 2M window + cheap cached input often beat shorter-context models on repo-wide reasoning.

Cache your repo. Every coding agent should use prompt caching — repeated reads of the same files dominate the bill otherwise. Anthropic's 10% cached rate is a 90% discount on your biggest line item.
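A minimal sketch of what that looks like in an Anthropic-style Messages API request: the repo context goes in a system block marked with `cache_control`, so subsequent requests sharing that prefix hit the cache. The model id and file contents here are placeholders:

```python
# Sketch: marking repo context as cacheable in an Anthropic-style
# messages request. Model name and repo contents are placeholders.

def build_request(repo_files: dict, task: str) -> dict:
    """Build a messages-API request body with the repo context cached."""
    repo_blob = "\n\n".join(
        f"=== {path} ===\n{body}" for path, body in repo_files.items()
    )
    return {
        "model": "claude-sonnet-example",  # placeholder model id
        "max_tokens": 4096,
        "system": [
            {
                "type": "text",
                "text": repo_blob,
                # cache_control marks a cache breakpoint; later requests
                # with an identical prefix are billed at the cached rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": task}],
    }

req = build_request({"src/app.py": "print('hi')"}, "Fix the failing test.")
```

The key design point: put the big, stable stuff (repo files) first and the per-turn task last, so the cacheable prefix stays identical across the tool loop.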