How to pick an LLM for a customer chatbot in 2026

Start cheap, escalate only when you can measure the gap.

Step 1 — Baseline. Deploy with Gemini 2.5 Flash-Lite or GPT-4o mini. Ship to 5-10% of traffic.
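
A simple way to hold a stable 5-10% slice is deterministic hash bucketing, so the same user always sees the same bot across turns. This is a minimal sketch; the function name and `salt` experiment label are illustrative, not from any particular library:

```python
import hashlib

def in_rollout(user_id: str, percent: float, salt: str = "bot-v1") -> bool:
    # Hash user id + experiment salt to a stable bucket in [0, 1);
    # the same user always lands on the same side of the split.
    h = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(h[:8], 16) / 0x100000000  # 8 hex chars -> [0, 1)
    return bucket < percent / 100.0
```

Changing `salt` re-randomizes the split for a new experiment without touching user ids.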

Step 2 — Measure. Instrument three numbers: resolution rate, escalation-to-human rate, and CSAT. Don't evaluate "model quality" in the abstract.

Step 3 — Try the next tier only if you have data. If resolution rate is acceptable, keep the mini model. If it's 5-10 points below target, try Haiku 4.5 or GPT-5 mini on a matched 10% slice. Compare real numbers.
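
"Compare real numbers" means checking that the pricier model's lift is larger than noise. A standard two-proportion z-test is enough for a first pass; this is a sketch, not a substitute for your experimentation platform:

```python
from math import sqrt

def resolution_gap(res_a: int, n_a: int, res_b: int, n_b: int) -> tuple[float, float]:
    """Gap in resolution rate (B minus A) and a two-proportion z-score.

    |z| > 1.96 is the usual 95% significance bar.
    """
    p_a, p_b = res_a / n_a, res_b / n_b
    p = (res_a + res_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se if se else 0.0
    return p_b - p_a, z
```

With 1,000 conversations per arm, a 60% vs 66% split clears the bar; a 60% vs 62% split does not, which is exactly when you keep the mini model.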

Step 4 — Cache hard. Your system prompt is probably 2-5K tokens. Cache it on every call. On Claude, that's a 10x discount on the cached portion.
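
The cache math is worth making explicit. A rough input-cost model, assuming the cached portion bills at ~10% of the base input price as described above (cache-write surcharges and TTL evictions are ignored for simplicity):

```python
def cached_input_cost(system_tok: int, fresh_tok: int, calls: int,
                      in_price: float, cache_mult: float = 0.1) -> float:
    """Input cost in dollars with the system prompt cached on every call.

    in_price is $ per million input tokens; cache_mult is the cached-read
    discount (0.1 = the 10x discount on the cached portion).
    """
    cached = system_tok * calls * in_price * cache_mult
    fresh = fresh_tok * calls * in_price
    return (cached + fresh) / 1e6
```

At a 3K-token system prompt and 800 fresh tokens per call, caching turns the system prompt from ~79% of your input bill into ~27% of it.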

Example bill (100K conversations/month, avg 3 turns, 800 input / 200 output per turn):

- Flash-Lite: $66/mo
- GPT-4o mini: $120/mo
- Haiku 4.5 (with cache): $290/mo
- Sonnet 4.6 (with cache): $400/mo
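
The arithmetic behind a bill like this is one function. The prices below are illustrative placeholders, not quotes, and the token counts are flat per turn; real input grows with conversation history, so treat the result as a floor:

```python
def monthly_cost(convs: int, turns: int, in_tok: int, out_tok: int,
                 in_price: float, out_price: float) -> float:
    """Monthly bill in dollars; prices are $ per million tokens."""
    total_in = convs * turns * in_tok    # input tokens per month
    total_out = convs * turns * out_tok  # output tokens per month
    return (total_in * in_price + total_out * out_price) / 1e6

# Hypothetical budget-tier pricing of $0.10 in / $0.40 out per MTok:
# 240M input + 60M output tokens/month lands in the tens of dollars.
monthly_cost(100_000, 3, 800, 200, 0.10, 0.40)
```

Plug in the current published rates for each model to rebuild the table above yourself; prices move often enough that the function outlives any specific numbers.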

Don't over-engineer. A chatbot that resolves 60% of requests on Flash-Lite is worth more than one that resolves 63% on Opus at 30x the cost — for almost every business.