Three LLM routing strategies that actually cut costs
2026-04-22 · Choppy Toast
1. Classifier-first routing
Spend $0.10/M to ask a tiny model "is this request simple or hard?", then send 80% of the traffic to a cheap model and 20% to a smart one.
Math for a 100K-req/month app averaging 1K input / 500 output tokens:

- All Opus 4.7: $4,250
- All Haiku 4.5: $370
- Classifier (Flash-Lite) + 80% Haiku / 20% Opus: $110 classifier + $296 Haiku + $850 Opus = $1,256
Saves 70% vs all-Opus with a barely measurable quality drop, provided the classifier is accurate enough.
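A minimal sketch of the classifier-first pattern. The model names and the `classify` callable are placeholders, not real SDK identifiers; in production, `classify` would be one cheap call to the tiny model:

```python
def route(prompt: str, classify) -> str:
    """Pick a model based on a tiny classifier's verdict.

    `classify` is any callable returning "simple" or "hard";
    in production it would be one cheap LLM call.
    """
    label = classify(prompt)
    return "haiku-4.5" if label == "simple" else "opus-4.7"

# Stand-in classifier for illustration; a real router would prompt
# a small model ("is this request simple or hard?") instead.
def toy_classifier(prompt: str) -> str:
    hard_markers = ("prove", "refactor", "multi-step", "debug")
    return "hard" if any(m in prompt.lower() for m in hard_markers) else "simple"

print(route("Summarize this meeting transcript.", toy_classifier))   # haiku-4.5
print(route("Prove this invariant and refactor the loop.", toy_classifier))  # opus-4.7
```

The key property is that the routing decision costs a fraction of a cent per request, so even a modest 80/20 split pays for the classifier many times over.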
2. Cascading
Try the cheap model first. If its confidence is low (log-probs, self-evaluation, or an explicit "did that answer work?" pass), retry on a smarter model.
Works great when ~70-85% of easy traffic stays on the cheap model. The overhead per escalation is ~1 extra (cheap) model call, so the cascade beats all-expensive whenever cheap + (1-p) x expensive < expensive, i.e. whenever the expensive-to-cheap price ratio exceeds 1/p (about 1.2-1.4x for p = 0.70-0.85), which every provider pair satisfies.
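The cascade can be sketched as follows. The `call_cheap` and `call_expensive` callables stand in for real API clients (hypothetical names), and the confidence score could come from log-probs or a self-evaluation pass:

```python
from typing import Callable, Tuple

def cascade(prompt: str,
            call_cheap: Callable[[str], Tuple[str, float]],
            call_expensive: Callable[[str], str],
            threshold: float = 0.7) -> Tuple[str, bool]:
    """Try the cheap model; escalate when its confidence is below threshold.

    Returns (answer, escalated). call_cheap returns (answer, confidence).
    """
    answer, confidence = call_cheap(prompt)
    if confidence >= threshold:
        return answer, False
    # The escalation path pays for both calls: this is the ~1-extra-call overhead.
    return call_expensive(prompt), True

# Stubs for demonstration only; real versions would hit provider APIs.
cheap = lambda p: ("cheap answer", 0.9 if len(p) < 50 else 0.3)
expensive = lambda p: "expensive answer"

print(cascade("short easy question", cheap, expensive))   # ('cheap answer', False)
print(cascade("x" * 100, cheap, expensive))               # ('expensive answer', True)
```

The threshold is the main tuning knob: raise it and you escalate more traffic (higher quality, higher cost), lower it and the cheap model handles more on its own.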
3. Confidence-based
Use the cheap model's own estimate of how sure it is. For classification and extraction tasks, log-prob thresholds give clean escalation signals. Harder to calibrate for open generation.
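For classification and extraction, the escalation signal can be as simple as the mean per-token log-prob of the cheap model's output, which most completion APIs can return. A sketch; the -0.5 threshold is an assumption you would tune on held-out traffic:

```python
def should_escalate(token_logprobs: list, threshold: float = -0.5) -> bool:
    """Escalate when the cheap model's mean token log-prob is low.

    token_logprobs: per-token log-probs of the generated label.
    A mean near 0 means the model was confident in every token;
    a strongly negative mean means probability mass was spread
    across alternatives.
    """
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return mean_lp < threshold

# Confident label: every token near probability 1.
print(should_escalate([-0.05, -0.02]))   # False: keep the cheap answer
# Shaky label: retry on a smarter model.
print(should_escalate([-1.2, -0.9]))     # True
```

This is exactly why it calibrates poorly for open generation: a long free-form answer has many legitimately low-probability tokens, so the mean log-prob stops tracking correctness.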
What usually doesn't work
Rule-based routing on prompt length. Token count is not a great proxy for difficulty: a 200-token math problem is harder than a 10K-token "summarize this meeting."
The easy win
Most teams haven't tried even #1. A Friday-afternoon classifier in front of your API cuts bills more than any model switch will.