Consider a typical scenario: a customer support bot using GPT-4o for all queries at a monthly cost of around $3,200. What would happen if you applied AgentBurn's cost visibility to find optimization opportunities?
## The Likely Discovery
Based on typical support bot workloads, a cost dashboard would likely reveal:
- A large portion of queries — often 60-80% — are simple FAQ-type questions (password resets, billing inquiries, feature locations)
- System prompts tend to grow over time, inflating per-call token counts
When simple queries are sent to an expensive model, you're paying premium prices for commodity work.
## The Optimization
Step 1: Classify incoming queries using GPT-4o-mini (cost: ~$0.0002 per classification). Route simple queries to GPT-4o-mini, complex ones to GPT-4o.
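The routing step can be sketched as a small two-tier dispatcher. This is a minimal sketch, not AgentBurn's implementation: in production the `classify` step would be the cheap GPT-4o-mini call described above, but here it is stubbed with a keyword heuristic (the keyword list is illustrative) so the example is self-contained.

```python
# Hypothetical two-tier router. In production, classify() would be a
# ~$0.0002 GPT-4o-mini call returning "simple" or "complex"; here it is
# stubbed with an illustrative keyword heuristic.
SIMPLE_KEYWORDS = ("password", "billing", "invoice", "where is", "how do i")

def classify(query: str) -> str:
    """Stand-in for the cheap classification call."""
    q = query.lower()
    return "simple" if any(k in q for k in SIMPLE_KEYWORDS) else "complex"

def pick_model(query: str) -> str:
    """Route simple queries to the cheap model, complex ones to the big one."""
    return "gpt-4o-mini" if classify(query) == "simple" else "gpt-4o"
```

Because the classifier itself runs on the cheap model, the routing overhead stays negligible relative to the savings on each rerouted query.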
Step 2: Split the system prompt into a base prompt (~200 tokens) and context modules loaded on demand. Simple queries get the base prompt only.
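The prompt split can be sketched as a base prompt plus named context modules assembled on demand. The module names and contents below are illustrative placeholders, not from any real system.

```python
# Sketch of on-demand prompt assembly. The base prompt would be ~200 tokens
# in practice; module names and contents here are illustrative.
BASE_PROMPT = "You are a support assistant. Be concise and accurate."

CONTEXT_MODULES = {
    "billing": "Billing policy details ...",
    "api": "API troubleshooting runbook ...",
    "refunds": "Refund eligibility rules ...",
}

def build_prompt(needed_modules: list[str]) -> str:
    """Simple queries pass needed_modules=[] and get only the base prompt."""
    parts = [BASE_PROMPT] + [CONTEXT_MODULES[m] for m in needed_modules]
    return "\n\n".join(parts)
```

A simple password-reset query ships only the base prompt; a refund dispute additionally loads the `refunds` module, so token spend scales with query complexity instead of being fixed at the worst case.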
## Projected Impact
Based on published model pricing (GPT-4o-mini is ~17x cheaper than GPT-4o for input tokens), a scenario like this could yield:
- Before: ~$3,200/month (100% GPT-4o)
- After: ~$800/month (majority GPT-4o-mini, remainder GPT-4o)
- Estimated savings: ~75%
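The projection above can be sanity-checked with back-of-envelope arithmetic. The sketch below uses published per-million-token input prices (GPT-4o at ~$2.50, GPT-4o-mini at ~$0.15) and an assumed 75% of traffic routed to the cheap model; the traffic split is an assumption, not measured data.

```python
# Back-of-envelope check of the cost projection, input-token-dominated.
# Prices are published per-1M-input-token rates; the 75% simple-query
# share is an assumption for illustration.
GPT4O_PRICE = 2.50        # $ per 1M input tokens
GPT4O_MINI_PRICE = 0.15   # $ per 1M input tokens

before = 3200.0                               # all traffic on GPT-4o
simple_share = 0.75                           # assumed FAQ-type fraction
mini_ratio = GPT4O_MINI_PRICE / GPT4O_PRICE   # ~0.06, i.e. ~17x cheaper

after = before * (1 - simple_share) + before * simple_share * mini_ratio
savings = 1 - after / before                  # ~0.70 from routing alone
```

Routing alone lands around $944/month (~70% savings); the prompt split in Step 2 trims the remaining token spend further, which is what pushes the combined figure toward ~$800 and ~75%.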
The key insight isn't the routing itself — it's the visibility. Without per-query cost tracking, you have no way to know that most queries don't need an expensive model. A cost dashboard makes the waste obvious.