Cost Savings Methodology
A transparent explanation of how Plexor Labs calculates and reports cost savings. We believe users deserve to understand exactly how their savings are computed.
Overview
Plexor Labs helps reduce LLM costs through intelligent provider routing. When you send a request through Plexor Labs, we route it to the most cost-effective provider that can handle your task while maintaining quality. Your cost savings represent the difference between what you would have paid using a standard provider and what you actually paid through Plexor Labs' optimized routing.
Reference Model Methodology
To calculate meaningful savings, we need a baseline: "What would this request cost without Plexor Labs?" We use Claude Sonnet as our reference model because:
- It's the most commonly used production model for agentic coding tasks
- It represents what most developers would use as their default
- Its pricing is well-documented and stable
Reference Model Pricing (January 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Role |
|---|---|---|---|
| claude-sonnet-4.5 | $3.00 | $15.00 | Reference Baseline |
Calculation Formula
For each API request, we calculate cost savings against the reference baseline:

baseline_cost = (input_tokens / 1M) × $3.00 + (output_tokens / 1M) × $15.00
actual_cost = (input_tokens / 1M) × provider_input_rate + (output_tokens / 1M) × provider_output_rate
savings = baseline_cost − actual_cost
savings_pct = (savings / baseline_cost) × 100
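The calculation can be sketched as a small Python helper. The function name and signature are illustrative, not part of Plexor Labs' actual API; the default rates are the claude-sonnet-4.5 reference pricing documented above.

```python
def compute_savings(input_tokens, output_tokens,
                    actual_input_rate, actual_output_rate,
                    baseline_input_rate=3.00, baseline_output_rate=15.00):
    """Return (savings_usd, savings_pct) versus the reference baseline.

    All rates are USD per 1M tokens; the defaults match claude-sonnet-4.5.
    """
    baseline = (input_tokens / 1_000_000) * baseline_input_rate \
             + (output_tokens / 1_000_000) * baseline_output_rate
    actual = (input_tokens / 1_000_000) * actual_input_rate \
           + (output_tokens / 1_000_000) * actual_output_rate
    savings = baseline - actual
    pct = 100 * savings / baseline if baseline > 0 else 0.0
    return savings, pct
```

For the deepseek-chat request in Example 1 below, `compute_savings(1000, 500, 0.14, 0.28)` reproduces the $0.01022 (97.3%) figure.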
Pricing Data Sources
We maintain current pricing data for all supported providers. Here are the key providers and their pricing (per million tokens):
| Provider | Model | Input | Output | vs Claude Sonnet |
|---|---|---|---|---|
| Anthropic | claude-sonnet-4.5 | $3.00 | $15.00 | Baseline (0%) |
| Anthropic | claude-haiku-4.5 | $1.00 | $5.00 | ~67% cheaper |
| OpenAI | gpt-4o | $2.50 | $10.00 | ~29% cheaper |
| OpenAI | gpt-4o-mini | $0.15 | $0.60 | ~95% cheaper |
| DeepSeek | deepseek-chat | $0.14 | $0.28 | ~95% cheaper |
| Mistral | ministral-3b | $0.04 | $0.04 | ~99% cheaper |
| Google | gemini-2.0-flash | $0.10 | $0.40 | ~97% cheaper |
Sources: Pricing data is sourced directly from each provider's official documentation and updated regularly. See our src/plexor/pricing.py file for the complete, up-to-date pricing configuration.
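The table above can be mirrored as a small pricing dictionary. This is a sketch, not the real configuration in src/plexor/pricing.py, and the discount helper weights input and output rates equally, which is an assumption; actual savings depend on your input/output token mix, which is why the table's percentages are approximate.

```python
# Illustrative pricing table: USD per 1M tokens as (input, output).
# The authoritative data lives in src/plexor/pricing.py.
PRICING = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-chat": (0.14, 0.28),
    "ministral-3b": (0.04, 0.04),
    "gemini-2.0-flash": (0.10, 0.40),
}

def discount_vs_baseline(model, baseline="claude-sonnet-4.5"):
    """Percent discount of a model's combined rates vs the baseline,
    weighting input and output rates equally (an assumption)."""
    m_in, m_out = PRICING[model]
    b_in, b_out = PRICING[baseline]
    return 100 * (1 - (m_in + m_out) / (b_in + b_out))
```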
Worked Examples
Example 1: Simple Query Routed to DeepSeek
A request with 1,000 input tokens and 500 output tokens is routed to deepseek-chat:
# Calculation breakdown
baseline = (1000 / 1_000_000) × $3.00 + (500 / 1_000_000) × $15.00
= $0.003 + $0.0075
= $0.0105
actual = (1000 / 1_000_000) × $0.14 + (500 / 1_000_000) × $0.28
= $0.00014 + $0.00014
= $0.00028
savings = $0.0105 - $0.00028 = $0.01022 (97.3% saved)
Example 2: Complex Task Staying on Claude
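As a sketch with assumed token counts (5,000 input and 2,000 output are illustrative, not from the source), a complex request that the router keeps on claude-sonnet-4.5 yields zero savings, since the actual model is the baseline model:

```python
# Assumed request: 5,000 input tokens, 2,000 output tokens,
# routed to claude-sonnet-4.5 (the baseline model itself).
baseline = (5_000 / 1_000_000) * 3.00 + (2_000 / 1_000_000) * 15.00
# = $0.015 + $0.030 = $0.045
actual = baseline  # same model, same rates as the baseline
savings = baseline - actual  # $0.00 (0% saved)
```

When the actual cost equals the baseline cost there is nothing to save, which matches the caveat below that complex tasks may show zero savings.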
Limitations & Caveats
What Our Savings Calculation Does NOT Include
- Token optimization: We report raw token counts. Some requests may benefit from prompt optimization, but we don't claim savings from compression.
- Quality differences: Cheaper models may produce different (sometimes lower quality) outputs. Savings are purely financial.
- Latency costs: Some cheaper providers may have higher latency. We don't factor opportunity cost into savings.
- Your actual Claude contract: If you have negotiated pricing with Anthropic, your real savings may differ from our baseline.
Potential Biases
- Reference model choice: Using Claude Sonnet as baseline maximizes apparent savings when routing to cheaper providers. If you typically use cheaper models, your real savings would be lower.
- Task mix: Savings vary significantly by task type. Simple tasks routed to cheap providers show high savings; complex tasks may show zero.
Audit Your Own Data
We encourage you to audit your usage data. Your dashboard shows per-request breakdowns with actual provider used, token counts, and costs. The raw data is available via our API at GET /v1/stats.
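Because each record exposes token counts and the actual cost, you can recheck the reported savings yourself. A sketch follows; the record field names are assumptions about the shape of a /v1/stats entry, not a documented schema.

```python
# Re-verify a reported savings figure from a raw per-request record,
# e.g. one entry from GET /v1/stats. Field names here are illustrative.
BASELINE_IN, BASELINE_OUT = 3.00, 15.00  # claude-sonnet-4.5, USD per 1M tokens

def verify_record(record):
    """Recompute baseline cost and savings, then compare to the reported value."""
    baseline = (record["input_tokens"] / 1_000_000) * BASELINE_IN \
             + (record["output_tokens"] / 1_000_000) * BASELINE_OUT
    recomputed = baseline - record["actual_cost"]
    return abs(recomputed - record["reported_savings"]) < 1e-9

# The deepseek-chat request from Example 1, expressed as a record:
sample = {"input_tokens": 1000, "output_tokens": 500,
          "actual_cost": 0.00028, "reported_savings": 0.01022}
```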