New research shows that the best way to reduce artificial intelligence costs depends entirely on what you're asking the model to do. The study, which ran 2,650 individual trials across 72 experimental conditions, found that code generation and logical reasoning—two of the most common enterprise workloads—respond to cost optimization in completely opposite ways.

The findings challenge the dominant strategy in AI deployment, where organizations typically apply the same cost-cutting approach across all their workloads. Instead, the research suggests that companies need to classify their tasks first, then optimize accordingly.

Reference
Johnson, W. (2026). Compress or route? Task-dependent strategies for cost-efficient large language model inference. Zenodo. https://doi.org/10.5281/zenodo.18316726

The cliff effect

When researchers compressed prompts for code generation tasks—asking models to write Python functions—quality remained high even as prompts shrank considerably. At 70% of original length, the code still worked perfectly. At 60%, performance stayed strong.

Below that threshold, everything collapsed. The models began producing syntax errors, missing edge cases, and writing functions that simply didn't work. The researchers call this the "cliff effect"—a sharp boundary where compression stops being free and starts being fatal.

The explanation comes down to information density. Code prompts contain conversational padding—phrases like "Please write a function that..." or "Make sure to handle the case where..."—that compression algorithms can safely strip away. The core specification is much denser. Remove too much and critical details disappear.
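
To make the mechanism concrete, here is a minimal sketch of the padding-stripping idea, assuming a simple regex-based approach with an illustrative filler list; it is not the compression pipeline used in the study.

```python
import re

# Conversational padding that can often be dropped from code prompts without
# touching the specification (illustrative list, not taken from the study).
FILLER_PATTERNS = [
    r"\bplease\b",
    r"\bcould you\b",
    r"\bmake sure to\b",
    r"\bI would like you to\b",
    r"\bas follows\b",
]

# The study places the cliff for code prompts around 60% of original length,
# so this sketch refuses to shrink a prompt past that point.
MIN_RATIO = 0.6


def compress_code_prompt(prompt: str, min_ratio: float = MIN_RATIO) -> str:
    """Strip filler phrases, but never compress below min_ratio of the original."""
    if not prompt:
        return prompt
    compressed = prompt
    for pattern in FILLER_PATTERNS:
        candidate = re.sub(pattern, "", compressed, flags=re.IGNORECASE)
        candidate = re.sub(r"\s+", " ", candidate).strip()
        if len(candidate) / len(prompt) < min_ratio:
            break  # stop before crossing the cliff
        compressed = candidate
    return compressed
```

Real prompt compressors work at the token level and are considerably more sophisticated, but the guardrail is the same: stop well before the cliff.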

Reasoning behaves differently

Chain-of-thought reasoning tasks showed no such cliff. Quality degraded gradually from the start: even removing 10% of tokens measurably hurt performance.

A reasoning prompt like "If Train A leaves Chicago at 9am traveling 60mph..." contains no filler. Every word carries logical weight. Compress it and you're removing pieces of the puzzle, not padding.

But reasoning tasks responded well to a different optimization: routing queries to cheaper models. Many problems that seem complex don't actually require the most powerful AI systems. Mid-tier models solved them correctly at a tenth of the cost.
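
One common way to implement that routing is a cascade: try the cheaper model first and escalate only when a quick check fails. The tier names, prices, and correctness check below are assumptions for illustration, not the study's configuration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # placeholder prices, not real vendor pricing


# Hypothetical tiers: the mid-tier model costs roughly a tenth of the flagship.
MID_TIER = ModelTier("mid-tier-model", 0.3)
FLAGSHIP = ModelTier("top-tier-model", 3.0)


def route_reasoning_query(
    prompt: str,
    call_model: Callable[[ModelTier, str], str],
    looks_correct: Callable[[str], bool],
) -> Tuple[str, ModelTier]:
    """Try the cheap model first; fall back to the flagship only if needed."""
    answer = call_model(MID_TIER, prompt)
    if looks_correct(answer):
        return answer, MID_TIER
    # Only the genuinely hard queries pay flagship prices.
    return call_model(FLAGSHIP, prompt), FLAGSHIP
```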

A combined approach

When the researchers applied compression to code tasks and routing to reasoning tasks, they achieved cost reductions of up to 93% compared to sending everything uncompressed to top-tier models. Quality dropped just 6.2%.

"Companies have been treating all their AI workloads the same way... This research suggests they should be treating them very differently" (Johnson, 2026, p. 12).

The practical implications are straightforward: organizations running code generation can safely compress prompts by 30-40% and pocket immediate savings. For reasoning-heavy applications, intelligent routing offers a better path.
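
Put together, the deployment logic reduces to a small dispatch rule. The keyword classifier and the 0.65 compression target below are assumptions made for illustration (the target is simply the midpoint of the 30-40% range); a production system would classify tasks more carefully.

```python
def classify_task(prompt: str) -> str:
    """Crude keyword heuristic; a real deployment would use a trained classifier."""
    code_markers = ("def ", "class ", "function", "python", "unit test")
    return "code" if any(m in prompt.lower() for m in code_markers) else "reasoning"


def choose_strategy(prompt: str) -> dict:
    """Compress code prompts; route reasoning prompts to a cheaper model."""
    if classify_task(prompt) == "code":
        # Stay inside the study's safe zone: compress by 30-40%, above the cliff.
        return {"strategy": "compress", "keep_ratio": 0.65, "model": "top-tier-model"}
    # Reasoning prompts stay uncompressed but go to a cheaper model when possible.
    return {"strategy": "route", "keep_ratio": 1.0, "model": "mid-tier-model"}
```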

Open questions

The study focused on two task types. How other workloads—summarization, translation, creative writing—respond to these strategies remains an open question.

There's also a gap in existing tools. Current compression algorithms weren't designed with task awareness in mind. Building systems that recognize whether they're processing code or reasoning could extend the benefits to new domains.

For now, the core finding stands: know your workload. One size does not fit all.