
NavyaAI Private Limited · February 2026

The Token Price Collapse
& The Consumption Explosion

How a 99.7% price drop created the biggest AI spending surge in tech history

99.7% · Drop in token cost (2023–2025)

3× · Enterprise AI spend growth (2024–2025)

$602B · Top-5 hyperscaler AI capex (2026)

72% · IT leaders who find AI spend unmanageable

Between March 2023 and August 2025, the cost of running AI inference at a fixed capability level fell by 99.7%. That is not a rounding error or an analyst’s optimistic framing — it is a verifiable, documented fact drawn from primary pricing records. GPT-4 launched at a blended cost of $37.50 per million tokens. By August 2025, the cost-efficiency frontier had reached $0.14 per million tokens. No commercial technology in recorded history has undergone a comparable price restructuring in such a compressed window.

The natural assumption — the one that any rational person would make — is that enterprise AI spending fell alongside it. The opposite happened. Enterprise AI cloud expenditure tripled in a single year, growing from $11.5 billion in 2024 to $37 billion in 2025. The top five hyperscalers committed a combined $602 billion in AI infrastructure capital for 2026. Token consumption at Google grew 130-fold in eighteen months.

This report explains why — and what the gap between cheap tokens and soaring total spend means for every organization running AI in production today.

Section 1

The Fastest Price Collapse in Tech History

When OpenAI launched GPT-4 in March 2023, it was priced at $30 per million input tokens and $60 per million output tokens — a blended rate that J.P. Morgan Asset Management’s cost-efficiency benchmark placed at $37.50 per million tokens. At that price, using AI at any meaningful scale required a deliberate business case. Only premium, high-value workflows could absorb it.
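The $37.50 blended figure is consistent with a roughly 3:1 input-to-output token mix. That mix is an assumption used here for illustration, not J.P. Morgan's published methodology, but it shows how launch pricing reduces to a single blended rate:

```python
def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """Blended $/1M tokens, weighting input and output rates by traffic mix."""
    return input_price * input_share + output_price * (1 - input_share)

# GPT-4 launch pricing: $30/M input, $60/M output.
# An assumed 3:1 input:output mix (input_share = 0.75) reproduces $37.50.
print(blended_price(30.0, 60.0, 0.75))  # 37.5
```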

What followed was not gradual market erosion. It was a structural collapse in three overlapping waves, each driven by a different force.

| Period | What drove it | Price per 1M tokens | Reduction |
|---|---|---|---|
| Mar 2023 – Dec 2023 | GPT-4 at launch; almost no competition | $30 – $60 | Baseline |
| Jan 2024 – Sep 2024 | GPT-4o, Claude 3, Gemini entering market | $5 – $15 | ~75% |
| Oct 2024 – Mar 2025 | MoE architecture; open-source pressure | $0.50 – $3 | ~95% |
| Apr 2025 – Aug 2025 | DeepSeek shock, GPT-5 Nano, commoditization | $0.02 – $0.55 | 99.7% |

The DeepSeek moment

The single most disruptive event in this timeline was DeepSeek-R1’s launch in January 2025. DeepSeek released a frontier-tier reasoning model at $0.55 per million input tokens — 90 to 95 percent below OpenAI’s o1, then priced at $15 per million. Sam Altman described it publicly as “20 to 50 times more affordable” than o1 depending on the task.

The temptation is to read this as aggressive pricing strategy. That misreads what happened. DeepSeek’s efficiency gains were architectural. Its V3 model uses Mixture-of-Experts architecture, activating only 37 billion of 671 billion total parameters per token, with a total training cost of $5.6 million — roughly one-tenth of comparable Western runs. The pricing reflected genuine structural efficiency, not a subsidy.
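The arithmetic behind that structural efficiency is worth making explicit. Using the parameter counts above (the dense-equivalent comparison is illustrative, not a benchmark):

```python
total_params = 671e9   # DeepSeek-V3 total parameters (from the text)
active_params = 37e9   # parameters activated per token under MoE routing

# Fraction of the model doing work on any given token.
activation_ratio = active_params / total_params
print(f"{activation_ratio:.1%} of parameters active per token")  # ~5.5%

# Rough per-token compute saving versus a dense model of the same total size.
print(f"~{total_params / active_params:.0f}x fewer active parameters per token")  # ~18x
```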

The implication extends beyond the event itself: over 60 percent of frontier model releases since early 2025 now incorporate MoE architectures. The techniques that produced DeepSeek’s economics are not proprietary — they have already diffused across the industry. Combined with capable open-weight models (Llama, Mistral, DeepSeek itself), proprietary inference providers face a permanent price ceiling. The commodity floor does not reverse.

Section 2

Why Cheaper Didn’t Mean Less

William Stanley Jevons observed in 1865 that more efficient steam engines did not reduce coal consumption — they expanded it. Efficiency made coal economically accessible to entirely new categories of use, and aggregate demand rose far beyond what the efficiency gains saved. For 160 years this was a useful but somewhat academic observation about energy markets. It is no longer academic.

As token prices fell through successive thresholds, each drop unlocked a category of AI use that was previously economically irrational. At $30 to $60 per million tokens, AI was viable only for high-value, low-volume tasks. At $5 to $15, enterprise chatbots and co-pilots became feasible at team scale. At $0.50 to $3, automated workflows dropped below a cent per completed task. At $0.02 to $0.55 — where the market sits today — always-on, multi-turn, multi-agent systems running at infrastructure scale became economically rational for the first time.

Each threshold did not replace the previous category of use. It added to it. The result was not substitution — it was accumulation. And the consumption that accumulated across all these categories dwarfed any savings that cheaper tokens generated.
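The threshold effect is easiest to see as cost per completed task. The blended prices below come from the eras described above; the tokens-per-task figure is an assumption for illustration:

```python
# Blended $/1M token price representative of each era (from the timeline above).
TIERS = {
    "2023 frontier": 37.50,
    "2024 chatbots": 10.00,
    "2025 workflows": 1.50,
    "2025 commodity": 0.25,
}
TOKENS_PER_TASK = 4_000  # assumed: one multi-turn task completion

for era, price_per_m in TIERS.items():
    cost = price_per_m * TOKENS_PER_TASK / 1_000_000
    print(f"{era}: ${cost:.4f} per task")
# The same task falls from ~$0.15 to ~$0.001 — crossing the sub-cent
# threshold that makes automated, always-on use economically rational.
```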

Cheap tokens did not reduce enterprise AI spend. They unlocked consumption at a scale that made the savings irrelevant.

Enterprise AI cloud spending grew from $11.5 billion in 2024 to $37 billion in 2025 — a 3× increase — against a backdrop of more than 95 percent per-token cost reduction. Google’s internal token processing grew 130-fold over eighteen months. The hyperscaler $602 billion capex commitment for 2026 is not irrational behavior — it is a rational bet on continued consumption growth as AI becomes ambient infrastructure.

Section 3

The Real Bill Doesn’t Come from Your Inference Provider

Here is the sleight of hand embedded in every AI infrastructure conversation: cost discussions default to token price, as if the two were synonymous. They are not, and the gap between them is where margins quietly disappear.

For organizations running AI in production at any meaningful scale, the inference invoice — the line item from OpenAI or Anthropic or Google — represents somewhere between 20 and 40 percent of actual total AI infrastructure cost. The remaining 60 to 80 percent is diffused across components that do not appear on any single bill and that most finance teams have never modeled.
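The 20-to-40-percent share implies a simple multiplier on the visible bill. The $50k invoice below is a hypothetical input, not a figure from the report:

```python
def implied_total_cost(inference_invoice: float, invoice_share: float) -> float:
    """Fully loaded AI cost implied by the inference bill's share of the total."""
    return inference_invoice / invoice_share

# If a monthly inference invoice of $50k represents 20-40% of true cost,
# the fully loaded spend is 2.5x to 5x the visible line item.
low = implied_total_cost(50_000, 0.40)   # 125,000
high = implied_total_cost(50_000, 0.20)  # 250,000
print(f"Implied total: ${low:,.0f} - ${high:,.0f} per month")
```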

| What's not on the inference invoice | Why it's expensive |
|---|---|
| Idle GPU allocation | Reserved compute sitting unused in off-peak windows. The capacity is purchased whether or not it runs. |
| Peak-load overprovisioning | Infrastructure bought for worst-case demand, not average load. The gap is large enough that chronic overcapacity is the norm. |
| Vector databases and data egress | Embedding stores, retrieval infrastructure, and data movement costs — frequently absent from AI cost models. At scale, not negligible. |
| Observability and tracing | Logging LLM calls, tracing latency, monitoring quality at production scale. Typically 5 to 15 percent of total operational spend. |
| Security and compliance overhead | Prompt injection defenses, audit logging, data residency enforcement. Non-negotiable in regulated industries. Frequently unbudgeted. |
| Engineering time | The largest invisible cost: developer hours on prompt engineering, evaluation pipelines, model updates, and infrastructure maintenance. |

The aggregate effect: 72 percent of IT leaders find AI cloud spending “unmanageable,” with an average overspend of 30 percent versus budget. That gap is not driven by token prices — it is driven by the components above, which scale proportionally with consumption and are structurally invisible without purpose-built instrumentation.

Section 4

The Agentic Shift Changes the Math Entirely

Everything described so far is already true for organizations running conventional, single-turn AI applications. The shift to agentic workflows is about to make managing those costs significantly harder.

A single human-facing query calls an LLM once. An agentic workflow — a software agent autonomously completing a multi-step task — may call the same model 50 to 500 times, each call carrying accumulated context state from previous steps. Token consumption per unit of user-visible output increases by one to two orders of magnitude. The cost per completed workflow, already difficult to measure in single-turn applications, becomes significantly harder to track and attribute in agentic ones.
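Why consumption jumps by orders of magnitude becomes clear when context accumulation is modeled directly. This sketch assumes a simple pattern — every call replays the system prompt plus all prior step outputs — so input grows linearly per call and total tokens grow quadratically; the per-step token counts are illustrative assumptions:

```python
def agent_workflow_tokens(steps: int, tokens_per_step: int = 500,
                          system_prompt: int = 1_000) -> int:
    """Total input tokens for an agent that re-sends accumulated context each call."""
    total = 0
    context = system_prompt
    for _ in range(steps):
        total += context            # input tokens for this call
        context += tokens_per_step  # this step's output joins the context
    return total

print(agent_workflow_tokens(1))    # single-turn query: 1,000 tokens
print(agent_workflow_tokens(50))   # 50-step agent: 662,500 tokens
print(agent_workflow_tokens(500))  # 500-step agent: ~62.9M tokens
```

Under these assumptions, the 500-step workflow consumes tens of thousands of times more input tokens than the single-turn query it superficially resembles — which is why per-query cost models break down entirely.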

A query calls the model once. An agent calls it 500 times. Most cost models were built for the query.

This is no longer theoretical. Anthropic’s Claude Code launched in May 2025 and reached over $2.5 billion in annualized revenue by early 2026 — representing hundreds of billions of tokens consumed monthly within a single product category. Reasoning models compound the problem further: OpenAI’s o3 consumes approximately 83 times more compute per task than a standard GPT-4o response. The frontier price does not fall as older tiers commoditize — it stays constant while yesterday’s frontier becomes the new budget option. Every generation of capability expansion resets consumption upward.

Organizations not instrumenting for this curve today will encounter it as a budget shock.

Section 5

What Nobody in Your Finance Team Can Answer

There is a question that should be straightforward for any organization running AI at meaningful scale: which workflows are we running, what does each one cost, and is that cost justified by the value it produces?

For the majority of enterprises in active AI deployment today, none of those sub-questions can be answered with any precision. AI costs appear in aggregate cloud invoices, commingled with broader infrastructure spend. They cannot be attributed to specific workflows, teams, or business outcomes. The granularity required for meaningful cost management simply does not exist.

This is not a data availability problem. Modern AI infrastructure generates rich telemetry — it logs API calls, records latency, tracks token usage at the request level. The data exists. What does not exist is the layer that captures it, normalizes it, and surfaces it in a format that lets an engineering lead and a finance director have a productive conversation about what the AI deployment actually costs.

The visibility gap is the direct consequence of how enterprise AI adoption unfolded. Organizations moved fast, experimented broadly, shipped products. Cost management was a second-order concern when the primary challenge was building anything that worked. That made sense in 2023. It is no longer defensible in 2026, when AI is operational infrastructure with real budget exposure.

Most enterprises are in the position of running a fleet of vehicles without odometers. They know fuel is cheap. They have no idea how many miles each vehicle is traveling, which routes are efficient, or which drivers are leaving the engine running overnight.

Section 6

The Only Metric That Actually Matters

There is one question that concentrates everything in this report into a single operational test:

What does it cost to run one workflow, end to end?

Not what does a token cost. Not what is the aggregate cloud bill. What does it cost, fully loaded — inference, compute, storage, egress, observability, security, engineering time — to complete one unit of the thing the AI system was built to do? One customer support resolution. One code review. One document processed. One decision made.
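A fully loaded cost-per-workflow calculation is mechanically simple once the components are assembled. Every figure below is an assumed monthly input for illustration, not sourced data — the point is the gap between the inference-only number and the true number:

```python
# Hypothetical monthly cost stack for one AI product line.
monthly_costs = {
    "inference": 50_000,
    "compute (incl. idle / overprovisioned GPU)": 60_000,
    "storage & egress": 12_000,
    "observability & tracing": 10_000,
    "security & compliance": 8_000,
    "engineering time": 40_000,
}
workflows_completed = 250_000  # e.g. support resolutions per month

total = sum(monthly_costs.values())
print(f"Fully loaded:    ${total / workflows_completed:.3f} per workflow")
print(f"Inference-only:  ${monthly_costs['inference'] / workflows_completed:.3f} per workflow")
# Under these assumptions the true unit cost is 3.6x the inference-only view.
```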

For most organizations today, the honest answer is: we do not know. The data is not assembled at that level. And without it, there is no foundation for optimization, no basis for accountability, and no defensible way to forecast spend as usage scales.

Architectural decisions — model routing, prompt caching, context window management, batching strategies — are not technical preferences. They are margin decisions, made by engineers, with financial consequences that most finance teams cannot currently see. The companies that lead the next phase of enterprise AI adoption will be those that build workflow-level cost visibility now, before the agentic consumption surge makes the absence of it genuinely unmanageable.

Conclusion

What This Means

The 99.7% collapse in token prices between 2023 and 2025 is one of the most significant economic events in technology history. It made AI infrastructure accessible to every company on earth. It also created a consumption surge that has overwhelmed IT budgets, driven hyperscaler capex to levels that would have been unimaginable three years ago, and exposed a gap that most organizations have not yet fully reckoned with.

Token prices are cheap. AI operations are not. The space between those two facts — the hidden cost stack, the architectural inefficiency, the absence of workflow-level visibility — is where the next competitive advantage in enterprise AI will be built or lost.

The era of AI as experimentation is over. Industrial infrastructure demands industrial-grade cost discipline. The organizations that understand their true cost per workflow — not their token price per query — will build AI businesses that compound. The ones that do not will continue subsidizing consumption they cannot measure, with budgets they cannot explain, toward outcomes they cannot attribute.

Hype builds valuation. Unit economics build companies.

This is the problem NavyaAI Private Limited was built to solve.

Cost intelligence. Workflow visibility. Real operator control.
