Pure usage-based AI billing is toast

Why the next wave of AI growth runs on ARR

Jun 01, 2026

AI companies have predominantly adopted usage-based pricing and billing. According to a16z, 39% of companies charge based on usage. Hy’s SaaS & AI Pricing report 2026 found that 69% plan to switch to usage-based pricing within two years.

Billing engines from Metronome, Stripe and Chargebee have made it easier to bill customers based on actual consumption. But just because you can doesn’t mean you should.

Customers have come to hate unpredictable AI usage bills. They need budget certainty. They want adoption without the risk of runaway spend. And they do not want high usage bills without clear outcomes.

Token consumption varies on top of token pricing

Tokens are still relatively new to customers, and companies are figuring out how many they need and how to control usage. On top of that, token consumption varies by vendor and model, making token prices hard to compare.

This study from March 2026 found that the cheapest AI model on the pricing page is not always the cheapest in production, because token consumption varies among models. Though Gemini 3 Flash’s listed token price is 80% cheaper than GPT-5.4’s, in their test the cost across tasks was 38% higher.

**Source: Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More**

Not knowing how many tokens a particular action or outcome takes for each model makes it hard to understand token usage and total cost. Models are not deterministic and do not deliver the expected outcome consistently, so putting the risk of varying token consumption on the customer makes the problem worse.
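The reversal in the study quoted above is plain arithmetic: total cost is price per token times tokens consumed. Here is a minimal sketch of what the two quoted percentages imply. It assumes a single blended token price per model (real price lists separate input and output tokens) and reads "38% higher" as the cheaper-listed model's total cost relative to the other model's. The function name is hypothetical.

```python
# Illustrative arithmetic only. The two percentages come from the figures
# quoted above; the function name and the single blended price are assumptions.

def implied_token_ratio(price_ratio: float, cost_ratio: float) -> float:
    """Return how many times more tokens the cheaper-listed model consumed.

    total cost = price per token * tokens consumed, therefore
    cost_ratio = price_ratio * token_ratio.
    """
    return cost_ratio / price_ratio


price_ratio = 1 - 0.80  # listed price is 80% cheaper, i.e. 0.20x
cost_ratio = 1 + 0.38   # observed cost across tasks is 38% higher, i.e. 1.38x

token_ratio = implied_token_ratio(price_ratio, cost_ratio)
print(f"Implied token consumption: {token_ratio:.1f}x")
```

Under these assumptions, the model with the lower list price would have consumed roughly seven times as many tokens for the same tasks. A pricing page cannot show that number.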

Customers need predictability

And companies are struggling with unexpected AI usage bills:

Uber CTO Praveen Neppalli Naga acknowledged in April 2026 that he had already spent his AI budget for the entire year, 4 months into the year:

“I’m back to the drawing board because the budget I thought I would need is blown away already”.

Axios reported that a company even spent $500 million in a single month on AI usage. The company had not set usage limits, and usage ran far higher than expected.

Cursor faced a backlash after changing its Pro pricing in the summer of 2025. Their CEO Michael Truell had to publish a pricing clarification, acknowledging “we missed the mark” and offer refunds to customers who had a “surprise usage bill.”

OpenAI CEO Sam Altman said in May 2026 “customers are increasingly asking us for certainty on capacity”, and that OpenAI is now selling “discounted tokens for 1-3 year commits”. He sees it as “a big win-win” since it also helps OpenAI plan. He’s right.

AI companies need growth

Altman hit the nail on the head. AI vendors need predictability to plan too. They need to fund capital investments in the billions and justify forward-looking revenue. That’s harder if you don’t have ARR. Anthropic reported in April 2026 that it had surpassed $30 billion in “run-rate revenue”, up from $9 billion at the end of 2025, which justifies its $50 billion investment in computing infrastructure.

Until now, though, AI companies have focused on glorifying token consumption (not recurring commitment) to fuel growth, and they have been great at it:

In 2025, OpenAI started giving physical awards to companies crossing 10B, 100B and 1T tokens. The Tokens of Appreciation program was so successful that OpenAI increased the thresholds for 2026 by 10x, to 100B, 1T and 10T.

That was the start of token-maxing: employees now use their token consumption to show that they leverage AI more than others and that they are the ones who lead innovation.

Nvidia CEO Jensen Huang said:

“If that $500,000 engineer did not consume at least $250,000 worth of tokens, I’m going to be deeply alarmed.”

A Meta employee even created a dashboard so coworkers can compete to become the company’s highest token consumer.

So…

AI vendors want usage to keep growing and are great at promoting it.
Customers need the spend to be predictable.
That tension is why pure pay-as-you-go billing will break down in B2B AI.

Subscriptions don’t have to be per seat

Usage pricing and usage billing are not the same thing. Pricing defines the logic of value, what customers pay for. Billing defines the rhythm of cash, when customers pay.

Companies and CFOs are used to SaaS subscriptions that are priced per seat and usually paid annually or monthly. If the pricing metric changes from users to usage, that does not mean billing has to become pure usage-based billing. For example:

If your pricing metric is usage, contract capacity.
If your pricing metric is outcomes, contract outcome volume.
If your pricing metric is credits, sell recurring credit subscriptions.

You can still contract commitments on AI usage, capacity, tokens, credits, outcomes, or resolutions. Then bill that commitment as a subscription.
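To make the split between pricing metric and billing rhythm concrete, here is a minimal sketch of an annual usage commitment billed as a flat monthly subscription. Everything in it is hypothetical: the `Commitment` type, the rates, and the choice to charge overage only once the annual pool is exhausted are illustrative assumptions, not any vendor's actual contract terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """Hypothetical annual usage commitment, billed monthly."""
    annual_units: int      # committed tokens, credits, resolutions, ...
    committed_rate: float  # price per unit inside the commitment
    overage_rate: float    # price per unit beyond the commitment (assumption)

    @property
    def monthly_fee(self) -> float:
        # The pricing metric is usage; the billing rhythm is a flat fee.
        return self.annual_units * self.committed_rate / 12


def annual_invoices(c: Commitment, monthly_usage: list[int]) -> list[float]:
    """Return one invoice per month: flat fee plus overage once the pool is empty."""
    remaining = c.annual_units
    invoices = []
    for used in monthly_usage:
        overage_units = max(0, used - remaining)
        remaining = max(0, remaining - used)
        invoices.append(c.monthly_fee + overage_units * c.overage_rate)
    return invoices


# Uneven usage that stays within the annual pool: every invoice is identical.
contract = Commitment(annual_units=120_000, committed_rate=1.0, overage_rate=1.5)
seasonal_usage = [5_000] * 6 + [15_000] * 6
print(annual_invoices(contract, seasonal_usage))
```

In this sketch usage swings threefold between the first and second half of the year, yet the customer sees the same invoice every month, because consumption draws down a pool that was contracted up front.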

I wrote more about this distinction in Untangling billing and pricing.

AI is moving away from pure pay-as-you-go

This isn’t just me saying this might happen. The shift is already underway:

AWS has sold Bedrock Provisioned Throughput alongside its on-demand pricing since September 2023. AWS has also sold EC2 Reserved Instances since at least 2009.

Microsoft started selling Azure OpenAI Service Provisioned Reservations in late summer 2023, alongside its hourly no-commitment purchasing.

Google started selling Provisioned Throughput for its Vertex AI in June 2024.

Salesforce introduced Agentic Enterprise License Agreements (AELA) in late 2025, offering flat-rate, unlimited-usage pricing for its AI agents.

Zendesk has priced AI Agents based on Automated Resolutions since November 2025. Customers can choose to commit to volume at $1.50 per resolution, or pay a higher price of $2 per resolution without commitment.
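The two Zendesk prices quoted above imply a simple break-even point. The sketch below assumes that a committed volume is paid in full whether or not it is used, and it ignores usage beyond the commitment. Both are simplifying assumptions rather than Zendesk's documented terms, and the function names are hypothetical.

```python
COMMITTED_PRICE = 1.50  # USD per resolution with a volume commitment (from the text)
ON_DEMAND_PRICE = 2.00  # USD per resolution without commitment (from the text)


def committed_cost(committed_volume: int) -> float:
    # Assumption: the full committed volume is billed, used or not.
    return committed_volume * COMMITTED_PRICE


def on_demand_cost(actual_volume: int) -> float:
    return actual_volume * ON_DEMAND_PRICE


def break_even_utilization() -> float:
    """Share of the committed volume that must be used for the commitment to pay off."""
    return COMMITTED_PRICE / ON_DEMAND_PRICE


print(f"Break-even utilization: {break_even_utilization():.0%}")
for actual in (7_000, 8_000):
    print(actual, committed_cost(10_000), on_demand_cost(actual))
```

With a 10,000-resolution commitment, the customer comes out ahead as soon as actual volume passes 7,500 resolutions. Below that, paying on demand would have been cheaper, which is the price of budget certainty.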

On May 19, 2026, OpenAI announced that it is introducing Guaranteed Capacity, saying:

“Customers can choose 1-3-year commitments, with discounts that increase based on annual commitment. Guaranteed Capacity includes certainty of access to compute based on spend levels, and customers can draw down from this commitment across the portfolio of OpenAI products.”

OpenAI, Guaranteed Capacity

Even if AI pricing stays usage-based for the foreseeable future, AI billing is moving toward contractual commitments and predictable recurring revenue.

B2B contracts are a win-win

For customers, commitments help with:

Budget certainty: CFOs prefer predictable spend over volatile usage bills that add another risk to EBIT.
Internal planning: Teams can forecast AI spend, allocate budgets, and avoid monthly surprises.
Adoption pressure: Longer commitments create a commitment effect: “We already bought it. Let’s use it.”
Commercial leverage: Larger commitments can justify better rates, capacity guarantees, stronger support, or roadmap access.

For AI providers, commitments help with:

ARR: Usage becomes contracted recurring revenue, instead of uncertain pay-as-you-go consumption revenue.
Better forecasting: Finance can model revenue, capacity needs, GPU commitments, and gross margin with better confidence.
Infrastructure funding: Long-term customer commitments can support long-term compute commitments.
Lower churn risk: Every renewal is a chance to reconsider competitors. Longer terms reduce evaluation frequency. And after 1 to 3 years, status quo bias works in your favor.

So what’s next

Recognize that pricing and billing are two separate decisions.

Usage-based pricing is fine. But don’t bill customers against a metric they can’t predict, at rates that vary by model, for outcomes that aren’t guaranteed. Don’t put all the risk on the customer while you capture all the upside of growing consumption. That erodes the trust you need for a long-term relationship.

Pick the pricing metric that’s right for your business. But let customers contract a commitment against it, and bill that commitment as a predictable subscription. Give them visibility and controls to manage their spend without a mid-year crisis.

That way AI vendors get ARR, and customers get predictability. And AI moves from a line item CFOs fear to a partnership that deepens over 1-3 years.