Economics

Voice AI platform pricing models in 2026: the enterprise buyer's guide

VP / COO
CX directors
Procurement

By Lewis Crook. Published June 15, 2026

Bottom line up front

Voice AI pricing in 2026 is moving from 'all-in per minute' to unbundled component pricing and outcome-based resolution models. To avoid overspending, model the hidden cost stack and the contract terms — not just the unit rate. The quoted number is rarely the number you pay.

The five pricing models you'll see in 2026 RFPs

The market has converged on five recognisable structures. Most enterprise quotes are a blend of two of them.

Per-minute usage — the legacy default, almost always tiered by committed volume
Per-session or per-call — a flat fee per interaction, popular for high-volume, short, transactional triage
Per-resolution — outcome-based, charged only when a resolution-definition is met (the definition is the negotiation)
Per-seat-equivalent — priced against a notional displaced human agent, charged per concurrent capacity unit
Platform plus usage — a fixed annual platform fee for enterprise features (SSO, observability, sandboxes) plus a discounted unit rate

Per-minute pricing: what's actually included

A headline per-minute number is almost always partially unbundled in enterprise quotes. The published rate typically covers a baseline ASR, baseline TTS, and a default LLM. The line items that get added on top are what surprise procurement late in the cycle.

Premium voices and language packs — frequently a 20–30% surcharge over the baseline TTS
Higher-capability LLM models for harder reasoning — surcharged per minute or per token
Prompt-token volume — long retrieval contexts and large knowledge bases meter against a separate token allowance
Observability and analytics seats — real-time dashboards, conversation review, and QA tooling priced per user
Telephony — almost never included in the AI rate; either pass-through-at-cost or a markup
Recording storage and transcription retention — per-GB monthly and per-call retrieval pricing
Sandbox / UAT environments — many platforms meter non-production usage separately

Per-resolution pricing: the definition is the negotiation

Outcome-based pricing aligns the vendor with the business case, but the word 'resolution' is doing a lot of work in the contract. Five definitions dominate the market, each with its own loophole.

Common resolution definitions and their loopholes

Definition	What it counts	Buyer-side risk
Intent match	Caller's utterance maps to a known intent	Counts fallback intents and abandoned calls as resolved
API handoff	Backend transaction was invoked	Penalises efficient self-service that doesn't trigger an API
No-transfer	Call did not escalate to a human	Hang-ups, drop-offs, and confused callers count as resolutions
Survey-verified	Post-call survey confirms resolved	Survey response rates are low; sample is biased; volume is unpredictable
Recontact-window	No human contact on the same topic within N days	Most defensible; vendor will push for the shortest N possible
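
The recontact-window definition is the easiest of the five to express as a testable rule. What follows is a minimal sketch, assuming a hypothetical contact log with a caller ID, a topic label, a start time, and a flag for human handling. The field names, the sample calls, and the window lengths are illustrative and not taken from any vendor contract.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Contact:
    # Hypothetical record shape; real platforms will expose different fields.
    caller_id: str
    topic: str
    started_at: datetime
    handled_by_human: bool


def is_resolved(ai_call: Contact, later_contacts: list[Contact], window_days: int) -> bool:
    """True if no human contact on the same topic follows within the window."""
    deadline = ai_call.started_at + timedelta(days=window_days)
    for contact in later_contacts:
        same_case = (contact.caller_id == ai_call.caller_id
                     and contact.topic == ai_call.topic)
        in_window = ai_call.started_at < contact.started_at <= deadline
        if same_case and in_window and contact.handled_by_human:
            return False
    return True


ai_call = Contact("caller-1", "card-replacement", datetime(2026, 3, 2, 9, 0), False)
follow_up = Contact("caller-1", "card-replacement", datetime(2026, 3, 6, 14, 30), True)

resolved_n3 = is_resolved(ai_call, [follow_up], window_days=3)
resolved_n7 = is_resolved(ai_call, [follow_up], window_days=7)
print(resolved_n3, resolved_n7)
```

In this sample the caller reaches a human four days after the automated call. With N set to three days the call is billed as a resolution; with N set to seven it is not, which is why the vendor will argue for the shorter window.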

The hidden cost stack

Five line items account for most of the gap between quoted unit rate and realised cost. Together they routinely add 15–25% to the bill.

Telephony pass-through — PSTN, SIP trunking, and international termination; rarely included in the AI rate
Recording storage and retention — per-GB monthly plus per-retrieval, multiplied by your compliance retention window
SSO, SCIM, and audit-log access — frequently behind an 'Enterprise' or 'Pro' tier
Sandbox, UAT, and dedicated tenant — non-production environments metered separately
Professional services minimums — implementation, ongoing customer-success retainer, and integration hours
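
The retention item in the list above is the one most sensitive to the compliance window, because stored volume keeps growing until the oldest recordings age out. A rough sketch of that effect, assuming a hypothetical 500 GB of new recordings per month at 2 pence per GB-month over a 36-month contract. All three figures are placeholders to be replaced with your own, and per-retrieval fees are left out.

```python
def cumulative_storage_cost_pence(gb_added_per_month: int,
                                  pence_per_gb_month: int,
                                  retention_months: int,
                                  contract_months: int) -> int:
    """Total storage spend over the contract, ageing out data after retention_months."""
    total = 0
    for month in range(1, contract_months + 1):
        # Stored volume grows monthly until it plateaus at the retention window.
        months_retained = min(month, retention_months)
        total += gb_added_per_month * months_retained * pence_per_gb_month
    return total


# Hypothetical inputs: 500 GB/month, 2p per GB-month, 36-month contract.
one_year = cumulative_storage_cost_pence(500, 2, retention_months=12, contract_months=36)
seven_year = cumulative_storage_cost_pence(500, 2, retention_months=84, contract_months=36)

print(f"12-month retention: £{one_year / 100:,.0f}")
print(f"84-month retention: £{seven_year / 100:,.0f}")
```

Under these assumptions the seven-year window never reaches its plateau inside the contract term, so the storage line is still climbing at renewal.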

Token and model pass-through pricing

Some platforms expose the underlying LLM cost directly and add a platform margin; others absorb it into a bundled per-minute rate. The trade-off is predictability versus exposure.

Pass-through pricing usually quotes lower nominal rates and shifts FX and model-provider rate-card risk to the buyer. If the underlying model provider re-prices or changes its tokenisation, your unit economics move with it. Build a quarterly review and a budget buffer; do not assume the rate at signature is the rate at month nine.
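
To see how much the unit rate can move, it helps to separate the pass-through LLM cost from the platform margin. The sketch below is an illustration under stated assumptions: the token volume per minute, the price per million tokens, and the margin are all hypothetical figures, and FX movement is ignored.

```python
def llm_cost_per_minute(tokens_per_minute: float, price_per_million_tokens: float) -> float:
    """Pass-through LLM cost for one minute of conversation, in pounds."""
    return tokens_per_minute * price_per_million_tokens / 1_000_000


def pass_through_rate(tokens_per_minute: float,
                      price_per_million_tokens: float,
                      platform_margin_per_minute: float) -> float:
    """Effective per-minute rate: model cost plus the platform's margin."""
    return (llm_cost_per_minute(tokens_per_minute, price_per_million_tokens)
            + platform_margin_per_minute)


# Hypothetical baseline: 3,000 tokens/min, £10 per million tokens, £0.05 margin.
baseline = pass_through_rate(3_000, 10.0, 0.05)
# Scenario 1: the model provider raises its rate card by 20%.
repriced = pass_through_rate(3_000, 12.0, 0.05)
# Scenario 2: a tokenisation change inflates token counts by 15%.
retokenised = pass_through_rate(3_450, 10.0, 0.05)

for label, rate in [("baseline", baseline), ("repriced", repriced), ("retokenised", retokenised)]:
    print(f"{label}: £{rate:.4f}/min ({(rate / baseline - 1) * 100:+.1f}%)")
```

Running the same function each quarter against the provider's current rate card gives a simple basis for sizing the budget buffer.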

An honest TCO worked example

The same call volume across three pricing structures produces materially different annual bills. The example below uses one million minutes per year, an average call duration of four minutes (250,000 calls), and a 60% resolution rate where relevant.

TCO comparison — 1M minutes / 250k calls / year

Cost component	Per-minute (£0.18)	Per-resolution (£2.40)	Platform + usage
Annual platform fee	£0	£40,000	£110,000
Base usage (volume)	£180,000 at £0.18/min	£360,000 (150,000 resolutions at £2.40)	£70,000 at £0.07/min
Telephony pass-through	£14,000	£14,000	£14,000
Recording and retention	£6,000	£6,000	£4,000 (included tier)
Observability seats	£12,000	£12,000	included
Total annual TCO	£212,000	£432,000	£198,000
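
The totals in the table can be reproduced with a few lines of arithmetic, which also makes it easy to swap in your own volumes. This sketch uses only the figures from the worked example; the variable and function names are illustrative.

```python
# Inputs from the worked example: 1M minutes, 4-minute calls, 60% resolution rate.
MINUTES_PER_YEAR = 1_000_000
AVG_CALL_MINUTES = 4
RESOLUTION_RATE_PCT = 60

calls = MINUTES_PER_YEAR // AVG_CALL_MINUTES        # 250,000 calls
resolutions = calls * RESOLUTION_RATE_PCT // 100    # 150,000 billable resolutions


def pounds(units: int, pence_per_unit: int) -> int:
    """Convert a unit count and a price in pence into whole pounds."""
    return units * pence_per_unit // 100


# Line items in pounds, one dictionary per pricing structure.
models = {
    "per_minute": {
        "platform_fee": 0,
        "usage": pounds(MINUTES_PER_YEAR, 18),      # £0.18 per minute
        "telephony": 14_000,
        "recording": 6_000,
        "observability": 12_000,
    },
    "per_resolution": {
        "platform_fee": 40_000,
        "usage": pounds(resolutions, 240),          # £2.40 per resolution
        "telephony": 14_000,
        "recording": 6_000,
        "observability": 12_000,
    },
    "platform_plus_usage": {
        "platform_fee": 110_000,
        "usage": pounds(MINUTES_PER_YEAR, 7),       # £0.07 per minute
        "telephony": 14_000,
        "recording": 4_000,                         # included tier
        "observability": 0,                         # included
    },
}

totals = {name: sum(items.values()) for name, items in models.items()}
for name, total in totals.items():
    print(f"{name}: £{total:,}")
```

Changing RESOLUTION_RATE_PCT or the minute volume shows how quickly the ranking of the three structures can shift.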

Contract terms that move the number more than the unit rate

Below the headline rate sit the levers that decide what you actually pay. Negotiate these first; negotiate the unit rate last.

Volume commits and rollovers — annualised pools beat monthly use-it-or-lose-it by a wide margin
Ramp curves — committed minimums should scale with your deployment phases, not your contract start date
Most-Favoured-Nation clauses — material in a market where compute costs are falling year-on-year
Price-protection windows — cap on annual increases, with a clear basis for any pass-through changes
Burst overage rates — confirm the per-minute rate that applies above your committed pool
Exit and data portability — extraction of prompts, voice clones, conversation logs, and tuning data, with timelines and fees defined
Sub-processor change notification — N days' notice and a right to refuse for material changes
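
The difference between an annualised pool and a monthly use-it-or-lose-it commit is easiest to see with seasonal volume. The sketch below assumes a hypothetical usage profile, a committed rate of 15 pence per minute, and an overage rate of 22 pence; none of these figures come from a real rate card.

```python
def monthly_commit_cost(usage_by_month: list[int], monthly_commit: int,
                        rate_pence: int, overage_pence: int) -> int:
    """Use-it-or-lose-it: each month pays its full commit plus any overage."""
    total = 0
    for used in usage_by_month:
        overage = max(0, used - monthly_commit)
        total += monthly_commit * rate_pence + overage * overage_pence
    return total


def annual_pool_cost(usage_by_month: list[int], annual_commit: int,
                     rate_pence: int, overage_pence: int) -> int:
    """Annualised pool: quiet months offset busy ones before overage applies."""
    overage = max(0, sum(usage_by_month) - annual_commit)
    return annual_commit * rate_pence + overage * overage_pence


# Hypothetical seasonal profile totalling 1.2M minutes.
usage = [60_000, 60_000, 80_000, 80_000, 100_000, 100_000,
         100_000, 100_000, 120_000, 120_000, 140_000, 140_000]

monthly_total = monthly_commit_cost(usage, 100_000, rate_pence=15, overage_pence=22)
annual_total = annual_pool_cost(usage, 1_200_000, rate_pence=15, overage_pence=22)

print(f"monthly commit: £{monthly_total / 100:,.0f}")
print(f"annual pool:    £{annual_total / 100:,.0f}")
```

Both contracts commit to the same 1.2M minutes at the same rate, yet the monthly structure bills overage in busy months while forfeiting unused minutes in quiet ones.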

The seven questions that flush out the real number

Send these in the RFP, not in the negotiation. The answers shape the comparison; chasing them late wastes weeks.

Is telephony (PSTN / SIP) included, pass-through at cost, or marked up — and at what rate?
What is the billing increment — per-second, six-second, or per-minute rounding?
Which LLM is included at the quoted rate, and what does upgrading to the higher-capability model cost per minute?
How is a resolution defined, and how is a non-resolution recorded — at the contract level, not the marketing level?
Is recording storage and transcription retention included for our compliance retention window?
What does multi-region or in-country processing add to the unit rate?
What are the burst overage and shortfall rates, and what is the annualisation rule?
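
The billing-increment question matters more than it looks, because rounding is applied per call. A small sketch of the effect, using four made-up call durations and the three increments named in the question:

```python
def billed_seconds(duration_s: int, increment_s: int) -> int:
    """Round a call duration up to the next whole billing increment."""
    return -(-duration_s // increment_s) * increment_s  # integer ceiling division


# Illustrative call durations in seconds (not real traffic data).
durations = [61, 125, 242, 30]

per_second = sum(billed_seconds(d, 1) for d in durations)
six_second = sum(billed_seconds(d, 6) for d in durations)
per_minute = sum(billed_seconds(d, 60) for d in durations)

for label, seconds in [("per-second", per_second),
                       ("six-second", six_second),
                       ("per-minute", per_minute)]:
    print(f"{label}: {seconds} s billed ({seconds / 60:.2f} min)")
```

The shorter the average call, the larger the gap between per-second and per-minute rounding, so short transactional traffic is the most exposed.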

Do this on Monday

Take last quarter's actual call volume and mix and re-cost it against your current vendor's published rate card, line by line including telephony, storage, and observability. The variance to your invoice is your negotiation surface.

Key takeaways

Five pricing structures dominate in 2026: per-minute, per-session, per-resolution, per-seat-equivalent, platform-plus-usage
Unbundled per-minute rates almost always exclude premium voices, higher-capability LLMs, telephony, observability seats, and sandboxes
Per-resolution pricing's binding term is the definition of 'resolution' — five common definitions, each with a different loophole
Contract terms (commits, ramps, MFN, exit) typically move TCO by 30–60% versus the quoted unit rate
Model TCO at your real volume, mix, and retention window — not at the vendor's reference example

Frequently asked questions

How do voice AI per-minute rates differ from telephony costs?

The voice AI rate covers ASR, LLM inference, TTS, and platform compute. Telephony (PSTN and SIP trunking) is the carrier cost of the call itself, and is usually billed separately, either passed through at cost or marked up by the AI platform.

What is a typical enterprise platform fee range?

For enterprise-grade platforms with SSO, observability, and dedicated environments, annual platform fees commonly run £50,000–£250,000 in 2026, depending on security posture, integration scope, and support tier. The fee replaces some per-minute margin, not all of it.

Should I prefer per-minute or per-resolution pricing?

Per-minute is more predictable and almost always cheaper at low containment. Per-resolution becomes attractive at higher containment and on intents where 'resolution' is contractually well defined. Below 40% containment, per-resolution is usually more expensive on the same volume.

Are LLM token costs included in the per-minute rate?

In 2026, both bundled and pass-through models exist. Bundled rates absorb the LLM cost into the per-minute number and give predictability; pass-through rates quote lower nominal numbers and expose the buyer to model-provider rate-card changes.

How important are MFN and price-protection clauses?

Material. Voice AI compute costs are falling year-on-year; without an MFN or a cap on increases, a multi-year contract locks in today's rate while the market re-prices around it. Insist on annual review with a defined adjustment basis.

What is the most-overlooked cost in voice AI procurement?

Recording and transcription retention against the buyer's actual compliance window. A seven-year retention obligation in financial services multiplies the storage line in ways most TCO models understate at signature.

Terms used in this guide

Voice AI: software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.
Containment rate: the percentage of calls the automation finished on its own.

Last reviewed: 2026-06-15. This guide is updated when production patterns shift; see the corrections page to flag anything that no longer matches reality.

About the author

Lewis Crook

Practitioner writer on enterprise voice AI

Lewis Crook has 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.
