Voice AI RFP template: what to actually ask, and how to score the answers

Procurement / IT-Sec
CX directors
Heads of Ops

By Lewis Crook. Published June 15, 2026.

Bottom line up front

An RFP that asks 'do you support barge-in?' gets back 'yes' from every vendor on the long list. An RFP that asks 'demonstrate barge-in in a call where the caller interrupts a four-second response, and show the turn-taking latency' eliminates two thirds of them. This template is the second kind.

How to use this template

Treat the RFP as the first integration test, not a paperwork exercise. Every question below is written so the answer either ships an artifact, names a number, or is a disqualifier. Score independently before reconciling. Pre-commit the weightings — borrowed from the evaluation matrix — before any responses arrive.

Issue the RFP with the scoring sheet attached. Vendors who see how they will be scored answer the question being asked.
Require named sub-processors and a data-flow diagram per call leg with the response, not as a follow-up.
Score each section independently per evaluator; reconcile only after all scores are in.
Treat the two gateways (data sovereignty, regulatory compliance) as pass/fail. A vendor that fails either is out, regardless of total score.
Hold the shortlisted two or three to a paid sandbox integration before contract. The RFP is a filter; the sandbox is the decision.
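
The independent-then-reconcile step above can be made mechanical. The following is a minimal sketch, assuming scores are collected as a mapping from evaluator to section scores on the 1 / 3 / 5 scale. The function name, the evaluator names, and the spread threshold of 2 are illustrative assumptions, not part of the template.

```python
from typing import Dict, List

# Hypothetical input shape: evaluator name -> {section letter -> score on the 1/3/5 scale}.
Scores = Dict[str, Dict[str, int]]

def sections_to_reconcile(scores: Scores, min_spread: int = 2) -> List[str]:
    """Return sections where independent scores differ by at least min_spread."""
    sections = sorted({s for per_eval in scores.values() for s in per_eval})
    flagged = []
    for section in sections:
        values = [per_eval[section] for per_eval in scores.values() if section in per_eval]
        if max(values) - min(values) >= min_spread:
            flagged.append(section)
    return flagged

# Illustrative scores only; collected separately, before any group discussion.
independent = {
    "evaluator_1": {"A": 3, "D": 5, "F": 3},
    "evaluator_2": {"A": 3, "D": 1, "F": 3},
    "evaluator_3": {"A": 5, "D": 3, "F": 3},
}
print(sections_to_reconcile(independent))
```

The flagged sections are the agenda for the reconciliation meeting: they are where evaluators saw different things in the same answer.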

Section A — Company, references, delivery (weight: 15%)

The most predictive section and the one most often skipped. Implementation delivery capability is invisible in capability claims.

Name three customers at our scale (±50%) in our region and industry. We will call them directly, not the references you nominate.
Name the delivery team that would run our implementation — by individual, with relevant tenure and prior projects of comparable shape.
Provide the escalation path with names and SLAs at the engineering, product, and executive levels.
Describe two implementations that went poorly. What broke, and what changed in your delivery model as a result?
Provide your written change-of-control policy. What happens to our deployment if you are acquired in year two?

Section B — Data sovereignty and security (weight: 20%, gateway)

Sovereignty is whose laws can reach the data, not where it is stored. Vendors who cannot answer cleanly are filtered here, not later.

Provide a data-flow diagram per call leg: capture, speech-to-text, retrieval, inference, text-to-speech, telephony, storage. Name the legal entity and jurisdiction operating each.
Name every sub-processor with the function performed and the country of processing. State your change-notification SLA when sub-processors change.
Provide SOC 2 Type II, ISO 27001, and (if relevant) HITRUST. State the audit period and any qualifications.
Provide your DPA, including international transfer mechanism, retention defaults, deletion SLAs, and the exit-and-destruction plan.
State whether models are trained on customer data by default. State the opt-out mechanism and how it is enforced technically.
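
One way to check a data-flow response for completeness is to ask for it in a structured form and test it against the seven call legs listed above. This is a sketch under assumptions: the `LegDisclosure` record, the `disclosure_gaps` helper, and the placeholder entity names are hypothetical and would need adapting to whatever format the vendor actually returns.

```python
from dataclasses import dataclass
from typing import List, Optional

# The seven call legs named in the Section B data-flow question.
CALL_LEGS = ["capture", "speech-to-text", "retrieval", "inference",
             "text-to-speech", "telephony", "storage"]

@dataclass
class LegDisclosure:
    leg: str
    legal_entity: Optional[str]   # who operates this leg
    jurisdiction: Optional[str]   # whose laws can reach the data

def disclosure_gaps(response: List[LegDisclosure]) -> List[str]:
    """List every leg that is missing, or that lacks a named entity or jurisdiction."""
    by_leg = {d.leg: d for d in response}
    gaps = []
    for leg in CALL_LEGS:
        d = by_leg.get(leg)
        if d is None:
            gaps.append(f"{leg}: not disclosed")
        elif not d.legal_entity or not d.jurisdiction:
            gaps.append(f"{leg}: entity or jurisdiction missing")
    return gaps

# Hypothetical vendor response with placeholder entities.
response = [
    LegDisclosure("capture", "Vendor Ltd", "UK"),
    LegDisclosure("speech-to-text", "Sub-processor X", None),
    LegDisclosure("retrieval", "Vendor Ltd", "UK"),
    LegDisclosure("inference", "Sub-processor Y", "US"),
    LegDisclosure("text-to-speech", "Sub-processor Y", "US"),
    LegDisclosure("telephony", "Carrier Z", "UK"),
]
print(disclosure_gaps(response))
```

Any non-empty result is a follow-up question at best and, because Section B is a gateway, potentially a fail.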

Section C — Regulatory compliance (weight: 10%, gateway)

Compliance is specific to the jurisdiction where care or service is delivered and where the caller sits. A vendor quoting the wrong regime at you is a tell.

Provide the jurisdiction-specific compliance assessment for our actual market — not a global retrofit.
State your position on the EU AI Act AI-disclosure obligation (applicable from August 2026) and your roadmap for the high-risk obligations under the 2026 amendment.
Provide evidence of consent capture for call recording in each jurisdiction we operate in.
State your position on medical-device classification (healthcare) or FCA / NAIC / regional financial regulator scope (financial services).

Section D — Integration depth (weight: 20%)

Read-only against generic connectors is table stakes. Write, idempotency, failure handling, and audit are where deployments survive.

Demonstrate a write to our actual system of record in a paid sandbox, with the auth pattern documented. Show what happens when that write fails.
List the contact-centre platforms, CRMs, IVRs, identity providers, and ticketing systems with production integrations, not roadmap.
Describe your idempotency model for actions that touch downstream systems.
Provide the audit format for every write. We will need it for our SOX / regulatory audit.
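
To make the idempotency, failure-handling, and audit questions concrete, here is a minimal in-memory sketch of the behaviour a good answer should describe. It is not any vendor's implementation: `IdempotentWriter`, `flaky_write`, and the audit fields are hypothetical, and a production version would persist keys and pass the idempotency key to the downstream system so that a write that timed out after committing is not duplicated on retry.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

class IdempotentWriter:
    """One downstream write per idempotency key, with every attempt audited."""

    def __init__(self, write_fn: Callable[[str, dict], str]):
        self._write_fn = write_fn            # hypothetical call into the system of record
        self._results: Dict[str, str] = {}   # idempotency key -> downstream record id
        self.audit: List[dict] = []

    def write(self, key: str, payload: dict, max_attempts: int = 3) -> str:
        if key in self._results:             # replayed request: return the stored result
            self._log(key, "replayed", self._results[key])
            return self._results[key]
        for attempt in range(1, max_attempts + 1):
            try:
                record_id = self._write_fn(key, payload)
            except ConnectionError as exc:
                self._log(key, f"failed attempt {attempt}: {exc}", None)
                continue
            self._results[key] = record_id
            self._log(key, "committed", record_id)
            return record_id
        raise RuntimeError(f"write {key} failed after {max_attempts} attempts; escalate")

    def _log(self, key: str, outcome: str, record_id: Optional[str]) -> None:
        self.audit.append({"at": datetime.now(timezone.utc).isoformat(),
                           "key": key, "outcome": outcome, "record_id": record_id})

calls = {"n": 0}

def flaky_write(key: str, payload: dict) -> str:
    """Stand-in for the system of record: fails once, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("timeout")
    return f"rec-{calls['n']}"

writer = IdempotentWriter(flaky_write)
key = str(uuid.uuid4())
first = writer.write(key, {"booking": "reschedule"})
second = writer.write(key, {"booking": "reschedule"})  # same key: no second write
print(first, second, [entry["outcome"] for entry in writer.audit])
```

The audit list here is the kind of artifact the last question in this section asks for: one entry per attempt, including the failures.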

Section E — Operating model and control surface (weight: 15%)

The conversation owner — a senior contact-centre operator, not an engineer — is the highest-leverage role. Vendors that lock changes behind engineering tickets fail in production.

Demonstrate a non-engineer changing an intent, deploying to staging, and rolling back in under one hour.
Provide the audit log for the last ten changes to a customer deployment, showing the author and revert path for each.
Describe the staging-to-production promotion model, including diff review and approval gates.
Describe the per-call observability available to the conversation owner: transcript, intent labels, tool calls, latency per step, escalation reason.
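
The change, rollback, and audit-log questions above imply a versioned configuration in which every change is attributable and reversible. A minimal sketch of that shape follows, assuming full snapshots per version; `IntentConfig`, `Change`, and the sample intents are hypothetical names for illustration, and approval gates and diff review are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Change:
    version: int
    author: str
    intents: Dict[str, str]   # full snapshot, so any version can be restored
    note: str

@dataclass
class IntentConfig:
    history: List[Change] = field(default_factory=list)

    def deploy(self, author: str, intents: Dict[str, str], note: str) -> int:
        """Record a new version; nothing is ever overwritten."""
        version = len(self.history) + 1
        self.history.append(Change(version, author, dict(intents), note))
        return version

    def current(self) -> Dict[str, str]:
        return dict(self.history[-1].intents)

    def rollback(self, author: str, to_version: int) -> int:
        """A rollback is itself a new, audited change that restores an old snapshot."""
        target = self.history[to_version - 1]
        return self.deploy(author, target.intents, f"revert to v{to_version}")

    def audit_log(self, last: int = 10) -> List[Tuple[int, str, str]]:
        return [(c.version, c.author, c.note) for c in self.history[-last:]]

config = IntentConfig()
config.deploy("ops_lead", {"refund": "I can help with a refund."}, "initial")
config.deploy("ops_lead", {"refund": "Let me check your refund."}, "reword refund intent")
config.rollback("ops_lead", to_version=1)
print(config.current())
print(config.audit_log())
```

Treating the rollback as a new version, rather than deleting the bad one, is what keeps the revert path visible in the audit log.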

Section F — Performance and latency (weight: 10%)

End-to-end turn latency above 1.5 seconds reads as a broken connection. The number to ask for is p95 under realistic load, not mean in a demo.

State the p95 end-to-end turn-taking latency under 20 parallel sessions, using a script we provide.
Demonstrate graceful barge-in: the caller interrupts a four-second response, and the system yields without losing turn context.
Provide the latency budget per step (ASR, retrieval, LLM, TTS, telephony) for a representative production call.
Describe your behaviour under degraded LLM provider performance — failover, graceful degradation, caller-facing experience.
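
The difference between p95 and mean is easy to show with arithmetic. The sketch below uses the nearest-rank percentile and made-up latency samples in milliseconds; the per-step budget figures are likewise illustrative assumptions, and only the 1.5-second ceiling comes from this guide.

```python
from typing import Dict, List

def p95(samples_ms: List[int]) -> int:
    """Nearest-rank 95th percentile: smallest sample with at least 95% of samples at or below it."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]

# Hypothetical end-to-end turn latencies from 20 turns under load.
turns_ms = [800] * 18 + [1400, 3000]
mean_ms = sum(turns_ms) / len(turns_ms)
print(f"mean={mean_ms:.0f} ms, p95={p95(turns_ms)} ms")

# Hypothetical per-step budget for one representative call, in milliseconds.
budget_ms: Dict[str, int] = {
    "ASR": 250, "retrieval": 150, "LLM": 500, "TTS": 200, "telephony": 150,
}
CEILING_MS = 1500   # above this, a turn reads as a broken connection
total_ms = sum(budget_ms.values())
print(f"budget total={total_ms} ms, headroom={CEILING_MS - total_ms} ms")
```

With these samples the mean looks comfortable while the p95 sits close to the ceiling, which is why the question asks for p95 under load and not a demo average.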

Section G — Commercial (weight: 10%)

Price last, after everything else. Model the deployment at 0.5x and 2x our forecast volume and containment.

Provide pricing under two scenarios: 50% of our forecast call volume and 200% of it. State which lines move and by how much.
State the per-minute floor on escalated calls and the rules for the AI-exposure minutes before transfer.
Provide the exit terms: data export format, timeline, proof-of-destruction, and any disengagement fee.
State the price-protection mechanism on contract renewal and the conditions under which it lapses.
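
The two-scenario pricing question can be modelled in a few lines once the vendor states its terms. Everything numeric below is a hypothetical placeholder (rate, included minutes, overage rate, monthly minimum, forecast), and the pricing structure itself is an assumed per-minute model; substitute the terms from the actual response. Amounts are in cents to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class PricingTerms:
    rate_cents: int              # per-minute rate within the included band
    included_minutes: int        # minutes covered at the base rate
    overage_cents: int           # per-minute rate above the included band
    monthly_minimum_cents: int   # committed floor, paid even at low volume

def monthly_cost(terms: PricingTerms, minutes: int) -> Dict[str, int]:
    """Break the monthly bill into the lines that move with volume."""
    base = min(minutes, terms.included_minutes) * terms.rate_cents
    overage = max(0, minutes - terms.included_minutes) * terms.overage_cents
    usage = base + overage
    top_up = max(0, terms.monthly_minimum_cents - usage)
    return {"base": base, "overage": overage,
            "minimum_top_up": top_up, "total": usage + top_up}

terms = PricingTerms(rate_cents=10, included_minutes=120_000,
                     overage_cents=15, monthly_minimum_cents=800_000)
forecast_minutes = 100_000

for multiplier in (0.5, 1.0, 2.0):
    minutes = int(forecast_minutes * multiplier)
    lines = monthly_cost(terms, minutes)
    effective = lines["total"] / minutes
    print(f"{multiplier}x: {lines} -> {effective:.1f} cents per minute")
```

Under these placeholder terms the effective per-minute price is highest at half volume, where the minimum commitment bites, and the overage line only appears at double volume. Those are the two lines to ask the vendor about.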

Scoring sheet

Each evaluator scores independently per section on the 1 / 3 / 5 scale below, then multiplies by the section weight. Reconciliation happens after all scores are recorded. The two gateway sections (B, C) are also pass / fail — a fail removes the vendor regardless of total score.

Section weights and pass marks

Section	Weight	Score 1	Score 3	Score 5	Gateway?
A. Company / delivery	15%	No named team, references hand-picked	Named team, two strong references, one weak	Named team with comparable prior work, three independent strong references	No
B. Data sovereignty / security	20%	Generic SOC 2, sub-processors not named	Documented residency, sub-processors named, DPA acceptable	Per-call-leg residency proof, sub-processor change-notification SLA, exit-and-destruction plan survives legal review	Yes — fail = out
C. Regulatory compliance	10%	Global retrofit, wrong regime quoted	Jurisdiction-specific assessment, basic AI Act position	Jurisdiction-specific, AI Act roadmap, evidence of consent capture per jurisdiction	Yes — fail = out
D. Integration depth	20%	Read-only, write is roadmap	Read and write demonstrated, auth documented	Sandbox write + failure path + idempotency + audit, demonstrated against our system	No
E. Operating model	15%	Engineering owns every change	Non-engineer can change intents in a controlled editor	Non-engineer ships and rolls back in under an hour with diff review and audit	No
F. Performance / latency	10%	p95 above 2.0s under load, no barge-in	p95 1.2–1.8s, basic barge-in	p95 under 1.0s, graceful barge-in, full latency budget per step	No
G. Commercial	10%	Per-minute, surprise overage clauses	Per-minute with bands, minute-floor stated	Per-resolution or platform fee with documented behaviour at 2x volume and clean exit terms	No
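
The arithmetic behind the sheet can be sketched as follows. The weights and gateway sections are taken from the table above; the function name, the separate pass / fail input for the gateways, and the sample scores are assumptions for illustration. The template does not say how evaluator totals are combined, so this computes one evaluator's result for one vendor.

```python
from typing import Dict

# Section weights (percent) and gateway sections, as given in the scoring sheet.
WEIGHTS = {"A": 15, "B": 20, "C": 10, "D": 20, "E": 15, "F": 10, "G": 10}
GATEWAYS = {"B", "C"}

def vendor_result(scores: Dict[str, int], gateway_pass: Dict[str, bool]) -> dict:
    """Weighted total on the 1-5 scale, plus gateway eligibility."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("a score is required for every section A to G")
    if any(value not in (1, 3, 5) for value in scores.values()):
        raise ValueError("scores must be 1, 3, or 5")
    failed = sorted(g for g in GATEWAYS if not gateway_pass.get(g, False))
    total = sum(WEIGHTS[s] * scores[s] for s in WEIGHTS) / 100
    return {"weighted_total": total, "failed_gateways": failed, "eligible": not failed}

# Hypothetical vendors: one solid, one with top marks that fails a gateway.
solid = vendor_result(
    {"A": 3, "B": 5, "C": 3, "D": 5, "E": 3, "F": 3, "G": 1},
    {"B": True, "C": True},
)
flashy = vendor_result(
    {"A": 5, "B": 5, "C": 5, "D": 5, "E": 5, "F": 5, "G": 5},
    {"B": True, "C": False},
)
print(solid)
print(flashy)
```

The second vendor has the higher total and is still out, which is the point of treating B and C as gateways.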

Disqualifying answers

Short list of answers that should remove a vendor regardless of section score. These are the patterns that, in practice, predict a failed implementation.

Cannot name where speech-to-text inference physically runs, or hedges on sub-processors.
Refuses to do a paid sandbox integration against the customer's actual system of record before contract.
Quotes the wrong regulatory regime (HIPAA to a UK buyer, generic GDPR language to a US healthcare network).
Cannot produce the audit log for the last ten changes to any customer deployment.
References are all the vendor's nominated contacts; will not allow independent outreach.
Pricing model contains an unbounded overage clause without a renegotiation trigger.

Do this on Monday

Pull your current vendor long list. For each vendor, write down one question from Section B (data sovereignty) and one from Section D (sandbox write) that you have not yet asked. Send those two questions today.

Key takeaways

An RFP that asks for evidence eliminates two thirds of the long list; one that asks for capability claims does not.
Treat data sovereignty and regulatory compliance as pass / fail gateways first and weighted dimensions second: a fail removes the vendor regardless of total score.
The most predictive section is delivery capability — named team, independent references, two implementations that went poorly.
Require a paid sandbox integration with the shortlisted vendors before contract; what they will not demonstrate before signing, they usually cannot deliver after.
Score independently, reconcile after, pre-commit the weightings before any responses arrive.

Frequently asked questions

How long should a voice AI RFP take to run?: Six to eight weeks from issue to shortlist for an enterprise procurement, plus a further four to six weeks for paid sandbox integration with the shortlisted two or three. Compressing below this usually means skipping evidence in favour of capability claims.
Why score independently before reconciliation?: Group scoring drifts to the median and hides disagreement. Independent scores expose where evaluators saw different things in the same answer — which is exactly where the conversation that matters happens.
What is the single most useful question in a voice AI RFP?: 'Demonstrate a write to our system of record in a paid sandbox, and show what happens when that write fails.' Most failed deployments fail on this seam, not on conversation quality.
Should the RFP name pricing targets?: No. Anchoring on price up front causes vendors to game the per-minute line and recover margin in implementation services or overages. Score capability and delivery first; commercial terms last.

Terms used in this guide

Voice AI: Voice AI is software that answers the phone, understands what the caller wants, and takes action; it is not just a smarter IVR.
IVR replacement: IVR replacement swaps menus and keypad input for natural conversation and actual resolution.
Containment rate: Containment rate is the percentage of calls the automation finished on its own.

Last reviewed: 2026-06-15. This guide is updated when production patterns shift; see the corrections page to flag anything that no longer matches reality.

About the author

Lewis Crook

Practitioner writer on enterprise voice AI

Lewis Crook: 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.
