Voice AI RFP template: what to actually ask, and how to score the answers
- Procurement / IT-Sec
- CX directors
- Heads of Ops
An RFP that asks 'do you support barge-in?' gets back 'yes' from every vendor on the long list. An RFP that asks 'demonstrate barge-in in a call where the caller interrupts a four-second response, and show the turn-taking latency' eliminates two thirds of them. This template is the second kind.
How to use this template
Treat the RFP as the first integration test, not a paperwork exercise. Every question below is written so the answer either ships an artifact, names a number, or is a disqualifier. Score independently before reconciling. Pre-commit the weightings — borrowed from the evaluation matrix — before any responses arrive.
- Issue the RFP with the scoring sheet attached. Vendors who see how they will be scored answer the question being asked.
- Require named sub-processors and a data-flow diagram per call leg with the response, not as a follow-up.
- Score each section independently per evaluator; reconcile only after all scores are in.
- Treat the two gateways (data sovereignty, regulatory compliance) as pass/fail. A vendor that fails either is out, regardless of total score.
- Hold the shortlisted two or three to a paid sandbox integration before contract. The RFP is a filter; the sandbox is the decision.
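The independent-then-reconcile step above can be sketched as a small script that collects each evaluator's section scores and flags the sections where they diverge. This is an illustration only: the evaluator names, the example scores, and the `min_spread` threshold are assumptions, not part of the template.

```python
from statistics import median

# Hypothetical scores on the 1 / 3 / 5 scale: evaluator -> section -> score.
scores = {
    "procurement": {"A": 3, "B": 5, "D": 1},
    "cx_director": {"A": 3, "B": 3, "D": 5},
    "head_of_ops": {"A": 3, "B": 5, "D": 3},
}

def disagreements(scores, min_spread=2):
    """Return the sections where evaluators' scores differ by at least min_spread."""
    sections = sorted({s for per_eval in scores.values() for s in per_eval})
    flagged = {}
    for section in sections:
        values = [per_eval[section] for per_eval in scores.values() if section in per_eval]
        spread = max(values) - min(values)
        if spread >= min_spread:
            flagged[section] = {"spread": spread, "median": median(values), "scores": values}
    return flagged

# Sections listed here are the agenda for the reconciliation meeting.
for section, info in disagreements(scores).items():
    print(section, info)
```

Sections with no spread need no discussion; the flagged ones are where evaluators read the same answer differently.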
Section A — Company, references, delivery (weight: 15%)
The most predictive section and the one most often skipped. Implementation delivery capability is invisible in capability claims.
- Name three customers at our scale (±50%) in our region and industry. We will call them directly, not the references you nominate.
- Name the delivery team that would run our implementation — by individual, with relevant tenure and prior projects of comparable shape.
- Provide the escalation path with names and SLAs at the engineering, product, and executive levels.
- Describe two implementations that went poorly: what broke, and what changed in your delivery model as a result.
- Provide your written change-of-control policy. State what happens to our deployment if you are acquired in year two.
Section B — Data sovereignty and security (weight: 20%, gateway)
Sovereignty is whose laws can reach the data, not where it is stored. Vendors who cannot answer cleanly are filtered here, not later.
- Provide a data-flow diagram per call leg: capture, speech-to-text, retrieval, inference, text-to-speech, telephony, storage. Name the legal entity and jurisdiction operating each.
- Name every sub-processor with the function performed and the country of processing. State your change-notification SLA when sub-processors change.
- Provide your SOC 2 Type II report, ISO 27001 certificate, and (if relevant) HITRUST certification. State the audit period and any qualifications.
- Provide your DPA, including international transfer mechanism, retention defaults, deletion SLAs, and the exit-and-destruction plan.
- State whether models are trained on customer data by default. State the opt-out mechanism and how it is enforced technically.
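One way to make the per-call-leg answer comparable across vendors is to request it in a structured form and check it mechanically. The sketch below assumes a simple record per leg; the entity names, jurisdictions, and the allowed set are hypothetical placeholders.

```python
from dataclasses import dataclass

# The seven call legs named in the data-flow question above.
CALL_LEGS = ["capture", "speech-to-text", "retrieval", "inference",
             "text-to-speech", "telephony", "storage"]

@dataclass(frozen=True)
class LegDeclaration:
    leg: str
    legal_entity: str   # who operates this leg
    jurisdiction: str   # whose laws can reach the data on this leg

def gateway_issues(declared, allowed_jurisdictions):
    """List legs the vendor did not declare and legs outside the allowed jurisdictions."""
    by_leg = {d.leg: d for d in declared}
    issues = []
    for leg in CALL_LEGS:
        d = by_leg.get(leg)
        if d is None:
            issues.append(f"{leg}: not declared")
        elif d.jurisdiction not in allowed_jurisdictions:
            issues.append(f"{leg}: {d.legal_entity} operates under {d.jurisdiction}")
    return issues

# Hypothetical vendor response: storage omitted, inference run by a US entity.
response = [
    LegDeclaration("capture", "VendorCo UK Ltd", "UK"),
    LegDeclaration("speech-to-text", "VendorCo UK Ltd", "UK"),
    LegDeclaration("retrieval", "VendorCo UK Ltd", "UK"),
    LegDeclaration("inference", "ExampleLLM Inc", "US"),
    LegDeclaration("text-to-speech", "VendorCo UK Ltd", "UK"),
    LegDeclaration("telephony", "ExampleTel Ltd", "UK"),
]

issues = gateway_issues(response, allowed_jurisdictions={"UK"})
for issue in issues:
    print(issue)
```

Any non-empty result is a gateway conversation, not a scoring adjustment.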
Section C — Regulatory compliance (weight: 10%, gateway)
Jurisdiction-specific to where care or service is delivered and where the caller sits. A vendor quoting the wrong regime at you is a tell.
- Provide the jurisdiction-specific compliance assessment for our actual market — not a global retrofit.
- State your position on the EU AI Act AI-disclosure obligation (applicable from August 2026) and your roadmap for the high-risk obligations under the 2026 amendment.
- Provide evidence of consent capture for call recording in each jurisdiction in which we operate.
- State your position on medical-device classification (healthcare) or FCA / NAIC / regional financial regulator scope (financial services).
Section D — Integration depth (weight: 20%)
Read-only against generic connectors is table stakes. Write, idempotency, failure handling, and audit are where deployments survive.
- Demonstrate a write to our actual system of record in a paid sandbox, with the auth pattern documented. Show what happens when that write fails.
- List the contact-centre platforms, CRMs, IVRs, identity providers, and ticketing systems with production integrations, not roadmap.
- Describe your idempotency model for actions that touch downstream systems.
- Provide the audit format for every write. We will need it for our SOX / regulatory audit.
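For evaluators who want a concrete picture of what a good idempotency and audit answer describes, here is a minimal sketch. It assumes an in-memory key store and a hypothetical `reschedule_appointment` action; a production system would persist the keys and the audit records, and vendors may implement this differently.

```python
import hashlib
import json

class IdempotentWriter:
    """Wraps a downstream write so a retried action is applied at most once."""

    def __init__(self, write_fn):
        self._write_fn = write_fn
        self._results = {}    # idempotency key -> result of the first successful write
        self.audit_log = []   # one record per attempt, including failures and replays

    @staticmethod
    def key_for(call_id, action, payload):
        # The same call, action, and payload always produce the same key.
        body = json.dumps({"call_id": call_id, "action": action, "payload": payload},
                          sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def write(self, call_id, action, payload):
        key = self.key_for(call_id, action, payload)
        if key in self._results:
            self.audit_log.append({"key": key, "action": action, "outcome": "replayed"})
            return self._results[key]
        try:
            result = self._write_fn(action, payload)
        except Exception as exc:
            # Failed writes are audited but not stored, so a later retry can succeed.
            self.audit_log.append({"key": key, "action": action,
                                   "outcome": "failed", "error": str(exc)})
            raise
        self._results[key] = result
        self.audit_log.append({"key": key, "action": action, "outcome": "applied"})
        return result

# Hypothetical system of record that times out on the first attempt.
calls = []
def flaky_system_of_record(action, payload):
    calls.append(action)
    if len(calls) == 1:
        raise TimeoutError("system of record timed out")
    return {"status": "ok", "booking_id": 1001}

writer = IdempotentWriter(flaky_system_of_record)
args = ("call-42", "reschedule_appointment", {"slot": "2025-01-10T09:00"})
try:
    writer.write(*args)
except TimeoutError:
    pass
first = writer.write(*args)    # the retry reaches the downstream system
second = writer.write(*args)   # the duplicate is served from the stored result
print([record["outcome"] for record in writer.audit_log])
```

The point to probe in a vendor's answer is the same as in the sketch: a failed write leaves an audit record, a retry can still succeed, and a duplicate never reaches the system of record twice.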
Section E — Operating model and control surface (weight: 15%)
The conversation owner — a senior contact-centre operator, not an engineer — is the highest-leverage role. Vendors that lock changes behind engineering tickets fail in production.
- Demonstrate a non-engineer changing an intent, deploying to staging, and rolling back in under one hour.
- Provide the audit log for the last ten changes to a customer deployment, by author and revert path.
- Describe the staging-to-production promotion model, including diff review and approval gates.
- Describe the per-call observability available to the conversation owner: transcript, intent labels, tool calls, latency per step, escalation reason.
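The observability fields listed above can be pictured as one record per call. The sketch below is an assumed shape for illustration, not any vendor's schema; the field names and the example call are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Turn:
    speaker: str                                     # "caller" or "agent"
    text: str                                        # transcript for this turn
    intent: Optional[str] = None                     # intent label, if one was assigned
    tool_calls: list = field(default_factory=list)   # names of tools invoked
    latency_ms: dict = field(default_factory=dict)   # step name -> milliseconds

@dataclass
class CallTrace:
    call_id: str
    turns: list
    escalation_reason: Optional[str] = None

    def slowest_step(self):
        """Return (step, total_ms) for the step that consumed the most time in the call."""
        totals = {}
        for turn in self.turns:
            for step, ms in turn.latency_ms.items():
                totals[step] = totals.get(step, 0) + ms
        return max(totals.items(), key=lambda kv: kv[1]) if totals else None

trace = CallTrace(
    call_id="call-42",
    turns=[
        Turn("caller", "I need to move my appointment", intent="reschedule"),
        Turn("agent", "Sure, which day suits you?", tool_calls=["lookup_appointment"],
             latency_ms={"ASR": 180, "retrieval": 90, "LLM": 420, "TTS": 150}),
        Turn("agent", "I cannot change that booking, so I am transferring you now.",
             tool_calls=["update_appointment"],
             latency_ms={"ASR": 170, "retrieval": 0, "LLM": 510, "TTS": 140}),
    ],
    escalation_reason="write to booking system failed",
)
print(trace.slowest_step())
```

If the conversation owner cannot see something equivalent to this record without raising an engineering ticket, the control surface is weaker than the demo suggests.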
Section F — Performance and latency (weight: 10%)
End-to-end turn latency above 1.5 seconds reads as a broken connection. The number to ask for is p95 under realistic load, not mean in a demo.
- State the p95 end-to-end turn-taking latency with 20 concurrent sessions running, using a script we provide.
- Demonstrate graceful barge-in: the caller interrupts a four-second response, and the system yields without losing turn context.
- Provide the latency budget per step (ASR, retrieval, LLM, TTS, telephony) for a representative production call.
- Describe your behaviour under degraded LLM provider performance — failover, graceful degradation, caller-facing experience.
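To see why p95 rather than mean is the number to ask for, consider the following sketch. It uses the nearest-rank percentile definition, which is one common choice; the latency samples and the per-step budget are invented for illustration.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile: smallest sample with at least 95% of samples at or below it."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]

# Hypothetical turn latencies (ms) pooled from a load test: mostly fast, with a slow tail.
turn_latencies_ms = [820] * 17 + [1400, 1900, 2600]

# Hypothetical per-step latency budget (ms) for one turn.
budget_ms = {"ASR": 200, "retrieval": 100, "LLM": 450, "TTS": 150, "telephony": 100}

mean_ms = sum(turn_latencies_ms) / len(turn_latencies_ms)
print(f"mean = {mean_ms:.0f} ms, p95 = {p95(turn_latencies_ms)} ms")
print(f"budget total = {sum(budget_ms.values())} ms")
```

With these samples the mean is 992 ms, which looks comfortable, while the p95 is 1900 ms, which is past the 1.5-second point where callers perceive a broken connection. A vendor quoting only the mean hides exactly that tail.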
Section G — Commercial (weight: 10%)
Price last, after everything else. Model the deployment at 0.5x and 2x our forecast volume and containment.
- Provide pricing under two scenarios: 50% of our forecast call volume and 200% of it. State which lines move and by how much.
- State the per-minute floor on escalated calls and the billing rules for the minutes the AI handles before transfer.
- Provide the exit terms: data export format, timeline, proof-of-destruction, and any disengagement fee.
- State the price-protection mechanism on contract renewal and the conditions under which it lapses.
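The two-scenario request can be checked against a simple model before responses arrive. The sketch below assumes banded per-minute pricing with a platform fee and a committed minimum; every figure in it (forecast, bands, fee, commitment) is hypothetical, and amounts are held in cents to avoid rounding drift.

```python
def monthly_cost_cents(minutes, bands, platform_fee_cents=0, committed_minutes=0):
    """Monthly cost under banded per-minute pricing.

    bands: list of (upper_limit_minutes, rate_cents_per_minute); the last limit must be None.
    committed_minutes: minimum billable minutes, charged even when usage is lower.
    """
    billable = max(minutes, committed_minutes)
    cost, lower = platform_fee_cents, 0
    for upper, rate_cents in bands:
        if upper is None or billable <= upper:
            return cost + (billable - lower) * rate_cents
        cost += (upper - lower) * rate_cents
        lower = upper
    raise ValueError("last band must be unbounded (upper limit None)")

FORECAST_MINUTES = 100_000                          # hypothetical monthly forecast
BANDS = [(50_000, 12), (150_000, 10), (None, 8)]    # hypothetical rates in cents per minute

for multiplier in (0.5, 1.0, 2.0):
    minutes = int(FORECAST_MINUTES * multiplier)
    cost = monthly_cost_cents(minutes, BANDS, platform_fee_cents=200_000,
                              committed_minutes=80_000)
    print(f"{multiplier}x: {minutes} min, total {cost / 100:.2f}, "
          f"effective {cost / minutes:.1f} cents/min")
```

In this example the effective rate at half volume is twice the rate at double volume, driven entirely by the commitment and the platform fee. Those are the lines to ask the vendor to itemise.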
Scoring sheet
Each evaluator scores independently per section on the 1 / 3 / 5 scale below, then multiplies by the section weight. Reconciliation happens after all scores are recorded. The two gateway sections (B, C) are also pass / fail — a fail removes the vendor regardless of total score.
| Section | Weight | Score 1 | Score 3 | Score 5 | Gateway? |
|---|---|---|---|---|---|
| A. Company / delivery | 15% | No named team, references hand-picked | Named team, two strong references, one weak | Named team with comparable prior work, three independent strong references | No |
| B. Data sovereignty / security | 20% | Generic SOC 2, sub-processors not named | Documented residency, sub-processors named, DPA acceptable | Per-call-leg residency proof, sub-processor change-notification SLA, exit-and-destruction plan survives legal review | Yes — fail = out |
| C. Regulatory compliance | 10% | Global retrofit, wrong regime quoted | Jurisdiction-specific assessment, basic AI Act position | Jurisdiction-specific, AI Act roadmap, evidence of consent capture per jurisdiction | Yes — fail = out |
| D. Integration depth | 20% | Read-only, write is roadmap | Read and write demonstrated, auth documented | Sandbox write + failure path + idempotency + audit, demonstrated against our system | No |
| E. Operating model | 15% | Engineering owns every change | Non-engineer can change intents in a controlled editor | Non-engineer ships and rolls back in under an hour with diff review and audit | No |
| F. Performance / latency | 10% | p95 above 2.0s under load, no barge-in | p95 1.2–1.8s, basic barge-in | p95 under 1.0s, graceful barge-in, full latency budget per step | No |
| G. Commercial | 10% | Per-minute, surprise overage clauses | Per-minute with bands, minute-floor stated | Per-resolution or platform fee with documented behaviour at 2x volume and clean exit terms | No |
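The arithmetic behind the sheet can be sketched as follows for one evaluator's scores. The weights and gateway sections come from the table above; the two vendors and their scores are hypothetical.

```python
# Section weights in percent, taken from the scoring table.
WEIGHTS = {"A": 15, "B": 20, "C": 10, "D": 20, "E": 15, "F": 10, "G": 10}
GATEWAYS = {"B", "C"}

def vendor_result(section_scores, gateway_pass):
    """section_scores: section -> 1, 3, or 5. gateway_pass: section -> bool for B and C."""
    for section, score in section_scores.items():
        if score not in (1, 3, 5):
            raise ValueError(f"section {section}: score must be 1, 3, or 5, got {score}")
    weighted = sum(WEIGHTS[s] * section_scores[s] for s in WEIGHTS) / 100
    failed = sorted(s for s in GATEWAYS if not gateway_pass[s])
    return {"weighted_score": weighted, "eliminated": bool(failed), "failed_gateways": failed}

# Hypothetical vendors: X scores higher overall but fails the compliance gateway.
vendor_x = vendor_result({"A": 5, "B": 3, "C": 1, "D": 5, "E": 5, "F": 3, "G": 3},
                         gateway_pass={"B": True, "C": False})
vendor_y = vendor_result({"A": 3, "B": 3, "C": 3, "D": 3, "E": 3, "F": 3, "G": 3},
                         gateway_pass={"B": True, "C": True})
print(vendor_x)
print(vendor_y)
```

Vendor X's weighted score of 3.8 beats vendor Y's 3.0, but X is eliminated on the gateway. Keeping the pass / fail result separate from the weighted total prevents a strong demo from buying back a compliance failure.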
Disqualifying answers
The following answers should remove a vendor regardless of section score. These are the patterns that, in practice, predict a failed implementation.
- Cannot name where speech-to-text inference physically runs, or hedges on sub-processors.
- Refuses to do a paid sandbox integration against the customer's actual system of record before contract.
- Quotes the wrong regulatory regime (HIPAA at a UK buyer, GDPR generics at a US healthcare network).
- Cannot produce the audit log for the last ten changes to any customer deployment.
- References are all the vendor's nominated contacts; will not allow independent outreach.
- Pricing model contains an unbounded overage clause without a renegotiation trigger.
Pull your current vendor long list. For each, write the single question from Section B (data sovereignty) and Section D (sandbox write) you have not yet asked. Send those two questions today.
- An RFP that asks for evidence eliminates two thirds of the long list; one that asks for capability claims does not.
- Treat data sovereignty and regulatory compliance as pass / fail gateways first and weighted dimensions second.
- The most predictive section is delivery capability — named team, independent references, two implementations that went poorly.
- Require a paid sandbox integration with the shortlisted vendors before contract; what they will not demonstrate before signing, they usually cannot deliver after.
- Score independently, reconcile after, pre-commit the weightings before any responses arrive.
Frequently asked questions
- How long should a voice AI RFP take to run?
- Six to eight weeks from issue to shortlist for an enterprise procurement, plus a further four to six weeks for paid sandbox integration with the shortlisted two or three. Compressing below this usually means skipping evidence in favour of capability claims.
- Why score independently before reconciliation?
- Group scoring drifts to the median and hides disagreement. Independent scores expose where evaluators saw different things in the same answer — which is exactly where the conversation that matters happens.
- What is the single most useful question in a voice AI RFP?
- 'Demonstrate a write to our system of record in a paid sandbox, and show what happens when that write fails.' Most failed deployments fail on this seam, not on conversation quality.
- Should the RFP name pricing targets?
- No. Anchoring on price up front causes vendors to game the per-minute line and recover margin in implementation services or overages. Score capability and delivery first; commercial terms last.
Terms used in this guide
- Voice AI: software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.
- IVR replacement: swapping menus and keypad input for natural conversation and actual resolution.
- Containment rate: the percentage of calls the automation finished on its own.
Lewis Crook: 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.