Enterprise voice AI integration depth: a real evaluation checklist
- Heads of Ops
- Procurement / IT-Sec
- CX directors
Integration depth is the single biggest predictor of whether a voice AI can actually resolve calls. "We integrate with Salesforce" is not a meaningful claim until you can see what the platform reads, what it writes, and how it handles failure.
Read vs write — the distinction that matters
Most platforms can read from a CRM. Far fewer can write to one safely, with the right authentication, idempotency, and audit trail. Resolution — actually doing the thing the customer called about — requires write access. A read-only deployment can answer questions; it cannot fix problems.
The integration checklist
- Identification and authentication — can the platform verify the caller against your identity provider, not just match a phone number?
- Read access — to customer record, account state, recent interactions, scheduled events
- Write access — create cases, update preferences, schedule callbacks, process payments where in scope
- Idempotency — can a retried call avoid double-charging or double-booking?
- Failure handling — when an integration call fails, does the agent degrade gracefully or hallucinate?
- Observability — is every integration call logged with request, response, latency, and outcome?
- Compliance — PCI, HIPAA, GDPR/UK GDPR handling on every read and write path
- Latency budget — integrations on the critical path should resolve within the conversational latency budget
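To make the observability item concrete, here is a minimal Python sketch of a wrapper that records request, response, latency and outcome for every integration call. `IntegrationLog` and `IntegrationLogEntry` are hypothetical names. A production deployment would send these entries to its own logging or tracing backend instead of keeping them in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class IntegrationLogEntry:
    system: str               # e.g. "crm", "billing"
    operation: str            # e.g. "lookup_account"
    request: Dict[str, Any]
    response: Any             # response body, or the exception repr on failure
    latency_ms: float
    outcome: str              # "success" or "error"


@dataclass
class IntegrationLog:
    """Hypothetical in-memory log; production would ship entries to a log/trace backend."""
    entries: List[IntegrationLogEntry] = field(default_factory=list)

    def call(self, system: str, operation: str,
             fn: Callable[..., Any], request: Dict[str, Any]) -> Any:
        """Invoke fn(**request) and record request, response, latency and outcome."""
        start = time.monotonic()
        try:
            response = fn(**request)
        except Exception as exc:
            self._record(system, operation, request, repr(exc), start, "error")
            raise  # let the caller decide how to degrade; never swallow the failure
        self._record(system, operation, request, response, start, "success")
        return response

    def _record(self, system: str, operation: str, request: Dict[str, Any],
                response: Any, start: float, outcome: str) -> None:
        latency_ms = (time.monotonic() - start) * 1000
        self.entries.append(IntegrationLogEntry(
            system, operation, request, response, latency_ms, outcome))
```

The wrapper re-raises failures after logging them. The decision about how to degrade stays with the dialogue layer, and the log still shows the conversation owner what happened.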
The questions that catch over-claims
Three questions usually expose the gap between marketing and capability: show me a call where the AI created a record in our system of record; show me what happens when that write fails; show me how a conversation owner inspects that failure the next morning. If the answers require an engineer, the operating model will not scale.
The read-access checklist, item by item
A useful read-access evaluation goes past the connector list and checks how the platform actually behaves under the constraints of a real call. Five questions usually catch the gap between connector availability and useful read capability.
- Can the platform read by verified identity, not just by inbound phone number or stated identifier?
- Can it combine reads from multiple systems within a single turn while staying inside the latency budget?
- Does it cache safely between turns of the same call, with a defined invalidation rule?
- Does it handle missing or partial records without hallucinating completeness?
- Does every read leave an auditable record — request, response, latency, system, caller — accessible to the conversation owner without engineering involvement?
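The multi-system read and missing-record questions can be illustrated together. The following minimal Python sketch runs all of a turn's reads concurrently with asyncio. It waits no longer than a per-turn budget, and it reports anything slow, failed or empty as missing instead of silently dropping it. The source names and budget value are hypothetical. Real connectors would be coroutines wrapping your CRM, billing or calendar clients.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def read_for_turn(
    sources: Dict[str, Callable[[], Awaitable[Any]]],
    budget_s: float,
) -> Dict[str, Any]:
    """Run all reads for one turn concurrently and stop waiting after budget_s.

    Anything that times out, raises, or returns None is listed under "missing"
    so the agent can say what it could not see instead of implying completeness.
    """
    result: Dict[str, Any] = {"data": {}, "missing": []}
    if not sources:
        return result
    tasks = {name: asyncio.create_task(fn()) for name, fn in sources.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=budget_s)
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding so nothing is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    for name, task in tasks.items():
        if task in done and task.exception() is None and task.result() is not None:
            result["data"][name] = task.result()
        else:
            result["missing"].append(name)
    return result
```

Because the reads run concurrently, the turn costs roughly as much as the slowest read, not the sum of all of them. The explicit `missing` list also gives the dialogue layer something honest to say about what it could not retrieve.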
The write-access checklist, with the failure modes that matter
Write access is where most platforms quietly fall behind. The right evaluation is not whether a write is possible but whether it is safe under the conditions a production deployment will encounter.
- Authentication strength on the write path matches the regulatory requirement for the action
- Every write is idempotent against retries — a dropped call mid-write does not create a duplicate
- Concurrent writes from a parallel channel (web, app, agent desktop) do not race the AI write
- Failed writes are surfaced to the caller as a graceful recovery turn, not as a hallucinated success
- Every write is logged with caller, intent, payload, response, and outcome for audit
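Idempotency and concurrent-write safety are often discussed together, but they are separate mechanisms. The sketch below shows one common approach and assumes a system of record that supports versioned updates. First, an idempotency key is derived from the call ID, intent and payload, so a retried write is recognised. Second, an optimistic-concurrency version check means that a change made meanwhile from the web, app or agent desktop raises a conflict and is not overwritten. `InMemoryRecordStore` and `ConflictError` are hypothetical stand-ins, not any vendor's API.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


class ConflictError(Exception):
    """Another channel changed the record since the AI read it."""


def idempotency_key(call_id: str, intent: str, payload: Dict[str, Any]) -> str:
    """Same call, intent and payload give the same key, so a retry maps to the original write."""
    body = json.dumps({"call": call_id, "intent": intent, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class InMemoryRecordStore:
    """Hypothetical system of record with versioned updates (not a real vendor API)."""
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)
    applied: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # key -> (record_id, new version)

    def read(self, record_id: str) -> Tuple[Dict[str, Any], int]:
        return dict(self.records.get(record_id, {})), self.versions.get(record_id, 0)

    def write(self, key: str, record_id: str, changes: Dict[str, Any],
              expected_version: int) -> Tuple[str, int]:
        # NOTE: a real backend must do the key check and the update atomically.
        if key in self.applied:
            return self.applied[key]  # retry of a completed write: apply nothing again
        current = self.versions.get(record_id, 0)
        if current != expected_version:
            raise ConflictError(f"{record_id} is at v{current}, expected v{expected_version}")
        self.records.setdefault(record_id, {}).update(changes)
        self.versions[record_id] = current + 1
        self.applied[key] = (record_id, current + 1)
        return self.applied[key]
```

A real backend must perform the key lookup and the versioned update atomically, for example inside one database transaction. The in-memory version only shows the logic. A `ConflictError` is exactly the case the checklist says must surface to the caller as a graceful recovery turn, not as a claimed success.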
Identity and authentication — the gap most platforms hide
Most platforms can match a phone number. Far fewer can authenticate against an enterprise identity provider with the strength required for high-value actions. The gap defines what intents the deployment can realistically cover. A platform that cannot perform strong customer authentication is effectively limited to look-up and low-risk transactional intents — which usually undermines the business case the platform was bought on.
During evaluation, demand a working integration against the actual identity provider, not a generic OIDC sample. The gap between the two is often where the proof of value silently breaks.
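One way to reason about the identity gap is to map each intent to the minimum authentication strength it requires. You can then check which intents remain available at the strength the platform can actually achieve. The levels and intent names below are hypothetical placeholders for your own policy and your IdP's assurance levels.

```python
from enum import IntEnum
from typing import List


class AuthLevel(IntEnum):
    """Hypothetical assurance ladder; map it to your IdP's real levels."""
    NONE = 0
    PHONE_MATCH = 1   # caller ID matches a customer record
    KNOWLEDGE = 2     # caller answered account-specific questions
    STRONG = 3        # verified against the enterprise IdP, e.g. via step-up


# Hypothetical policy: minimum authentication strength per intent.
REQUIRED_LEVEL = {
    "opening_hours": AuthLevel.NONE,
    "order_status": AuthLevel.PHONE_MATCH,
    "update_address": AuthLevel.KNOWLEDGE,
    "take_payment": AuthLevel.STRONG,
}


def allowed_intents(achievable: AuthLevel) -> List[str]:
    """Intents a deployment can serve if it can authenticate callers up to `achievable`."""
    return sorted(i for i, level in REQUIRED_LEVEL.items() if level <= achievable)


def authorize(intent: str, current: AuthLevel) -> str:
    """Return "proceed" or "step_up"; unknown intents default to STRONG (fail closed)."""
    required = REQUIRED_LEVEL.get(intent, AuthLevel.STRONG)
    return "proceed" if current >= required else "step_up"
```

With only a phone-number match, `allowed_intents(AuthLevel.PHONE_MATCH)` leaves just the look-up intents, which is the narrowing described above. Unknown intents default to the strictest level, so any gap in the policy fails closed.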
Failure handling in detail — graceful degradation patterns
Every integration on the critical path will fail at some point. The defining question is what the agent does when that happens. Four patterns separate production-grade implementations from demoware.
- Time-bounded retries with a documented timeout, not infinite waits
- Fall-through to a partial response — "I can confirm the booking but our payment system is slow this morning, may I send a link?" — not a hallucinated completion
- Escalation paths that carry context to the human, not blind transfers
- Telemetry on every failure mode so the operating-model team can quantify them weekly
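The first two patterns can be sketched together. The sketch pairs a retry helper bounded by an overall deadline with a fall-through step that prefers an honest partial response to a hallucinated completion. This is a minimal Python sketch. The `confirm_booking` and `take_payment` callables and the returned action labels are hypothetical, and the deadline values are illustrative, not recommendations.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StepResult:
    ok: bool
    value: Any = None
    attempts: int = 0
    error: Optional[str] = None


def call_with_deadline(fn: Callable[[], Any], deadline_s: float,
                       backoff_s: float = 0.1) -> StepResult:
    """Retry fn with exponential backoff until it succeeds or the next wait would pass the deadline.

    This bounds time spent retrying; a single hung call is not interrupted,
    so fn should carry its own client-side timeout.
    """
    start = time.monotonic()
    attempts = 0
    last_error: Optional[str] = None
    while True:
        attempts += 1
        try:
            return StepResult(ok=True, value=fn(), attempts=attempts)
        except Exception as exc:
            last_error = repr(exc)
        if time.monotonic() - start + backoff_s > deadline_s:
            return StepResult(ok=False, attempts=attempts, error=last_error)
        time.sleep(backoff_s)
        backoff_s *= 2


def booking_with_payment(confirm_booking: Callable[[], Any],
                         take_payment: Callable[[], Any],
                         deadline_s: float = 0.8) -> str:
    """Choose the next agent action; never report a step that did not succeed."""
    booking = call_with_deadline(confirm_booking, deadline_s)
    if not booking.ok:
        return "escalate_with_context"       # hand over to a human with the call state
    payment = call_with_deadline(take_payment, deadline_s)
    if payment.ok:
        return "confirm_booking_and_payment"
    return "confirm_booking_offer_payment_link"  # honest partial response
```

The helper bounds the total time spent retrying, but it cannot interrupt a single call that hangs. Each underlying client call therefore still needs its own timeout.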
The integration-test plan worth running
A short, repeatable integration test plan run once a week catches most production drift before it shows up in containment metrics. The plan does not need to be complex: five synthetic calls per critical intent, run end-to-end against the staging system of record, with deliberate failure injection on one in five. Pass/fail is binary and the results live in the weekly review.
Teams that run this consistently catch integration regressions days before they affect customers. Teams that do not run it find those regressions in the weekly containment report, by which point a few thousand callers have already been affected.
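The weekly plan is simple enough to express directly. This sketch builds five synthetic calls per critical intent and marks one call in five for failure injection. It then reduces the results to a binary pass/fail per intent. `run_call` is a hypothetical hook you would implement to drive an end-to-end call against staging. For an injected-failure call, "pass" should mean the agent recovered gracefully, not that the write succeeded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SyntheticCall:
    intent: str
    index: int
    inject_failure: bool


def build_weekly_plan(intents: List[str], calls_per_intent: int = 5,
                      failure_every: int = 5) -> List[SyntheticCall]:
    """Five synthetic calls per critical intent, with failure injected on one call in five."""
    plan = []
    for intent in intents:
        for i in range(calls_per_intent):
            inject = i % failure_every == failure_every - 1
            plan.append(SyntheticCall(intent, i, inject))
    return plan


def run_plan(plan: List[SyntheticCall],
             run_call: Callable[[SyntheticCall], bool]) -> Dict[str, bool]:
    """run_call is a hypothetical hook returning pass/fail for one end-to-end call."""
    outcomes: Dict[str, List[bool]] = {}
    for call in plan:
        outcomes.setdefault(call.intent, []).append(run_call(call))
    # Binary per intent: every call, including injected-failure ones, must pass.
    return {intent: all(results) for intent, results in outcomes.items()}
```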
The printable integration-depth checklist
This is a single consolidated checklist, designed to be printed and used in an integration workshop with the vendor. Tick only the items the vendor has demonstrated against your actual systems, not against a generic connector.
- Read by verified identity, not by inbound phone number
- Multi-system read inside a single turn within the latency budget
- Safe inter-turn caching with defined invalidation rule
- Graceful handling of missing or partial records — no completeness hallucination
- Auditable read trail per request, accessible to the conversation owner
- Write authentication strength matches the regulatory bar for the action
- Every write idempotent against retries and dropped calls
- Concurrent writes from web / app / agent desktop do not race
- Failed writes surface as graceful recovery, never hallucinated success
- Every write logged with caller, intent, payload, response, outcome
- Strong customer authentication against the actual enterprise IdP, not generic OIDC
- Time-bounded retries with documented timeouts, not infinite waits
- Partial-response fall-through patterns documented per critical intent
- Context-carrying escalation to human agents, not blind transfer
- Per-failure-mode telemetry surfaced in the weekly operating review
Key takeaways
- Read access is common; write access, which is what actually resolves calls, is much rarer.
- Identity and authentication is the most common integration gap, limiting deployments to low-risk intents.
- Idempotency and failure handling decide whether a retried call double-charges or hallucinates.
- Every integration call on the critical path eats into a sub-1.5-second latency budget.
- If inspecting a failed integration requires an engineer, the operating model will not scale.
Frequently asked questions
- Why does write access matter so much for voice AI?
- Because resolution requires action, not just answers. A platform that can read a customer's balance but not process their payment, schedule their callback, or update their preference is offering self-service lookup, not call resolution.
- What is the most common integration gap in enterprise voice AI?
- Identity and authentication. Many platforms can match a phone number but cannot perform strong customer authentication against the enterprise identity provider — which limits the deployment to low-risk intents.
- How important is integration latency?
- Critical. Each integration call on the critical path eats into the conversational latency budget — typically under 1.5 seconds end-to-end before perceived quality drops. Slow integrations force either a degraded experience or a narrower scope.
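The latency arithmetic is worth doing explicitly during scoping. The sketch below subtracts assumed speech-recognition, model and text-to-speech timings from the 1.5-second budget to find the time left for integrations. It then compares running two reads sequentially against running them in parallel. Every millisecond figure used with it is an illustrative assumption, not a benchmark.

```python
from typing import List

# Every figure used with these helpers is an illustrative assumption, not a measurement.
BUDGET_MS = 1500  # end-to-end target used in this guide


def remaining_for_integrations(asr_ms: int, llm_ms: int, tts_first_audio_ms: int,
                               budget_ms: int = BUDGET_MS) -> int:
    """Milliseconds left for integration calls after the speech and model stages."""
    return budget_ms - (asr_ms + llm_ms + tts_first_audio_ms)


def integration_time(calls_ms: List[int], parallel: bool) -> int:
    """Sequential calls add up; parallel calls cost roughly the slowest one."""
    if not calls_ms:
        return 0
    return max(calls_ms) if parallel else sum(calls_ms)


def fits(calls_ms: List[int], remaining_ms: int, parallel: bool) -> bool:
    return integration_time(calls_ms, parallel) <= remaining_ms
```

Take assumed figures of 300 ms for recognition, 600 ms for the model and 250 ms to first audio. That leaves 350 ms. Two reads of 250 ms and 200 ms overrun it at 450 ms when run sequentially, but fit at roughly 250 ms when run in parallel.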
Terms used in this guide
- Voice AI: software that answers the phone, understands what the caller wants, and takes action. It is more than a smarter IVR.
- Voice AI latency: the gap before the system starts talking back.
- IVR replacement: swapping menus and keypad input for natural conversation and actual resolution.
Lewis Crook: 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.
Field notes
Short, opinionated takes from practice that sit underneath this guide.
- The integration tax nobody prices in
Voice AI business cases routinely treat integration as a phase-two implementation detail. In production, integration depth is the single biggest predictor of whether the deployment can actually resolve calls.