Voice AI security and compliance: the enterprise buyer's checklist

  • Procurement / IT-Sec
  • VP / COO
  • CX directors
By Lewis Crook
Bottom line up front

Voice AI security is not a model problem — it is a data-flow problem. The questions that decide whether a deployment is approvable concern where audio, transcripts, and PII travel; what the model provider retains; how recording consent is captured; and whether the deployment survives a regulator's data-flow diagram.

The four data flows that decide approval

Most security reviews collapse into four flows. Get clean answers on each, in writing, before procurement closes.

  • Audio in transit — codec, encryption, routing path, geographic transit
  • Audio at rest — recording storage, retention window, encryption at rest, deletion guarantee
  • Transcript and prompt — where it is stored, who can see it, whether it is used for model training
  • PII in tool calls — what is sent to systems of record, what is masked or tokenised before reaching the model

The compliance regimes that show up most often

Different regimes care about different parts of the stack; a single "compliance" answer rarely covers them all.

  • PCI DSS — applies the moment card data enters the audio stream; usually requires pause-and-resume or DTMF capture so the model never hears the digits
  • HIPAA — applies to PHI in healthcare contexts; requires BAAs across every data-handling vendor including model providers and recording storage
  • GDPR / UK GDPR — lawful basis, recording consent, data subject rights, transfer mechanisms outside the UK/EEA, DPIA artefacts
  • FCA / financial-services rules — call recording retention, vulnerable-customer handling, fair treatment evidence
  • Sector-specific — telco lawful intercept, insurance complaint logging, public-sector accessibility duties

Data residency — the question that breaks most pilots

Many voice AI platforms advertise multi-region deployment but route inference, fine-tuning, or evaluation through a single region. UK and EU buyers should ask, in writing: for every call, where are speech-to-text, language-model inference, text-to-speech, and recording storage each physically processed and stored? "Available in EU" is not the same as "runs in EU end-to-end."
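The written answer can be checked mechanically once it exists. The sketch below is one way to do that, assuming the vendor's answer has been transcribed into a simple list; the component names and region codes are illustrative placeholders, not any platform's schema.

```python
from dataclasses import dataclass

# Hypothetical component names; use whatever the vendor's written answer names.
REQUIRED_COMPONENTS = (
    "speech_to_text",
    "llm_inference",
    "text_to_speech",
    "recording_storage",
)


@dataclass(frozen=True)
class ComponentLocation:
    component: str
    region: str  # illustrative codes such as "uk", "eu", "us"


def residency_gaps(locations, allowed_regions):
    """Return a list of problems; an empty list means every component is in region."""
    by_component = {loc.component: loc.region for loc in locations}
    problems = []
    for name in REQUIRED_COMPONENTS:
        region = by_component.get(name)
        if region is None:
            # A missing answer is treated as a gap, not as a pass.
            problems.append(f"{name}: no written answer")
        elif region not in allowed_regions:
            problems.append(f"{name}: processed in {region}")
    return problems


call = [
    ComponentLocation("speech_to_text", "eu"),
    ComponentLocation("llm_inference", "us"),
    ComponentLocation("text_to_speech", "eu"),
]
print(residency_gaps(call, allowed_regions={"uk", "eu"}))
```

Treating an unanswered component as a failure is deliberate: it is the check that separates "available in EU" from "runs in EU end-to-end".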

A voice AI announcement is not automatically a recording disclosure. Many jurisdictions require both — that the caller is informed they are speaking with an automated system and, separately, that the call is recorded. Conflating the two is the most common consent failure surfaced by post-deployment audits.
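A minimal sketch of how the two disclosures could be checked separately against a call's event log. The event names are hypothetical, and the rule that the recording disclosure must come before recording starts is an assumption; the actual ordering requirement depends on the jurisdiction.

```python
# Hypothetical event names; a real platform will have its own call-event schema.
AI_DISCLOSURE = "ai_disclosure_played"
RECORDING_DISCLOSURE = "recording_disclosure_played"
RECORDING_STARTED = "recording_started"


def consent_findings(events):
    """events: ordered list of event names for one call. Returns a list of findings."""
    findings = []
    if AI_DISCLOSURE not in events:
        findings.append("caller was not told they are speaking with an automated system")
    if RECORDING_STARTED in events:
        started = events.index(RECORDING_STARTED)
        # Assumption: the recording disclosure must be played before recording begins.
        if RECORDING_DISCLOSURE not in events[:started]:
            findings.append("recording started before a recording disclosure")
    return findings
```

A call that plays only the automated-system announcement and then records would return the second finding, which is the conflation described above.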

Model-provider data handling

The single highest-leverage clause to negotiate is whether the underlying model provider retains, logs, or trains on the audio, transcripts, or tool-call payloads. Default settings on hosted model APIs frequently allow some form of retention; enterprise tenancies typically do not. The voice AI platform usually controls this — but only if the buyer asks.

The questions that catch over-claims

Three questions consistently expose marketing gaps:

  • Show me a data-flow diagram for one complete call, including every third party.
  • Show me where PCI-relevant data is masked, and what proves it.
  • Show me the retention and deletion policy for audio, transcript, and tool-call logs separately.

If any of those answers is verbal-only, the deployment is not approvable.

The compliance gates that decide vendor shortlists in BFSI and healthcare

Three gate criteria reliably decide which vendors make it past a regulated buyer's first round. They are not weighted dimensions; failing any one removes the vendor from consideration.

  • Data residency that matches the regulatory jurisdiction, demonstrable per call
  • Recording and consent handling that survives an audit, including jurisdictional variation
  • DPIA (Data Protection Impact Assessment) and DPA (Data Processing Agreement) support that the procurement and legal teams can sign without exception

PII handling on the voice path — the patterns that work

Voice AI handles more sensitive data per turn than almost any other enterprise system. Three patterns recur in defensible implementations.

  • Tokenisation at the speech-to-text boundary — sensitive fields are extracted and replaced with tokens before the transcript hits the language model
  • Redaction at the storage boundary — transcripts retained for review have sensitive fields removed at write time, not retrospectively
  • Just-in-time decryption for the language model — sensitive data is decrypted in the context of a single turn and not retained in conversation state
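The first pattern, tokenisation at the speech-to-text boundary, can be sketched for one field type. This is an illustration only, assuming the transcript renders card numbers as digits (spoken-word digits would need normalising first) and that the in-memory `vault` dictionary stands in for a properly scoped token vault. The function names are hypothetical.

```python
import re
import secrets

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum, used here to cut false positives on long numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tokenise(transcript: str, vault: dict) -> str:
    """Replace Luhn-valid card numbers with opaque tokens; originals go into `vault`."""

    def swap(match):
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave order numbers, phone numbers, etc. alone
        token = f"<CARD_{secrets.token_hex(4)}>"
        vault[token] = digits
        return token

    return CARD_PATTERN.sub(swap, transcript)
```

Only the tokenised string should be passed on to the language model; the vault stays on the sensitive side of the boundary.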

PCI on the voice path

Payment card data on a voice call is the most-regulated piece of data the system will handle. The defensible pattern is a dedicated capture surface — DTMF-based card entry routed through a separate, PCI-scoped path — rather than allowing card numbers into the speech-to-text stream at all. Some platforms now support voice-based card capture with full PCI scope reduction; verifying the QSA-attested implementation in detail is non-negotiable before going live.
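The dedicated capture surface can be pictured as a routing decision on the media path. The sketch below is a simplified illustration, not a PCI-attested design: the class and method names are hypothetical, and it assumes call recording is paused by a separate mechanism during capture.

```python
class CallMediaRouter:
    """Routes caller input so nothing reaches speech-to-text during card capture."""

    def __init__(self, stt_sink, pci_sink):
        self.stt_sink = stt_sink  # callable receiving audio frames for the model path
        self.pci_sink = pci_sink  # callable receiving DTMF digits on the PCI-scoped path
        self.capturing = False

    def begin_card_capture(self):
        self.capturing = True

    def end_card_capture(self):
        self.capturing = False

    def on_audio(self, frame: bytes):
        # During capture, audio frames are dropped rather than forwarded.
        if not self.capturing:
            self.stt_sink(frame)

    def on_dtmf(self, digit: str):
        # Digits only ever go to the PCI-scoped sink; outside capture they are ignored.
        if self.capturing:
            self.pci_sink(digit)
```

The point of the design is that the model path and the card path never share a sink, so the model cannot hear digits even if the caller speaks them during capture.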

Call recording is the operational backbone of voice AI — without it, the conversation owner has nothing to review and the regulator has nothing to audit. The implementation needs to respect three constraints simultaneously: jurisdictional consent requirements (one-party in some jurisdictions, two-party in others), the right to erasure under GDPR and UK GDPR, and the retention requirements set by sector regulators. A single global recording rule rarely satisfies all three.
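One way to avoid a single global rule is a per-jurisdiction rule table that fails closed. The sketch below uses placeholder jurisdictions and numbers, for illustration only and not as legal guidance. It also assumes that an erasure request is deferred until a sector-regulator minimum retention period has passed, which is an assumption to confirm with legal counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RecordingRule:
    all_party_consent: bool  # True where every party must consent to recording
    min_retention_days: int  # sector-regulator minimum (placeholder value)
    max_retention_days: int  # upper bound before deletion (placeholder value)


# Placeholder jurisdictions and numbers; not legal guidance.
RULES = {
    "jurisdiction_a": RecordingRule(False, 180, 365),
    "jurisdiction_b": RecordingRule(True, 0, 90),
}


def rule_for(jurisdiction: str) -> RecordingRule:
    # Fail closed: an unmapped jurisdiction is an error, not a silent global default.
    if jurisdiction not in RULES:
        raise LookupError(f"no recording rule defined for {jurisdiction!r}")
    return RULES[jurisdiction]


def erasure_decision(jurisdiction: str, recorded_on: date, today: date) -> str:
    """Decide whether an erasure request can be honoured now or must be deferred."""
    rule = rule_for(jurisdiction)
    earliest = recorded_on + timedelta(days=rule.min_retention_days)
    if today >= earliest:
        return "erase now"
    return f"defer until {earliest.isoformat()}"
```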

Model and prompt governance

Two governance questions catch the gaps that operational security reviews usually miss. First, who can change a system prompt and what audit trail does that change leave? Second, what happens when the underlying model is updated by the vendor — is there a known evaluation gate, or does the change ship silently? Both questions are increasingly relevant as model providers roll out updates that change behaviour materially. A platform without answers to both is a platform that will, eventually, surface an unexplained behaviour change in production.
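For the first question, one possible shape for a tamper-evident audit trail is a hash chain over prompt changes. This is a minimal sketch using only the Python standard library; the field names are hypothetical, and a real platform would also need access control and durable storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # prev_hash value for the first entry in the log


def _digest(entry: dict) -> str:
    """Hash every field except the entry's own hash, in a stable key order."""
    payload = {k: v for k, v in entry.items() if k != "entry_hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def append_change(log: list, author: str, prompt_text: str, timestamp: str) -> dict:
    """Record who changed the system prompt, when, and a hash of the new text."""
    entry = {
        "author": author,
        "timestamp": timestamp,
        "prompt_sha256": hashlib.sha256(prompt_text.encode("utf-8")).hexdigest(),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry["entry_hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return False if any entry was edited, removed, or reordered after the fact."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(entry):
            return False
        prev = entry["entry_hash"]
    return True
```

Because each entry commits to the one before it, editing an old entry invalidates the chain from that point on, which is the property an auditor would look for.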

Vendor due diligence questions worth asking

Six questions consistently separate vendors that will pass a regulated buyer's procurement review from those that will not.

  • Where exactly is data processed and stored, per call leg, per region?
  • Which sub-processors are in the data path, and what is the change-notification process?
  • How is the underlying language model isolated from training on customer data?
  • What is the SOC 2 / ISO 27001 / PCI scope, and does the attestation cover the specific deployment topology being sold?
  • What is the breach-notification SLA and the historical record of incidents?
  • What is the exit plan — how is data returned and how is it provably destroyed at contract end?

DPA non-negotiables — the clauses worth holding the line on

Most DPA negotiations conclude on the same handful of clauses. The table below names the ones that materially change risk exposure rather than legal hygiene, with the defensible customer position next to the standard vendor opener.

DPA non-negotiables
Clause | Default vendor position | Defensible customer position
Sub-processor change | 30 days, no objection right | 30 days + objection right + termination-for-convenience + transition support
Model training on customer data | Permitted unless opted out | Prohibited unless opted in
Audit rights | Annual, vendor-coordinated, no on-site | Annual on-site OR independent auditor report; for-cause audit on incident
Breach notification | From disclosure decision, 72h | From confirmation, 24h first notice, 72h detail
Data export on termination | Standard format, no SLA | Named format, 30-day SLA, proof of destruction within 90 days
Cross-border transfer mechanism | SCCs by reference | SCCs attached, current version, plus transfer impact assessment
Liability cap on data incident | 12 months fees, mutual | Super-cap (2–3x) for data breach and security incidents
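The customer position on breach notification translates into two concrete deadlines. The small sketch below assumes both the 24-hour first notice and the 72-hour detailed notice are measured from the moment of confirmation; the contract wording should settle that explicitly. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def notification_deadlines(confirmed_at: datetime,
                           first_notice_hours: int = 24,
                           detail_hours: int = 72) -> dict:
    """Both clocks start at confirmation of the breach (an assumption)."""
    return {
        "first_notice_due": confirmed_at + timedelta(hours=first_notice_hours),
        "detail_due": confirmed_at + timedelta(hours=detail_hours),
    }


confirmed = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
deadlines = notification_deadlines(confirmed)
print(deadlines["first_notice_due"].isoformat())
print(deadlines["detail_due"].isoformat())
```

The contrast with the vendor default is the start of the clock: a deadline that runs from the vendor's "disclosure decision" has no fixed starting point the customer can verify.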

Per-call-leg residency — the diagram every regulated buyer needs

Headline residency is a marketing claim. Per-call-leg residency is a procurement artefact. The diagram should show, for a single live call, the legal entity and jurisdiction processing each of: PSTN ingress, capture, ASR, retrieval, LLM inference, TTS, egress, recording storage, transcript storage, and derived analytics.

The most common gap: ASR or LLM inference routed to a US-based provider while the rest of the stack sits in-region. An "in-region" headline and an out-of-region inference leg can both be true at once, and the routing is usually disclosed if you ask the right question, but only the per-leg diagram surfaces it cleanly. Make it a contractual obligation that the diagram is kept current, with a change-notification SLA that mirrors the one for sub-processor changes.
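If the per-leg diagram is held as data rather than as a picture, keeping it current becomes a comparison between versions. The sketch below assumes each version maps a call leg to a (legal entity, jurisdiction) pair; the entity names in the example are invented placeholders.

```python
# The ten legs listed above, as hypothetical keys.
LEGS = (
    "pstn_ingress", "capture", "asr", "retrieval", "llm_inference",
    "tts", "egress", "recording_storage", "transcript_storage", "derived_analytics",
)


def diagram_changes(previous: dict, current: dict) -> list:
    """Return (leg, before, after) for every leg whose entity or jurisdiction changed."""
    changes = []
    for leg in LEGS:
        before, after = previous.get(leg), current.get(leg)
        if before != after:
            changes.append((leg, before, after))
    return changes


# Invented placeholder entities, for illustration only.
v1 = {leg: ("Example Vendor Ltd", "uk") for leg in LEGS}
v2 = dict(v1, llm_inference=("Example Model Co", "us"))
print(diagram_changes(v1, v2))
```

Any non-empty result is a change that should have triggered the notification SLA before it reached production.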

Key takeaways
  • Voice AI security is a data-flow problem, not a model problem.
  • Four flows decide approval: audio in transit, audio at rest, transcript/prompt handling, PII in tool calls.
  • PCI usually requires pause-and-resume or DTMF capture so the model never hears card digits.
  • Data residency claims are routinely overstated — get a per-component, per-call written data-flow.
  • Automated-system disclosure is not the same as recording consent; many jurisdictions require both.

Frequently asked questions

Is voice AI PCI compliant?
The platform is not — the deployment is. PCI compliance depends on whether card data ever enters the audio stream the model hears. The standard pattern is pause-and-resume or DTMF capture, with the digits routed to a PCI-scoped service the model never sees.
Is voice AI GDPR compliant?
GDPR compliance depends on lawful basis, transfer mechanism, retention, and consent — none of which the platform decides on its own. Treat "GDPR-compliant" as a starting position, then walk through the data-flow per use case.
Where is voice AI data processed?
It varies by platform and by call. Ask for a written, per-component data-flow: speech-to-text, language-model inference, text-to-speech, recording. "Available in your region" is not the same as "runs in your region."
Does the model provider retain my call data?
By default, often yes — many hosted model APIs retain inputs for abuse monitoring or evaluation. Enterprise tenancies typically allow zero retention, but only if the voice AI platform passes the right flags. Confirm in writing.
What does a defensible recording-consent script look like?
Two distinct disclosures: that the caller is interacting with an automated system, and that the call is being recorded — with the lawful basis, retention, and opt-out path. Conflating the two is the most common audit finding.
Do I need a DPIA for a voice AI deployment?
In UK/EU contexts, almost always yes. Voice AI involves automated processing of personal data, often at scale, often touching special-category data. A DPIA is the cheapest insurance you will buy on the programme.

Terms used in this guide

  • Voice AI: software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.
  • IVR replacement: swaps menus and keypad input for natural conversation and actual resolution.
  • DTMF fallback: uses the keypad to capture digits the model is not allowed to hear.
  • Voice biometrics: confirms who the caller is by how they speak.
  • Real-time transcription: streaming speech-to-text fast enough to act on mid-call.
Last reviewed: 2026-06-15. This guide is updated when production patterns shift; see the corrections page to flag anything that no longer matches reality.
About the author
Lewis Crook
Practitioner writer on enterprise voice AI

Lewis Crook has 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.