PCI DSS v4.0 and voice AI: keeping cardholder data out of the model

  • Procurement / IT-Sec
  • DPOs / Privacy
  • VP / COO
By Lewis Crook
Bottom line up front

The single deployment decision that determines PCI scope for voice AI is whether the primary account number (PAN) ever reaches the LLM context window. If it does, every model provider, telephony carrier, and recording vendor in the call path is in scope, and the architecture has to satisfy the full PCI DSS v4.0 control set. Pause-and-resume capture over DTMF (keypad tones), done properly, keeps the AI out of scope.

The PCI DSS v4.0 changes that matter for voice AI

PCI DSS v4.0 replaced v3.2.1 as the only effective standard from 31 March 2024, with future-dated requirements becoming mandatory from 31 March 2025. Three changes materially affect voice AI deployments: the customised approach (a validation option, not a numbered requirement, that lets entities meet a requirement's stated objective via non-prescriptive controls if rigorously documented and supported by a targeted risk analysis under Requirement 12.3.2), expanded obligations around third-party service providers (Requirements 12.8 and 12.9), and explicit treatment of cardholder data in non-traditional channels.

The headline implication for voice AI: the standard does not contemplate cardholder data flowing through an LLM. There is no exception, no carve-out, and no 'AI annex'. If a PAN reaches the model, the model provider becomes a service provider that needs its own PCI DSS validation, and the whole call path is in audit scope.

Pause-and-resume DTMF: the only defensible pattern

The standard pattern that keeps voice AI out of PCI scope is pause-and-resume DTMF capture. The AI conducts the conversation up to the point of payment, hands the call to a PCI-scoped capture service that accepts DTMF tones (the customer types card digits on the keypad), then resumes once the transaction is authorised.

Done properly, the AI never hears the digits — neither the audio nor the transcription. The DTMF tones are intercepted upstream of the speech-to-text path, the call is masked to the AI, and the only signal the AI receives back is a transaction outcome (authorised / declined / referred). The pattern has been the standard for human-agent contact centres for over a decade; voice AI inherits it without modification.

  • DTMF capture service is a separately attested PCI Level 1 provider
  • Audio path to the AI is muted during capture — no STT, no recording
  • Tone-suppression is verified by a third-party penetration test, not vendor assertion
  • The AI receives only the transaction outcome, never the PAN, expiry, or CVV
  • Recording, where retained, is split so the masked period is unrecoverable from the retained audio
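
The hand-off described above can be sketched as a small orchestration routine. This is an illustrative sketch only: `AiLeg`, `FakeCaptureService`, and `take_payment` are hypothetical names, and a real deployment would mute the audio path at the telephony layer rather than by flipping a flag. The point it shows is the shape of the contract: the AI side is paused, the capture service does its work, and the only value that crosses back is the outcome.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    AUTHORISED = "authorised"
    DECLINED = "declined"
    REFERRED = "referred"


class FakeCaptureService:
    """Stand-in for the separately attested DTMF capture provider (hypothetical)."""

    def capture_and_authorise(self, call_id: str) -> Outcome:
        # A real service would collect the keypad digits and call the gateway.
        # Nothing card-related is returned, only the transaction outcome.
        return Outcome.AUTHORISED


@dataclass
class AiLeg:
    """The AI's view of the call: an audio-path flag and an event log."""
    audio_connected: bool = True
    events: list[str] = field(default_factory=list)


def take_payment(call_id: str, ai: AiLeg, capture: FakeCaptureService) -> Outcome:
    ai.audio_connected = False      # pause: no audio, STT, or recording on the AI side
    ai.events.append("paused")
    try:
        outcome = capture.capture_and_authorise(call_id)
    finally:
        ai.audio_connected = True   # resume even if the capture service raises
        ai.events.append("resumed")
    ai.events.append(f"outcome:{outcome.value}")
    return outcome


ai = AiLeg()
result = take_payment("call-123", ai, FakeCaptureService())
print(result.value, ai.events)
```

Note that the function signature gives the AI leg no way to receive a PAN, expiry, or CVV: the return type is an enum with three values. That is the property worth preserving in whatever interface your platform actually exposes.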

What pulls voice AI back into scope

Three deployment patterns silently put the AI back in PCI scope and most procurement teams miss at least one.

First: 'we'll redact the PAN from the transcript after the call'. Redaction after the fact does not reduce scope. The PAN was present in the LLM context at processing time. The model provider, the STT provider, and any logging pipeline are in scope from the moment the digits hit the prompt.

Second: 'the AI reads the digits back to confirm'. Read-back means the TTS provider now handles cardholder data, which expands scope to the TTS and synthesis pipeline. The defensible pattern is for the DTMF capture service to confirm out-of-band (visual confirmation in a web channel, or human-agent confirmation), never the AI.

Third: 'we capture by voice when DTMF fails'. Voice fallback for cardholder data eliminates the scope reduction entirely. If voice fallback is necessary for accessibility, route the call to a PCI-scoped human-agent flow, not to the AI.
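
The three patterns above lend themselves to a simple configuration check during procurement or change review. The sketch below is a hypothetical illustration: the field names on `PaymentFlowConfig` are assumptions, not settings from any real platform, and would need mapping to whatever your vendor exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentFlowConfig:
    # Hypothetical field names; map them to your platform's actual settings.
    relies_on_post_call_redaction: bool
    ai_reads_back_card_digits: bool
    dtmf_failure_route: str  # e.g. "human_pci_agent" or "voice_ai"


def scope_findings(cfg: PaymentFlowConfig) -> list[str]:
    """Return one finding per pattern that pulls the AI back into PCI scope."""
    findings = []
    if cfg.relies_on_post_call_redaction:
        findings.append("post-call redaction: PAN was in STT/LLM context at processing time")
    if cfg.ai_reads_back_card_digits:
        findings.append("read-back: TTS pipeline handles cardholder data")
    if cfg.dtmf_failure_route == "voice_ai":
        findings.append("voice fallback: AI captures cardholder data when DTMF fails")
    return findings


cfg = PaymentFlowConfig(
    relies_on_post_call_redaction=False,
    ai_reads_back_card_digits=True,
    dtmf_failure_route="voice_ai",
)
for finding in scope_findings(cfg):
    print(finding)
```

An empty result from a check like this does not demonstrate that the AI is out of scope; it only shows that none of these three known patterns is configured.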

Sub-processor and SAQ obligations

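Even with pause-and-resume DTMF correctly implemented, the merchant is still responsible for documenting which parties touch cardholder data and obtaining their PCI attestation. Requirement 12.8.1 requires a maintained list of third-party service providers with a description of the services each provides, and Requirement 12.8.5 requires a record of which PCI DSS requirements each provider manages, which the merchant manages, and which are shared.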

For a typical voice AI deployment, the in-scope service-provider list is the telephony carrier (always in scope — they carry the audio), the DTMF capture provider (in scope, with full Level 1 attestation), and the payment gateway. The AI platform, STT, LLM, and TTS providers are out of scope — and their out-of-scope status is documented in the network and data-flow diagrams that accompany the merchant's SAQ.
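
One way to keep that list auditable is to hold it as structured data and derive the gap log from it. The following is a minimal sketch under stated assumptions: the provider names are placeholders, and `aoc_expires` is a simplification that treats each Attestation of Compliance (AOC) as having a single expiry date you track yourself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Provider:
    name: str
    role: str
    in_scope: bool
    aoc_expires: Optional[date] = None  # None means no AOC on file


def aoc_gaps(providers: list[Provider], today: date) -> list[str]:
    """List in-scope providers whose AOC is missing or past its tracked expiry."""
    gaps = []
    for p in providers:
        if not p.in_scope:
            continue  # out-of-scope status is evidenced by the data-flow diagram instead
        if p.aoc_expires is None:
            gaps.append(f"{p.name}: no AOC on file")
        elif p.aoc_expires < today:
            gaps.append(f"{p.name}: AOC expired {p.aoc_expires.isoformat()}")
    return gaps


register = [
    Provider("ExampleTel", "telephony carrier", True, date(2026, 1, 31)),
    Provider("ExampleCapture", "DTMF capture", True, None),
    Provider("ExampleGateway", "payment gateway", True, date(2024, 12, 31)),
    Provider("ExampleLLM", "LLM", False),
]
gaps = aoc_gaps(register, today=date(2025, 6, 1))
for gap in gaps:
    print(gap)
```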

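Which SAQ applies depends on the merchant's acceptance channels and how much of the payment function is outsourced; payment volume determines the merchant level, and with it whether self-assessment is permitted at all. SAQ A is the usual target for properly scoped pause-and-resume deployments where capture is fully outsourced; SAQ A-EP is defined for e-commerce channels, so it is relevant only where a web payment channel sits alongside the phone line. A QSA-led Report on Compliance replaces the SAQ at Level 1 merchant volumes.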

Controls QSAs actually test

A QSA reviewing a voice AI deployment for PCI v4.0 will test a recurring set of controls. Knowing the list before the audit is the difference between a clean report and a remediation cycle.

Controls QSAs commonly test on voice AI deployments
| Control area | What the QSA tests | Evidence to have ready |
| --- | --- | --- |
| Data-flow diagram | Whether the diagram shows the AI out of scope and the DTMF path in scope | Current diagram, dated within 12 months, signed by an engineering lead |
| Network segmentation | Whether the AI platform is on a network segment that cannot reach the cardholder data environment | Firewall rules, segmentation test results, penetration test report |
| DTMF tone suppression | Whether the AI provably cannot hear digits during capture | Third-party pen test result demonstrating no PAN recoverable from AI-side audio or transcripts |
| Sub-processor list | Whether the maintained list is current and AOCs are on file | List with last-reviewed date, AOCs for each in-scope provider, gap log for any missing |
| Logging and monitoring | Whether the AI-side logging does not contain PAN under any error path | Log samples across success, decline, and timeout paths; redaction-test results |
| Change management | Whether changes to the AI platform that could affect scope go through PCI change control | Change tickets showing PCI-impact review on AI prompt and integration changes |
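
For the logging and monitoring row, log samples can be screened for PAN-like digit runs before they are handed over as evidence. The sketch below is one possible approach, not a prescribed test: it looks for 13 to 19 digits (optionally separated by spaces or hyphens) and keeps only candidates that pass the Luhn checksum. It is a detection tripwire for AI-side logs; finding nothing is supporting evidence, and finding something means the PAN already reached the AI side, which no after-the-fact scan undoes.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def lines_with_pan(lines: list[str]) -> list[int]:
    """Return 1-based numbers of lines containing a Luhn-valid PAN candidate."""
    hits = []
    for n, line in enumerate(lines, start=1):
        for m in CANDIDATE.finditer(line):
            digits = re.sub(r"[ -]", "", m.group())
            if luhn_ok(digits):
                hits.append(n)
                break
    return hits


sample = [
    "outcome=authorised call=abc",
    "transcript: my card is 4111 1111 1111 1111",  # well-known test number
    "order ref 1234567890123456",                  # 16 digits, fails Luhn
]
print(lines_with_pan(sample))
```

Run it across success, decline, and timeout samples alike, since error paths are where unexpected payloads tend to be logged.
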
Do this on Monday

Pull your current voice AI data-flow diagram and trace a payment call end-to-end. If you cannot show the AI is provably deaf to the DTMF digits, the deployment is in PCI scope today regardless of what your vendor claims.

Key takeaways
  • The single architecture decision that determines PCI scope is whether PAN touches the LLM context.
  • Pause-and-resume DTMF capture, properly implemented, keeps the AI out of scope.
  • Post-hoc redaction does not reduce scope — the data was present at processing time.
  • Voice fallback for cardholder data destroys the scope reduction; route to a human or specialised provider instead.
  • The QSA tests data-flow diagrams, segmentation, tone-suppression evidence, and sub-processor AOCs — have all four ready.

Frequently asked questions

Can voice AI be PCI compliant by itself?
PCI compliance is a property of a deployment, not of a vendor or product. A voice AI platform can be deployed in a PCI-compliant architecture (pause-and-resume DTMF, AI out of scope) or in a non-compliant one (PAN in LLM context). The vendor's marketing is irrelevant to your assessment.
What if our use case requires voice payment capture?
Route the voice-payment portion of the call to a human agent on a PCI-scoped path, or to a specialised voice-payment provider that holds its own Level 1 attestation. Do not capture by voice through a general-purpose AI platform — the scope expansion makes the deployment uneconomic and the regulator exposure is asymmetric.
Does call recording need to change?
Yes. The recording must be split, paused, or masked during the DTMF capture window so the retained audio does not contain recoverable cardholder data. Most enterprise recording platforms support this; the configuration has to be tested, not assumed.
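
As a rough illustration of what "split" means in practice, the sketch below computes which spans of a call are retained once the capture windows are masked. The function name and the use of seconds as the unit are assumptions for the example; real recording platforms implement this internally, and the result still has to be verified against the retained audio.

```python
def retained_segments(
    duration: float, masked: list[tuple[float, float]]
) -> list[tuple[float, float]]:
    """Return the (start, end) spans that fall outside every masked window."""
    segments = []
    cursor = 0.0
    for start, end in sorted(masked):
        # Clamp each window to the bounds of the recording.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping windows
    if cursor < duration:
        segments.append((cursor, duration))
    return segments


# A 300-second call with one capture window from 120 s to 165 s.
print(retained_segments(300.0, [(120.0, 165.0)]))
```
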
What changes under v4.0 versus v3.2.1 for voice AI?
Three things: the customised approach lets you meet some objectives via non-prescriptive controls (useful for novel AI architectures, with strict documentation), service-provider obligations under 12.8 and 12.9 tightened materially, and targeted risk analysis is now required wherever flexibility is exercised: under 12.3.1 where you set how often a control is performed, and under 12.3.2 for each requirement met via the customised approach. None of this changes the core architecture decision: keep PAN out of the model.

Terms used in this guide

  • Voice AI: software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.
Last reviewed: 2026-06-15. This guide is updated when production patterns shift; see the corrections page to flag anything that no longer matches reality.
About the author
Lewis Crook
Practitioner writer on enterprise voice AI

Lewis Crook: 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.
