Definition
What is real-time transcription?
By Lewis Crook
Real-time transcription is the streaming conversion of spoken audio to text with low enough latency that downstream systems — voice AI, agent assist, supervisor dashboards, compliance flags — can act on it during the call rather than after it. It is the input layer of every voice AI system.
Real-time transcription is streaming speech-to-text fast enough to act on mid-call.
Why it matters for enterprise CX leaders
- Transcription quality sets the ceiling on everything downstream — a voice AI cannot understand what was mis-transcribed.
- Accent, code-switching, and acoustic conditions can shift effective accuracy by 10–30 percentage points between providers on the same audio.
- Real-time transcription is the operating-model team's main observability surface — every escalated call is reviewed through its transcript first.
Frequently asked questions
- Is real-time transcription the same as voice AI?
- No. Transcription is the speech-to-text input layer; voice AI uses transcription plus a language model and text-to-speech to hold a conversation. Many contact centres deploy transcription for agent assist and observability without deploying full voice AI.
- What accuracy should I expect from real-time transcription?
- Word error rates of 5–15% on clean enterprise audio in the trained language are typical; higher with strong accents, heavy code-switching, or noisy lines. Evaluate on your own recorded audio, not a vendor demo set.
- Does real-time transcription introduce compliance risk?
- Yes — transcripts are personal data and often contain special-category data. Storage, retention, and access controls for transcripts should match the controls on the underlying call recordings.
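To make the advice on evaluating with your own recorded audio concrete, here is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a human reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the normalisation used here (lowercasing and splitting on whitespace) are illustrative assumptions, not any vendor's method; a real evaluation would usually also normalise punctuation, numbers, and spelling variants before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalised only by lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word in a seven-word reference.
reference = "i would like to cancel my order"
hypothesis = "i would like to cancel the order"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 14.3%
```

Running the same function over a sample of your own calls, scored against human-checked transcripts, gives a like-for-like comparison between providers. Note that WER can exceed 100% when the output contains many inserted words.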
Related terms
- Voice AI: Voice AI is software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.
- Intent recognition: Intent recognition is figuring out what the caller actually wants.
- Voice biometrics: Voice biometrics confirms who the caller is by how they speak.