Definition
What is turn-taking latency?
By Lewis Crook
Turn-taking latency is the delay between the moment the caller stops speaking and the moment the voice AI begins to respond. It is the sum of four stages: end-of-turn detection, speech-to-text finalisation, language model inference, and text-to-speech start time. Of everything that shapes perceived voice AI quality, it is the component callers feel most directly.
Turn-taking latency is the awkward pause before the bot starts talking back.
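Because the definition is a sum of stages, a per-turn budget can be sketched as simple addition. The Python below is a minimal illustration, not a measurement tool: the class and field names are hypothetical (they mirror the four stages named in the definition), and the millisecond figures are placeholders rather than measured values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TurnLatencyBudget:
    """Per-stage delays, in milliseconds, for one conversational turn."""
    end_of_turn_detection_ms: int
    stt_finalisation_ms: int
    llm_inference_ms: int
    tts_start_ms: int

    def total_ms(self) -> int:
        # Assumes the stages run one after another, so their delays add up.
        return sum(getattr(self, f.name) for f in fields(self))

    def largest_stage(self) -> str:
        # Name of the stage that contributes the most delay.
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


# Hypothetical figures, for illustration only.
example = TurnLatencyBudget(
    end_of_turn_detection_ms=500,
    stt_finalisation_ms=100,
    llm_inference_ms=350,
    tts_start_ms=150,
)
print(example.total_ms())       # 1100
print(example.largest_stage())  # end_of_turn_detection_ms
```

Breaking the total down this way makes it easier to see which stage to tune first.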
Why it matters for enterprise CX leaders
- In human conversation, callers tolerate a gap of roughly 800 ms between turns; above 2 seconds, the interaction feels broken.
- End-of-turn detection, meaning the decision that the caller has finished, is often the largest single contributor and the hardest to tune.
- Reducing turn-taking latency by a few hundred milliseconds is consistently the single highest-leverage UX improvement in voice AI deployments.
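One common way to detect the end of a turn is a fixed silence timeout: the system waits until the caller has been quiet for a set duration before treating the turn as finished. The sketch below assumes that approach, which this page does not itself specify. The frame size, the timeout values, and the function name are hypothetical, and the per-frame speech flags are assumed to come from some upstream voice activity detector. It shows why the timeout adds directly to turn-taking latency.

```python
from typing import Optional, Sequence

FRAME_MS = 20  # assumed duration of one audio frame


def detect_end_of_turn(
    is_speech: Sequence[bool], silence_timeout_ms: int
) -> Optional[int]:
    """Return the time (ms) at which the turn is declared over, or None.

    The turn ends once the caller has spoken at least once and has then
    been silent for silence_timeout_ms without interruption.
    """
    silent_ms = 0
    heard_speech = False
    for i, speaking in enumerate(is_speech):
        if speaking:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_ms += FRAME_MS
            if silent_ms >= silence_timeout_ms:
                return (i + 1) * FRAME_MS
    return None


# Caller speaks for 1 second (50 frames), then is quiet for 1 second.
frames = [True] * 50 + [False] * 50
speech_end_ms = 50 * FRAME_MS

for timeout in (300, 700):
    decided_at = detect_end_of_turn(frames, timeout)
    # Added delay equals the timeout itself: prints "300 300" then "700 700".
    print(timeout, decided_at - speech_end_ms)
```

A shorter timeout reduces the pause, but it also raises the risk of cutting off a caller who is only pausing mid-sentence, which is one reason this stage is hard to tune.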
Frequently asked questions
- What is a good turn-taking latency?
- Under 1 second is achievable with streaming stacks and disciplined integration design. Under 1.5 seconds is the practical production target. Above 2 seconds, callers notice and start to disengage.
- Is turn-taking latency the same as voice AI latency?
- Closely related, but not identical. Voice AI latency usually refers to the full end-to-end delay; turn-taking latency specifically isolates the gap between turns, which is what the caller actually perceives.
- What contributes most to turn-taking latency?
- End-of-turn detection and any tool calls on the critical path. Speech-to-text and text-to-speech are usually small contributors when streaming.
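The thresholds in the answers above can be turned into a simple check for monitoring measured turns. This is a minimal sketch: the function name and band labels are hypothetical, and the "above target" label for the 1.5 to 2 second range is an assumption, since the text does not name that band.

```python
def rate_turn_latency(latency_ms: float) -> str:
    """Map a measured turn-taking latency to the bands described in the FAQ."""
    if latency_ms < 1000:
        return "under 1 s: achievable with streaming stacks"
    if latency_ms < 1500:
        return "under 1.5 s: practical production target"
    if latency_ms <= 2000:
        # Band not named in the text; this label is an assumption.
        return "above target"
    return "above 2 s: callers notice and disengage"


for measured in (800, 1200, 1800, 2500):
    print(measured, "->", rate_turn_latency(measured))
```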
Related terms
- Voice AI latency: Voice AI latency is the gap before the system starts talking back.
- Barge-in: Barge-in lets the caller interrupt the bot without breaking the conversation.
- Voice AI: Voice AI is software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.