Voice AI first 90 days: a week-by-week post-launch operating plan

Audience: Heads of Ops, CX directors, VP / COO

By Lewis Crook. Published June 15, 2026

Bottom line up front

Most voice AI deployments do not fail at launch; they fail in the operating model that congeals around them in the first 90 days. This is the week-by-week plan to install instead.

Weeks 1–2 — stand up the cadence and the observability

Before any tuning, install the cadence. One ninety-minute weekly meeting, the same three roles, a written decision log. Confirm the conversation owner can see per-call transcript, intent labels, tool calls, latency per step, and escalation reason without engineering involvement. If they cannot, that is your week-one bug, not a week-twelve concern.

Week 1: weekly cadence in calendars; observability access verified end-to-end; success contract circulated for signature.
Week 2: success contract signed by the four people who can kill the programme (operations, finance, transformation, compliance); decision log template live.
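The week-one observability check can be made mechanical. The following is a minimal sketch, assuming a hypothetical `CallRecord` schema; the field names are illustrative and do not come from any particular vendor's export. It lists which of the five required views are missing from a call as the conversation owner sees it.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CallRecord:
    """One call as the conversation owner sees it (hypothetical schema)."""
    transcript: Optional[str] = None
    intent_labels: Optional[list] = None
    tool_calls: Optional[list] = None
    latency_ms_per_step: Optional[dict] = None
    # Assumption: a call that was never escalated carries "" here, not None,
    # so None always means "the tooling does not expose this field".
    escalation_reason: Optional[str] = None


def missing_fields(record: CallRecord) -> list:
    """Return the names of the fields the owner cannot see for this call."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


sample = CallRecord(
    transcript="Caller asks to move a delivery slot.",
    intent_labels=["reschedule_delivery"],
    tool_calls=["lookup_order", "update_slot"],
    escalation_reason="",
)
print(missing_fields(sample))
```

Run against a handful of real calls pulled without engineering help, any non-empty result is the week-one bug.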

Weeks 3–4 — baseline before you tune

Publish the baseline before changing anything: containment, re-contact within 7 days, CSAT, cost per resolved call, and p95 latency. Use the exact methodology the pre-AI measurement used so the comparison is honest. Skip this and every post-launch metric will be measured against whatever baseline produces the most flattering number.
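To make the baseline reproducible, it helps to pin the metric definitions down as code. The sketch below covers three of the five metrics under stated assumptions: the `Call` record is hypothetical, re-contact is counted per call by the same caller, and p95 uses the nearest-rank method. If the pre-AI measurement defined any of these differently, the pre-AI definition wins.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Call:
    caller_id: str
    started_at: datetime
    contained: bool    # the automation finished the call on its own
    latency_ms: float  # assumption: one representative latency figure per call


def containment_rate(calls):
    """Share of calls the automation finished on its own."""
    return sum(c.contained for c in calls) / len(calls)


def recontact_rate(calls, window_days=7):
    """Share of calls followed by another call from the same caller in the window."""
    by_caller = {}
    for c in sorted(calls, key=lambda c: c.started_at):
        by_caller.setdefault(c.caller_id, []).append(c.started_at)
    window = timedelta(days=window_days)
    recontacted = 0
    for times in by_caller.values():
        # Times are sorted, so checking the next call is enough.
        for earlier, later in zip(times, times[1:]):
            if later - earlier <= window:
                recontacted += 1
    return recontacted / len(calls)


def p95(values):
    """95th percentile by nearest rank, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


day0 = datetime(2026, 6, 1, 9, 0)
calls = [
    Call("A", day0, True, 100.0),
    Call("A", day0 + timedelta(days=3), False, 200.0),
    Call("B", day0, True, 300.0),
    Call("A", day0 + timedelta(days=20), True, 400.0),
]
baseline = {
    "containment": containment_rate(calls),
    "recontact_7d": recontact_rate(calls),
    "latency_p95_ms": p95([c.latency_ms for c in calls]),
}
print(baseline)
```

Publishing the functions alongside the numbers means the post-launch figures are computed by the same code path rather than by a fresh, more flattering definition.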

Weeks 5–8 — small attributable changes

Ship at most two intent or guardrail changes per week, each with a metric it is meant to move and a control window. Resist vendor pressure to bundle changes, because bundles are unattributable. Resist executive pressure to ship the big intent expansion; that comes in month four.

Two changes per week, no more — attributability matters more than throughput
Each change names the metric it is meant to move and the window over which it will be measured
Roll back any change that does not move the metric within two weeks; do not let dead changes accumulate
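The three rules above translate directly into a decision-log check. This is a minimal sketch, assuming a hypothetical `Change` record, ISO calendar weeks as the definition of "per week", and a verdict that the weekly review records by hand; none of these details are prescribed by the plan itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MAX_CHANGES_PER_WEEK = 2
ROLLBACK_AFTER = timedelta(weeks=2)


@dataclass
class Change:
    name: str
    shipped_on: date
    target_metric: str  # the metric the change is meant to move
    window_days: int    # the window over which it will be measured
    moved_metric: Optional[bool] = None  # None until the review records a verdict


def over_weekly_cap(changes):
    """Return the ISO (year, week) pairs that hold more than two changes."""
    counts = {}
    for c in changes:
        iso = c.shipped_on.isocalendar()
        key = (iso[0], iso[1])
        counts[key] = counts.get(key, 0) + 1
    return sorted(k for k, n in counts.items() if n > MAX_CHANGES_PER_WEEK)


def due_for_rollback(changes, today):
    """Names of changes at least two weeks old with no confirmed metric movement."""
    return [
        c.name
        for c in changes
        if c.moved_metric is not True and today - c.shipped_on >= ROLLBACK_AFTER
    ]


log = [
    Change("tighten-refund-guardrail", date(2026, 7, 6), "re-contact within 7 days", 14),
    Change("add-address-change-intent", date(2026, 7, 7), "containment", 14, moved_metric=True),
    Change("shorten-greeting", date(2026, 7, 8), "latency p95", 7),
    Change("clarify-handoff-prompt", date(2026, 7, 13), "CSAT", 14),
]
print(over_weekly_cap(log))
print(due_for_rollback(log, today=date(2026, 7, 20)))
```

A change with no recorded verdict is treated the same as one that failed, which keeps dead changes from lingering simply because nobody reviewed them.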

Weeks 9–12 — extension and the first board pack

Extend the intent backlog by no more than two items. Catalogue failure modes that emerged at month-two scale but not month-one. Write the first quarterly board pack: one page, five lines, no dashboard exports.

Primary metric vs target, with one sentence on any miss
Top three failure modes this quarter, with the change shipped against each
Top three failure modes carried, with why they remain open
What shipped, what is shipping next quarter
Single forward risk most likely to compromise the next quarter

What good looks like at day 90

Day 90 — green / yellow / red

Signal	Green	Yellow	Red
Containment vs baseline	+10pp or better	+3–9pp	Below or at baseline
Re-contact within 7 days	Down vs baseline	Flat	Up vs baseline
Cost per resolved call	Tracking to business case	Up to 20% adverse	Over 20% adverse
Operating cadence	Decision log every week	Missed 1–2 weeks	No log, attendance only
Change attribution	>70% of changes traceable to a metric	40–70%	<40%
Failure-mode catalogue	Maintained, top-3 named	Sporadic	None
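Two of the rows in the table are numeric and can be scored automatically. The sketch below is illustrative only. The table does not say how to score a containment gain between 0 and +3pp, so the code treats that range as red; that is an assumption of this sketch, not part of the guide. Treating a log with no changes at all as red is likewise an assumption.

```python
def containment_status(delta_pp):
    """Classify the containment change vs baseline, in percentage points."""
    if delta_pp >= 10:
        return "green"
    if delta_pp >= 3:
        return "yellow"
    # Assumption: the table leaves gains above 0 and below +3pp unassigned;
    # this sketch scores them red, the conservative reading.
    return "red"


def attribution_status(traceable, total):
    """Classify the share of shipped changes traceable to a metric."""
    if total == 0:
        return "red"  # assumption: no changes logged means nothing is attributable
    # Integer comparison avoids floating-point edge cases at exactly 70% and 40%.
    if traceable * 100 > 70 * total:
        return "green"
    if traceable * 100 >= 40 * total:
        return "yellow"
    return "red"


print(containment_status(12), containment_status(5), containment_status(0))
print(attribution_status(8, 10), attribution_status(7, 10), attribution_status(3, 10))
```

The remaining rows (operating cadence, failure-mode catalogue) are judgement calls for the weekly review and are better left to the decision log than to a script.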

Do this on Monday

Put the weekly review on the calendar with the same three roles, recurring for the next 12 weeks. Write the success contract on a single page and circulate it for signature today.

Key takeaways

Install the cadence and verify observability access in week one; get the success contract signed in week two; baseline before tuning anything.
Two attributable changes per week beats five bundled changes — roll back any change that does not move its metric within two weeks.
Extend the intent backlog by no more than two items in the first 90 days.
Day 90 board pack is one page, five lines — not a dashboard export.
Operations leads the weekly review; vendor attends.

Frequently asked questions

Why baseline before tuning?
Because the only honest comparison is against the methodology the pre-AI measurement used. Tuning first means every post-launch number is measured against whatever produces the most flattering comparison.

Two changes per week feels slow. Why cap it?
Because bundled changes are unattributable. Three or more changes in a week and you cannot tell which one moved the metric, which means you cannot roll back the dead ones.

Should the vendor own the cadence?
No. The contact-centre operations lead chairs the weekly review. Vendor attends, contributes, does not chair. The cadence is the operating model; the operating model belongs to the customer.

Terms used in this guide

Voice AI: software that answers the phone, understands what the caller wants, and takes action, not just a smarter IVR.
Containment rate: the percentage of calls the automation finished on its own.

Last reviewed: 2026-06-15. This guide is updated when production patterns shift; see the corrections page to flag anything that no longer matches reality.

About the author

Lewis Crook

Practitioner writer on enterprise voice AI

Lewis Crook: 20 years in enterprise technology, from FTSE 100 voice deployments to over a million AI-handled minutes a month across Asia-Pacific. Buyer, builder, and now working with CX leaders on enterprise voice AI. Writes The Voice AI Brief.
