Pre-Prod
Staged environment with realistic data (masked for compliance). Full evaluation suite. Adversarial testing. Cost and latency measurement. Only after pre-prod passes do we proceed.
Limited Prod (Canary)
1–5% traffic to the new agent. Monitor error rate, latency, outcomes, user feedback, cost. Compare to baseline. Watch for 1–2 weeks. Only after canary passes do we scale.
Scaled Prod
Gradually expand to 25%, 50%, 100%. Continue monitoring with tighter thresholds as traffic grows. Revert paths always available. A broken scaled prod deployment shouldn’t require a support incident.
Post-Deployment
Continuous monitoring, not ‘deploy and forget.’ Agents drift. Model versions change. User behavior shifts. Regression suite runs continuously; outcomes tracked. Phase of permanent vigilance.