Latency Budgets
Voice interactions demand sub-second responses. Pauses over 2 seconds feel broken. Build streaming response synthesis so speech starts before the full answer is computed. Users tolerate short pauses mid-speech far better than a long silence before speech begins.
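A minimal sketch of sentence-level streaming. It assumes the answer arrives as a stream of text fragments, for example tokens from a language model. The hypothetical `sentence_chunks` generator yields each sentence as soon as it is complete, so a TTS engine (not shown) can start speaking the first sentence while later ones are still being generated.

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation followed by whitespace marks a split point.
_BOUNDARY = re.compile(r"([.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of text fragments as soon as each one ends."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently sitting in the buffer.
        while True:
            match = _BOUNDARY.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.end(1)].strip()  # keep the punctuation
            buffer = buffer[match.end():]             # drop the trailing whitespace
            if sentence:
                yield sentence
    # Flush whatever remains once the stream has finished.
    tail = buffer.strip()
    if tail:
        yield tail
```

Splitting on `.`, `!`, or `?` followed by whitespace is deliberately naive: abbreviations such as "Dr. Smith" will split early. Sentence-sized chunks are one reasonable compromise, large enough for natural prosody and small enough to keep lead-in silence short.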
Prosody Design
Synthesized voices have become excellent, so delivery is now a design choice. Match prosody to context: an empathetic tone for complaint handling, a crisp one for transactional exchanges. Many provider APIs expose tone controls. Use them.
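One way to apply tone controls, assuming your provider accepts W3C SSML: wrap each utterance in a `<prosody>` element chosen by conversation context. The `PROSODY_PRESETS` names and values and the `to_ssml` helper are hypothetical starting points, and support for individual SSML attributes varies by provider.

```python
from xml.sax.saxutils import escape

# Hypothetical context-to-prosody presets; tune them against your provider's voices.
PROSODY_PRESETS = {
    "complaint": {"rate": "slow", "pitch": "low"},           # calmer, empathetic delivery
    "transactional": {"rate": "medium", "pitch": "medium"},  # crisp, neutral delivery
}


def to_ssml(text: str, context: str) -> str:
    """Wrap text in an SSML <prosody> element chosen by conversation context."""
    # Unknown contexts fall back to the neutral transactional preset.
    preset = PROSODY_PRESETS.get(context, PROSODY_PRESETS["transactional"])
    attrs = " ".join(f'{key}="{value}"' for key, value in preset.items())
    # Escape &, <, > so user-derived text cannot break the markup.
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"
```

Keeping the mapping in data rather than scattered through dialogue code makes it easy to adjust tone per context after listening tests.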
Interruption Handling
Users interrupt. Detect user speech while the system is speaking, then stop playback, listen, and respond. Poor interruption handling feels rude. Most voice AI providers handle this reasonably well, but verify the behavior on your platform.
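A minimal barge-in sketch. It assumes a voice activity detector (VAD) that classifies each incoming audio frame as speech or not, plus a hypothetical `stop_playback` hook into your audio player. Requiring a few consecutive voiced frames before stopping is an assumed debounce, so a cough or a stray noise does not cut the system off.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    IDLE = auto()
    SPEAKING = auto()
    LISTENING = auto()


class BargeInController:
    """Stops system playback when the user starts talking over it."""

    def __init__(self, stop_playback: Callable[[], None], min_voiced_frames: int = 3):
        self.stop_playback = stop_playback          # hypothetical audio-player hook
        self.min_voiced_frames = min_voiced_frames  # debounce threshold (assumed value)
        self.state = State.IDLE
        self._voiced_run = 0

    def start_speaking(self) -> None:
        self.state = State.SPEAKING
        self._voiced_run = 0

    def on_audio_frame(self, is_speech: bool) -> None:
        """Feed one VAD result per incoming audio frame."""
        if self.state is not State.SPEAKING:
            return  # only user speech during playback counts as an interruption
        # Count consecutive voiced frames; any silent frame resets the run.
        self._voiced_run = self._voiced_run + 1 if is_speech else 0
        if self._voiced_run >= self.min_voiced_frames:
            self.stop_playback()          # stop talking...
            self.state = State.LISTENING  # ...and hand the turn to the user
            self._voiced_run = 0
```

In practice, echo cancellation on the input path matters as much as this state logic, since the system must not mistake its own voice for the user's.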
Error Recovery
When the AI fails to understand, don’t loop endlessly. After 2 failed attempts on the same intent, offer escalation. A line such as ‘Let me connect you to someone who can help’ prevents frustration and preserves trust.
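The two-strike rule above can be sketched as a per-intent failure counter. This assumes an upstream understanding step that names the intent being attempted; `FailureTracker`, `respond_to_failure`, and the reprompt wording are hypothetical.

```python
from collections import defaultdict

ESCALATION_PROMPT = "Let me connect you to someone who can help."
REPROMPT = "Sorry, I didn't catch that. Could you say it another way?"


class FailureTracker:
    """Counts failed attempts per intent and signals when to escalate."""

    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self._failures = defaultdict(int)  # intent name -> failed attempts so far

    def record_failure(self, intent: str) -> bool:
        """Record a failed attempt; return True once escalation should be offered."""
        self._failures[intent] += 1
        return self._failures[intent] >= self.max_failures

    def record_success(self, intent: str) -> None:
        """Reset the counter once the intent is understood."""
        self._failures.pop(intent, None)


def respond_to_failure(tracker: FailureTracker, intent: str) -> str:
    """Pick the next system line after a misunderstanding."""
    if tracker.record_failure(intent):
        return ESCALATION_PROMPT
    return REPROMPT
```

Counting per intent, rather than per call, avoids escalating a caller who had one hiccup on billing and another, unrelated one on a refund.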