Versioning
Prompts are code. Version them in Git. Model pins explicit (not ‘latest’). Fine-tunes versioned. Deployment rollbacks possible. This discipline prevents ‘why did the agent start failing yesterday’ post-mortems.
Prompt Management
Prompt management platforms (Langfuse, Portkey, custom) catalog prompts, version them, enable A/B testing, track performance. Beats copy-pasting prompts in code repos.
Evaluation Pipelines
Nightly eval runs against regression set. Alert on regression. Model upgrade evaluation before production. Evaluation infrastructure is ongoing investment, not a one-time build.
Incident Response
On-call rotation for agent issues. Playbook for common failures — cost spike, hallucination cluster, retrieval failure. Post-incident review with root cause and action items. Professionalize or suffer.