Versioning

Prompts are code, so version them in Git. Pin models to explicit versions, not ‘latest’. Version fine-tunes the same way, and make sure every deployment can be rolled back. This discipline helps prevent ‘why did the agent start failing yesterday’ post-mortems.
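One way to make the pinning concrete is to treat the prompt, model identifier, and fine-tune identifier as a single immutable release record that lives in the repo. The sketch below is illustrative only: `AgentRelease`, its fields, and the example model name are hypothetical, and the ‘latest’ check is a simple suffix test, not a rule from any provider.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRelease:
    """Hypothetical record of everything that defines agent behaviour."""

    prompt_text: str
    model_id: str  # assumed to be a dated or otherwise pinned identifier
    fine_tune_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject floating aliases so an upstream change cannot alter
        # behaviour without a corresponding commit.
        if self.model_id.endswith("latest"):
            raise ValueError(f"model_id must be pinned, got {self.model_id!r}")

    def fingerprint(self) -> str:
        # Stable short hash over the whole release. Logging it with every
        # request ties a failure back to an exact prompt and model pair.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


release = AgentRelease(
    prompt_text="You are a support agent. Answer from the knowledge base only.",
    model_id="example-model-2024-06-01",  # hypothetical pinned name
)
print(release.fingerprint())
```

Because the fingerprint changes whenever the prompt or the model pin changes, comparing yesterday's logged fingerprint with today's is a quick first check when behaviour shifts.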

Prompt Management

Prompt management platforms (Langfuse, Portkey, or a custom tool) catalog prompts, version them, enable A/B testing, and track performance. That beats copy-pasting prompts between code repos.
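For A/B testing between prompt versions, the core mechanism is a stable assignment of each user to a variant. The following is a minimal sketch of hash-based bucketing, not the API of Langfuse or Portkey; `assign_variant`, the experiment name, and the variant labels are all hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one prompt variant."""
    if not variants:
        raise ValueError("at least one variant is required")
    # Hashing experiment and user together keeps a user's assignment stable
    # within one experiment but independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


variant = assign_variant("user-123", "greeting-prompt", ["prompt_v1", "prompt_v2"])
print(variant)
```

Deterministic assignment means no assignment table has to be stored: the same user always sees the same prompt version, which keeps per-variant performance numbers comparable.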

Evaluation Pipelines

Run evals nightly against a regression set and alert on any regression. Evaluate every model upgrade before it reaches production. Evaluation infrastructure is an ongoing investment, not a one-time build.
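The nightly check described above can be sketched as a pass-rate comparison against a stored baseline. This is an assumption-laden illustration: `EvalCase`, the substring-match scoring, the stub agent, and the 0.02 tolerance are all hypothetical, and real regression sets usually need richer scoring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str  # simplistic pass criterion, for illustration


def pass_rate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("regression set is empty")
    passed = sum(1 for case in cases if case.expected_substring in agent(case.prompt))
    return passed / len(cases)


def has_regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    # Alert only when the drop exceeds the tolerance, so ordinary
    # run-to-run noise does not page anyone.
    return current < baseline - tolerance


def stub_agent(prompt: str) -> str:
    # Stand-in for the real agent call.
    return "Paris" if "France" in prompt else "unknown"


cases = [
    EvalCase("Capital of France?", "Paris"),
    EvalCase("Capital of Japan?", "Tokyo"),
]
rate = pass_rate(stub_agent, cases)
if has_regressed(rate, baseline=0.90):
    print(f"ALERT: pass rate {rate:.2f} is below baseline")
```

The same two functions can gate a model upgrade: run the regression set against the candidate model and block the rollout if `has_regressed` returns true.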

Incident Response

Set up an on-call rotation for agent issues. Keep a playbook for common failures: cost spikes, hallucination clusters, retrieval failures. Run a post-incident review that records the root cause and action items. Professionalize or suffer.
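A playbook entry is easier to act on when its trigger is automated. As one hedged example, a cost-spike trigger could compare today's spend with recent history using a z-score; `is_cost_spike`, the threshold of 3.0, and the sample figures are hypothetical choices, not a recommended standard.

```python
from statistics import mean, stdev
from typing import Sequence


def is_cost_spike(history: Sequence[float], today: float, z_threshold: float = 3.0) -> bool:
    """Flag today's spend if it sits far above the recent daily average."""
    if len(history) < 2:
        return False  # not enough data to estimate variation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase counts as a spike.
        return today > mu
    return (today - mu) / sigma > z_threshold


daily_cost_usd = [100.0, 102.0, 98.0, 101.0, 99.0]  # illustrative figures
if is_cost_spike(daily_cost_usd, today=180.0):
    print("Page on-call: cost spike playbook")
```

Only upward deviations trigger the alert, since an unusually cheap day is rarely an incident by itself.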
