How It Works
Providers (Anthropic, OpenAI) cache the static prefix of prompts. Subsequent requests that share that prefix hit the cache, dropping input cost by 50–90% and cutting latency to first token by roughly 2–4×. Structure matters: the prefix must match byte-for-byte — any variation, even a changed timestamp or stray whitespace, starts a new cache entry.
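A minimal sketch of why byte-for-byte identity matters. The helper and the prompt strings are illustrative, not part of any provider SDK:

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading run of characters between two prompts."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

SYSTEM = "You are a support agent. Policy: refunds are allowed within 30 days."

# Stable prefix first, variable suffix last: the whole system block is shared.
p1 = SYSTEM + "\n\nUser: where is my order?"
p2 = SYSTEM + "\n\nUser: cancel my subscription"
assert common_prefix_len(p1, p2) >= len(SYSTEM)

# A timestamp at the front ruins it: the shared prefix ends at the minute digit,
# so the long policy text behind it can never be reused from cache.
q1 = "[2024-01-01 09:00] " + SYSTEM
q2 = "[2024-01-01 09:01] " + SYSTEM
print(common_prefix_len(q1, q2))  # prefixes diverge after a handful of bytes
```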
Prefix Structure
Put stable content first: persona, tool definitions, policy statements, unchanging context. Put variable content last: user turn, retrieved passages. The more stable content at the front, the more caching pays off.
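The layout above can be sketched as a request builder shaped like Anthropic's Messages API payload. The `cache_control` block on the system prompt is real Anthropic API surface; the model name, policy text, and helper function are placeholders for illustration:

```python
# Stable content (persona + policy) goes first and is marked cacheable;
# variable content (retrieved passages, user turn) goes last.
STABLE_SYSTEM = (
    "You are a CRM support agent. Follow the refund policy: "
    "refunds are allowed within 30 days of purchase with proof of receipt."
)

def build_request(user_turn: str, passages: list[str]) -> dict:
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STABLE_SYSTEM,
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": "\n\n".join(passages + [user_turn]),
            }
        ],
    }

# Two different turns produce byte-identical stable sections, so the
# second request can read the first request's cache entry.
r1 = build_request("Where is my order?", ["Order #123 shipped Monday."])
r2 = build_request("Cancel my plan.", ["Plan: Pro, renews monthly."])
assert r1["system"] == r2["system"]
```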
Cache TTL
Anthropic caches for 5 minutes by default, refreshed on each read; extended TTL options exist on higher tiers. Low-traffic agents (fewer than one call per 5 minutes) see little benefit because the entry expires between calls. High-traffic agents keep their own cache warm, so nearly every call after the first is a hit.
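A back-of-envelope model of that threshold, assuming the TTL resets on each cache read and calls arrive at a steady interval (the function and its parameters are illustrative, not provider API):

```python
def expected_hit_rate(n_calls: int, interval_min: float, ttl_min: float = 5.0) -> float:
    """Fraction of calls that read the cache instead of rewriting it."""
    if n_calls <= 0:
        return 0.0
    if interval_min > ttl_min:
        return 0.0  # cache always expires between calls
    # First call writes the cache; every later call reads it in time.
    return (n_calls - 1) / n_calls

# One call every 5 minutes (12/hour) just keeps the entry alive: ~92% hits.
print(expected_hit_rate(12, interval_min=5.0))
# One call every 10 minutes: every call is a miss.
print(expected_hit_rate(6, interval_min=10.0))
```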
CRM Implications
Customer-service agents with long policy prompts benefit disproportionately, since the policy text dominates the token count and is fully cacheable. Orchestrator agents with consistent tool definitions benefit for the same reason. Measure your cache hit rate; if it's below 80%, restructure the prompt so more stable content sits at the front.
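One way to measure that rate, sketched against the token counters Anthropic returns in each response's `usage` object (`cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens` are real field names; the aggregation function is an assumption):

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Share of input tokens served from cache across a batch of responses."""
    read = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    write = sum(u.get("cache_creation_input_tokens", 0) for u in usages)
    fresh = sum(u.get("input_tokens", 0) for u in usages)
    total = read + write + fresh
    return read / total if total else 0.0

# Illustrative usage records: first call writes the cache, later calls read it.
usages = [
    {"cache_creation_input_tokens": 900, "input_tokens": 100},
    {"cache_read_input_tokens": 900, "input_tokens": 100},
    {"cache_read_input_tokens": 900, "input_tokens": 100},
]
print(cache_hit_rate(usages))  # 0.6 — well below the 80% target
```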