On March 6, Anthropic changed the default prompt cache TTL for Claude API requests from one hour to five minutes. They did not announce this. Developers found out only through detailed analysis of session logs. The developer who filed the GitHub issue analysed 119,866 API calls from January to April 2026 and calculated a $949 overpayment over that period, a 17.1% cost increase relative to what the same usage would have cost under the one-hour TTL that had been in place for the preceding 33 days.

Prompt caching works by writing your context once at a higher token rate, then re-using it at a much lower rate for the duration of the cache TTL. Under a one-hour TTL, a long coding session writes context once and reads it cheaply for an hour. Under a five-minute TTL, any pause longer than five minutes causes cache expiry. On resumption, the full context is re-uploaded at the write rate. Writing is 12.5 times more expensive than reading. For sessions with frequent pauses, which describes most developer workflows, the shift from one-hour to five-minute TTL converts a cheap pattern into an expensive one repeatedly throughout the day.
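The pause-driven cost blowup above can be sketched numerically. This is a minimal model, not Anthropic's actual billing logic: the base rate of $3.00 per million input tokens is an illustrative assumption, and the multipliers come from the ratios in the text (cache reads at 0.1x base, five-minute writes at 1.25x, one-hour writes at 2x, making writes 12.5x reads).

```python
# Illustrative rate assumptions, in $ per million input tokens.
BASE = 3.00             # assumed base input rate
READ = 0.1 * BASE       # cache read rate (writes are 12.5x this)
WRITE_5M = 1.25 * BASE  # five-minute-TTL cache write rate
WRITE_1H = 2.0 * BASE   # one-hour-TTL cache write rate

def session_cost(context_mtok, gaps_min, ttl_min, write_rate):
    """Cost of re-touching a cached context after each pause.

    Any pause longer than the TTL expires the cache, forcing a full
    re-upload at the write rate; shorter pauses hit the cache at the
    much cheaper read rate.
    """
    cost = context_mtok * write_rate  # initial cache write
    for gap in gaps_min:
        rate = write_rate if gap > ttl_min else READ
        cost += context_mtok * rate
    return cost

# A 100k-token context touched after five pauses of 2-20 minutes:
gaps = [2, 8, 15, 3, 20]
print(f"1h TTL:   ${session_cost(0.1, gaps, 60, WRITE_1H):.3f}")
print(f"5min TTL: ${session_cost(0.1, gaps, 5, WRITE_5M):.3f}")
```

Under the one-hour TTL every pause stays a cheap read; under the five-minute TTL three of the five pauses trigger a full re-write, roughly doubling the session's cost despite the lower per-write rate.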

Anthropic’s official response, from engineer Jarred Sumner, said the change was intentional: the new approach uses mixed TTL tiers, where different request types get different TTL settings, and the claim is that this is cheaper in aggregate across the full request mix. The logic is that one-hour cache writes cost about twice the base input rate while five-minute writes cost about 1.25 times the base rate, so for one-shot requests that do not benefit from extended caching, the shorter TTL is genuinely cheaper. Aggregate cost goes down; the individual user running long sessions pays more. Anthropic also disclosed that a separate bug in Claude Code v2.1.90 was causing some sessions to remain on the five-minute TTL after quota exhaustion, which they have since fixed.
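The break-even between the two tiers can be worked out from the multipliers in the text (one-hour writes at 2x base, five-minute writes at 1.25x, reads at 0.1x). The functions below are a hypothetical sketch, in multiples of the base input rate, for a session with some number of pauses longer than five minutes but shorter than an hour:

```python
def cost_5min(expiries, hits):
    """Five-minute tier: initial write plus a full re-write per expiry,
    in multiples of the base input rate per context upload."""
    return 1.25 * (1 + expiries) + 0.1 * hits

def cost_1hr(expiries, hits):
    """One-hour tier: one write; those same pauses still hit the cache."""
    return 2.0 + 0.1 * (expiries + hits)

# Compare across 0-2 cache-expiring pauses, with 10 cheap cache hits:
for e in range(3):
    print(e, cost_5min(e, 10), cost_1hr(e, 10))
```

Setting the two expressions equal gives a break-even below one expiry: the five-minute tier only wins when no pause exceeds five minutes, which is why one-shot requests come out cheaper while interactive coding sessions come out more expensive.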

The complaint here is not that Anthropic chose the wrong tradeoff. Aggregate cost optimisation across a diverse user base is reasonable product management. The complaint is that a pricing-relevant infrastructure change happened silently. Developers who had observed their February costs and built usage budgets around them were surprised in March. Subscription users on Pro and Max plans reported hitting their five-hour quota limits for the first time. Cache creation tokens count at full rate toward quota; cache read tokens do not. The shift from reads to writes hit quota harder than developers expected.
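The quota effect follows from the asymmetry stated above: cache creation tokens count at full rate toward quota, cache reads count nothing. A minimal sketch, with an assumed 80k-token context and an illustrative session shape:

```python
def quota_tokens(context_tokens, touches, expiries):
    """Quota-counted tokens for a session: the initial write plus one
    full re-write per cache expiry. Cache-read touches are free
    against quota, so `touches` contributes nothing."""
    writes = 1 + expiries
    return context_tokens * writes + touches * 0  # reads count zero

ctx = 80_000  # assumed context size in tokens
# Same session, 10 touches: 0 expiries under a 1h TTL vs 6 under 5min.
print(quota_tokens(ctx, 10, 0))  # 1h TTL
print(quota_tokens(ctx, 10, 6))  # 5min TTL
```

In this sketch the identical workflow consumes seven times the quota under the shorter TTL, which is consistent with Pro and Max users suddenly hitting their five-hour limits.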

The practical response is to treat prompt caching as a best-effort mechanism rather than a guaranteed cost floor, to front-load critical context in CLAUDE.md files so each cache write earns its cost, and to monitor session patterns if cost predictability matters to your workflow. More broadly, if you are building production systems on top of any LLM provider’s API, the cache TTL is an infrastructure detail that can change without notice and materially affect your cost model. Budget with that in mind.
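Monitoring session patterns can start from nothing more than call timestamps. A minimal sketch, assuming you log one timestamp per API call (`count_expiries` is a hypothetical helper, not part of any SDK): it counts inter-call gaps longer than the active TTL, i.e. how many times the full context would be re-written at the write rate.

```python
from datetime import datetime, timedelta

def count_expiries(timestamps, ttl=timedelta(minutes=5)):
    """Count gaps between consecutive calls that exceed the cache TTL."""
    ts = sorted(timestamps)
    return sum(1 for a, b in zip(ts, ts[1:]) if b - a > ttl)

calls = [
    datetime(2026, 3, 6, 9, 0),
    datetime(2026, 3, 6, 9, 3),    # 3-minute pause: still cached
    datetime(2026, 3, 6, 9, 15),   # 12-minute pause: expiry
    datetime(2026, 3, 6, 10, 30),  # 75-minute pause: expiry
]
print(count_expiries(calls))                           # → 2
print(count_expiries(calls, ttl=timedelta(hours=1)))   # → 1
```

Running this over your own logs under both TTL values shows how many extra full-rate writes a TTL change would inject into your day, before the invoice does.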