The OpenCode cost optimization that pissed off Anthropic users

How OpenCode handles context: “a beautiful mess”

Prompt Caching

Model prompts often contain repetitive content, such as system prompts and common instructions. Providers exploit this: requests are usually routed to servers that recently processed the same prompt prefix, making them cheaper and faster to serve than processing the entire prompt from scratch.

“Reduce latency and cost with prompt caching… Cache Routing: Usually requests are routed to servers that recently processed the same prompts, making it cheaper and faster than processing the entire prompt from scratch.”

- OpenAI Prompt Caching Documentation
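The mechanism is easy to picture: caching keys off a shared prompt prefix, so only the new suffix has to be processed. A minimal sketch (illustrative only; `cachedPrefixLength` is a made-up helper, not any provider’s API):

```typescript
// Toy model of prefix-based prompt caching (illustrative, not the actual
// provider implementation). A request is "cached" up to the longest
// prefix it shares with a previously processed prompt.
function cachedPrefixLength(previous: string[], current: string[]): number {
  let n = 0;
  while (n < previous.length && n < current.length && previous[n] === current[n]) {
    n++;
  }
  return n;
}

// Two consecutive requests share the system prompt and earlier turns,
// so only the new user message must be processed from scratch.
const turn1 = ["[system prompt]", "[user 1]", "[assistant 1]"];
const turn2 = ["[system prompt]", "[user 1]", "[assistant 1]", "[user 2]"];
console.log(cachedPrefixLength(turn1, turn2)); // 3 of 4 blocks served from cache
```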

OpenCode, however, does something unusual:

// Called after EVERY user message, not just on context overflow
SessionCompaction.prune({ sessionID })

// Silently erases tool outputs beyond last 40k tokens
export const PRUNE_PROTECT = 40_000

This actively trims the conversation history so the model only keeps what still matters. OpenCode does this to cut its own token costs.
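A hypothetical sketch of what such a prune pass could look like. Only `PRUNE_PROTECT` comes from the OpenCode snippet above; the `Part` shape and the `prune` function are illustrative assumptions, not OpenCode’s actual code:

```typescript
// Hypothetical sketch of the pruning idea. Only PRUNE_PROTECT is taken
// from OpenCode's source; everything else here is illustrative.
const PRUNE_PROTECT = 40_000;

interface Part {
  type: "text" | "tool";
  tokens: number;
  content: string;
}

// Walk the history from newest to oldest; once more than PRUNE_PROTECT
// tokens have been kept, blank out any older tool outputs.
function prune(history: Part[]): Part[] {
  let kept = 0;
  return [...history]
    .reverse()
    .map((part) => {
      kept += part.tokens;
      if (kept > PRUNE_PROTECT && part.type === "tool") {
        return { ...part, tokens: 0, content: "[pruned]" };
      }
      return part;
    })
    .reverse();
}
```

The key point for caching is not *what* gets pruned but *where*: the rewrite happens near the start of the prompt, in blocks the cache has already seen.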

🧠 Simple mental model

Without pruning, your prompt looks like this over time:

[system prompt]
[user message 1]
[assistant response 1]
[tool output: HUGE]
[user message 2]
[assistant response 2]
[tool output: HUGE]
[user message 3]
...

OpenCode’s SessionCompaction.prune() keeps the prompt clean by removing old tool outputs, but this breaks Anthropic’s prompt caching mechanism.

Why This Matters

  • Prompt caching needs consistent prompts to work efficiently
  • OpenCode’s pruning changes the prompt structure after every message
  • Result: No cache hits = higher costs and slower responses
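The cache break can be sketched the same way: once pruning rewrites an early block, every block after it is a miss, even if that later text is unchanged. Illustrative only; `sharedPrefix` is a made-up helper:

```typescript
// Illustrative sketch: pruning rewrites an early block, so the shared
// prefix with the previous request collapses to just the opening turns.
function sharedPrefix(a: string[], b: string[]): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

const beforePrune = ["[system]", "[user 1]", "[tool output: HUGE]", "[user 2]"];
const afterPrune  = ["[system]", "[user 1]", "[pruned]", "[user 2]", "[user 3]"];

// Everything after the rewritten block is a cache miss, even though
// most of it is identical text.
console.log(sharedPrefix(beforePrune, afterPrune)); // 2
```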

The cost optimization that saves OpenCode money actually increases costs for users on Anthropic’s platform, because it prevents repeated content from being served out of the cache.
