The OpenCode cost optimization that pissed off Anthropic
How OpenCode handles context: "a beautiful mess"
Prompt Caching
Model prompts often contain repetitive content, like system prompts and common instructions. Anthropic routes requests to servers that recently processed the same prompt prefix, so repeated content is cheaper and faster to handle than processing the entire prompt from scratch. As Anthropic's documentation puts it:
“Reduce latency and cost with prompt caching… Cache Routing: Usually requests are routed to servers that recently processed the same prompts, making it cheaper and faster than processing the entire prompt from scratch.”
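One way to picture prefix-based caching (a toy sketch, not Anthropic's actual implementation): a block of the prompt is only cheap when everything before it is byte-identical to a previously seen prefix. Change anything early in the prompt and every prefix after it becomes new.

```typescript
// Toy model of prefix-based prompt caching (illustration only,
// not Anthropic's real implementation).
const cache = new Set<string>();

function processPrompt(blocks: string[]): { cachedBlocks: number } {
  let cachedBlocks = 0;
  let prefix = "";
  for (const block of blocks) {
    prefix += block + "\n";
    if (cache.has(prefix)) {
      cachedBlocks++; // prefix seen before: cheap and fast
    } else {
      cache.add(prefix); // new prefix: full-price processing, cached for next time
    }
  }
  return { cachedBlocks };
}

// Turn 1: nothing cached yet.
console.log(processPrompt(["system", "user 1"]).cachedBlocks); // 0

// Turn 2: the conversation grew, but the old prefix is intact: 2 cache hits.
console.log(processPrompt(["system", "user 1", "assistant 1", "user 2"]).cachedBlocks); // 2

// Rewrite an earlier block (as pruning does) and most hits disappear.
console.log(processPrompt(["system", "user 1 (pruned)", "assistant 1", "user 2"]).cachedBlocks); // 1
```

The key property is append-only growth: a conversation that only ever adds messages keeps its whole history cache-warm, while one that edits earlier messages pays full price again from the edit point onward.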
OpenCode, however, does something like this:
// Called after EVERY user message, not just on context overflow
SessionCompaction.prune({ sessionID })
// Silently erases tool outputs beyond last 40k tokens
export const PRUNE_PROTECT = 40_000
This actively trims the conversation history so the model keeps only what still matters. It is done purely as a cost optimization.
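A simplified sketch of what such a prune pass might look like. Only `PRUNE_PROTECT` comes from the source; every other name, the `Message` shape, and the 4-chars-per-token estimate are assumptions for illustration, and the real OpenCode implementation differs:

```typescript
// Hypothetical sketch: walk messages newest-first, keep a budget of
// recent tool-output tokens, and blank out everything older.
export const PRUNE_PROTECT = 40_000;

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Crude token estimate (~4 chars per token); an assumption, not OpenCode's tokenizer.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function prune(messages: Message[]): Message[] {
  let toolBudget = PRUNE_PROTECT;
  // Iterate from the newest message backwards so recent tool outputs survive.
  return messages
    .slice()
    .reverse()
    .map((msg) => {
      if (msg.role !== "tool") return msg;
      const cost = estimateTokens(msg.content);
      if (cost <= toolBudget) {
        toolBudget -= cost;
        return msg; // still inside the protected 40k-token window
      }
      return { ...msg, content: "[tool output pruned]" }; // silently erased
    })
    .reverse();
}
```

Note that each prune pass rewrites messages in the middle of the conversation, not just at the end, which is exactly the kind of edit that invalidates a prefix cache.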
🧠 Simple mental model
Without pruning, your prompt looks like this over time:
[system prompt]
[user message 1]
[assistant response 1]
[tool output: HUGE]
[user message 2]
[assistant response 2]
[tool output: HUGE]
[user message 3]
...
OpenCode’s SessionCompaction.prune() keeps the prompt clean by removing old tool outputs, but this breaks Anthropic’s prompt caching mechanism.
Why This Matters
- Prompt caching needs consistent prompts to work efficiently
- OpenCode’s pruning changes the prompt structure after every message
- Result: No cache hits = higher costs and slower responses
The pruning that is meant to save money by shrinking the prompt actually increases costs for users on Anthropic's platform, because rewriting the prompt prefix prevents the discounted reuse of cached content.
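A back-of-the-envelope comparison makes the trade-off concrete. The prices below are illustrative assumptions ($3.00/MTok base input vs. $0.30/MTok cache reads, roughly the published 10:1 Anthropic ratio; the ~25% cache-write premium is omitted for simplicity); check current pricing before relying on these numbers:

```typescript
// Illustrative prices (assumptions, not authoritative): base input vs. cache-read rate.
const BASE_PER_MTOK = 3.0;
const CACHED_PER_MTOK = 0.3;

// A 100k-token conversation prefix re-sent over 10 turns.
const promptTokens = 100_000;
const turns = 10;

// Stable prefix: pay full price once, then mostly cache reads.
const withCaching =
  (promptTokens * BASE_PER_MTOK + (turns - 1) * promptTokens * CACHED_PER_MTOK) / 1_000_000;

// Prefix pruned every turn: every turn pays full input price.
const withoutCaching = (turns * promptTokens * BASE_PER_MTOK) / 1_000_000;

console.log(`with caching:    $${withCaching.toFixed(2)}`);    // $0.57
console.log(`without caching: $${withoutCaching.toFixed(2)}`); // $3.00
```

Under these assumptions the unprunable, cache-friendly conversation costs roughly a fifth of the pruned one, even though it sends more tokens per request.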
Further Reading
- OpenCode Compaction Implementation - See the actual code behind the pruning mechanism
OpenCode is silently pruning your conversation history to save costs, but this breaks prompt caching and actually makes things more expensive for users on Anthropic
— TRQ (@trq212) March 21, 2026