
We noticed this two weeks ago, when some of our requests unexpectedly consumed more tokens than the count_tokens call had measured. It turned out that Anthropic was A/B testing, routing some Opus 4.6 calls to Opus 4.7.

https://matrix.dev/blog-2026-04-16.html (We were talking to Opus 4.7 twelve days ago)
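
For anyone who wants to watch for the same thing, here is a rough sketch of the kind of check described above: compare the pre-flight token count with what the response reports, and compare the requested model with the one that served the request. The function name, the 2% tolerance, and the model strings are all hypothetical; how you obtain the four values depends on your own logging.

```python
from typing import List


def find_anomalies(requested_model: str, served_model: str,
                   counted_tokens: int, reported_tokens: int,
                   tolerance: float = 0.02) -> List[str]:
    """Return human-readable warnings for one logged request.

    counted_tokens  : input tokens measured before sending the request
    reported_tokens : input tokens reported in the response usage
    tolerance       : relative overrun we ignore (assumed value, tune it)
    """
    if counted_tokens <= 0:
        raise ValueError("counted_tokens must be positive")

    warnings = []
    # A different served model is the strongest hint of rerouting.
    if served_model != requested_model:
        warnings.append(f"model mismatch: asked {requested_model}, got {served_model}")

    # Relative overrun of reported usage against the pre-flight count.
    overrun = (reported_tokens - counted_tokens) / counted_tokens
    if overrun > tolerance:
        warnings.append(f"token overrun: {overrun:.0%} above count")
    return warnings


if __name__ == "__main__":
    # Placeholder model names, not real identifiers.
    for line in find_anomalies("model-a", "model-b", 1000, 1300):
        print(line)
```

A check like this only tells you that something differs; it cannot tell you why, so a tokenizer change and a reroute would look the same unless the served model name is also exposed.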



Wonder what they do for their token cache if they swap mid-session like that.


Exactly! In https://matrix.dev/blog-2026-04-04-2.html#questions-this-rai..., we raised the same concerns. In particular, we saw a hot swap cause a 100% cache miss. For a session filled with 800k tokens, rebuilding the cache is very expensive.
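
To put a rough number on "very expensive", here is a back-of-the-envelope sketch. Everything numeric in it is an assumption for illustration: the base input price is a placeholder, and the cache write and read multipliers are parameters you should replace with whatever your actual pricing is.

```python
def cache_miss_penalty(tokens: int,
                       base_price_per_mtok: float,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> float:
    """Extra cost of one full cache miss compared with a full cache hit.

    tokens              : tokens that were cached in the session
    base_price_per_mtok : uncached input price per million tokens (placeholder)
    write_multiplier    : assumed cost factor for writing tokens to the cache
    read_multiplier     : assumed cost factor for reading tokens from the cache
    """
    base_cost = tokens / 1_000_000 * base_price_per_mtok
    miss_cost = base_cost * write_multiplier  # prefix is re-written to the cache
    hit_cost = base_cost * read_multiplier    # prefix would have been read from cache
    return miss_cost - hit_cost


if __name__ == "__main__":
    # 800k-token session at a hypothetical 5.00 per million input tokens.
    penalty = cache_miss_penalty(800_000, base_price_per_mtok=5.0)
    print(f"extra cost of one forced rebuild: {penalty:.2f}")
```

Under these assumed numbers a single forced rebuild costs more than ten times what the cached turn would have, and that is before counting the added latency.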

Also, looking back at their claim: "Token counts may include tokens added automatically by Anthropic for system optimizations. You are not billed for system-added tokens. Billing reflects only your content."

A/B testing sounds a bit different. Do they really count it as "system-added tokens" and not charge for this extra cost? If you consider the model you're requesting as the baseline, then yes. But technically it's an A/B test of a different model, so they might secretly charge 130% as "we didn't add any system prompt, we just routed you to a better model."



