This is how I feel about the 100 msg/wk limit on o3 for the ChatGPT Plus plan. There’s no way to see how much I’ve used, and it’s an important enough resource that my lizard brain wants to hoard it. The result is that I way underutilize my plan and go for one of the o4-mini models instead. I would much prefer a lower daily limit, but maybe the underutilization is the point of the weekly limit.
You can tell it’s intentional with both OpenAI and Anthropic by how the limits are deliberately made opaque. I can’t see a nice little bar showing how much I’ve used versus how much I have left on a given rate limit, so it pressures users to hoard, because it prevents them from budgeting it out and saying “okay, I’ve used 1/3 of my quota and it’s Wednesday, I can use more faster.”
FWIW neither hoard nor ration imply anything about permanence of the thing to me. Whether you were rationed bread or you hoarded bread, the bread isn't going to be usable forever. At the same time whether you were rationed sugar or hoarded sugar, the sugar isn't going to expire (with good storage).
Rationed/hoarded do imply, to me, something different about how the quantity came to be though. Rationed being given or setting aside a fixed amount, hoarded being that you stockpiled/amassed it. Saying "you hoarded your rations" (whether or not they will expire) does feel more on the money than "you ration your rations" from that perspective.
I hope this doesn't come off too "well aktually", I've just been thinking about how I still realize different meanings/origins of common words later in life and the odd things that trigger me to think about it differently for the first time. A recent one for me was that "whoever" has the (fairly obvious) etymology of who+ever https://www.etymonline.com/word/whoever vs something like balloon, which has a comparatively more complex history https://www.etymonline.com/word/balloon
For me, the difference between ration and hoard is the uhh…rationality of the plan.
Rationing suggests a deliberate, calculated plan: we’ll eat this much at these particular times so our food lasts that long. Hoarding seems more ad hoc and fear-driven: better keep yet another beat-up VGA cable, just in case.
Hoarding doesn't really imply how you got it, just that you stockpile once you do. I think you're bang on about rationing - it's about assigning the fixed amount. The LLM provider does the rationing, the LLM user hoards their rations.
One could theoretically ration their rations out further... but that would require knowing your usage so far in order to set the remaining fixed amounts - which is precisely what's missing in the interface.
Rationing is precisely what we want to do: I have x usage this week; let me determine precisely how much I can use without going over. Hoarding implies a less reasoned path of “I never know when I might run out so I must use as little as possible, save as much as I can.” One can hoard gasoline but it still expires past a point.
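A toy sketch of that budgeting arithmetic, assuming the 100 msg/week quota mentioned above (the dates, numbers, and function here are made up for illustration):

```python
from datetime import date, timedelta

WEEKLY_QUOTA = 100  # e.g. o3 messages per week on the Plus plan (assumed figure)

def daily_budget(used_so_far: int, today: date, week_start: date) -> float:
    """Messages per day you can still afford without blowing the weekly quota."""
    week_end = week_start + timedelta(days=7)
    days_left = max((week_end - today).days, 1)
    return (WEEKLY_QUOTA - used_so_far) / days_left

# "I've used 1/3 of my quota and it's Wednesday" -> roughly 13 messages/day left.
print(daily_budget(used_so_far=33, today=date(2025, 1, 8), week_start=date(2025, 1, 6)))
```

Trivial math, but it only works if the provider actually exposes `used_so_far`.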
Anthropic also does this because they will dynamically change the limits to manage load. Tools like ccusage show you how much you've used, and I can tell that sometimes I get limited at significantly lower usage than I would usually get limited for.
Which is a huge problem, because you literally have no idea what you're paying for.
One day a few hours of prompting is fine, another you'll hit your weekly limit and you're out for seven days.
While still paying your subscription.
I can't think of any other product or service which operates on this basis - where you're charged a set fee, but the access you get varies from hour to hour entirely at the provider's whim. And if you hit a limit which is a moving target you can't even check you're locked out of the service.
One thing that has worked for me when I have a long list of requirements / standards I want an LLM agent to stick to while executing a series of 5 instructions is to add extra steps at the end of the instructions like "6. check if any of the code standards are not met - if not, fix them and return to step 5" / "7. verify that no forbidden patterns from <list of things like no-op unit tests, n+1 query patterns, etc> exist in added code - if you find any, fix them and return to step 5" etc.
Often they're better at recognizing failures to stick to the rules and fixing the problems than they are at consistently following the rules in a single shot.
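For what it's worth, here's a made-up sketch of what I mean by those trailing check steps (every step, the standards file, and the forbidden patterns are hypothetical placeholders, not from a real project):

```python
# Hypothetical agent instruction block; swap in whatever your project actually enforces.
AGENT_INSTRUCTIONS = """
1. Read the ticket and the files it references.
2. Implement the change, following CODING_STANDARDS.md.
3. Add or update unit tests covering the change.
4. Run the test suite and fix any failures.
5. Summarize the final diff.
6. Check whether any rule in CODING_STANDARDS.md is violated by the diff.
   If so, fix the violations and return to step 5.
7. Check the diff for forbidden patterns (no-op unit tests, N+1 query
   patterns, swallowed exceptions). If any are present, fix them and
   return to step 5.
"""
```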
This does mean that often having an LLM agent do a thing works but is slower than just doing it myself. Still, I can sometimes kick off a workflow before joining a meeting, so maybe the hours I've spent playing with these tools will eventually pay for themselves in improved future productivity.
There are things it’s great at and things it deceives you with. Many times I needed it to check something for me that I knew was a problem; o3 kept insisting it was possible due to reasons a, b, c, and thankfully gave me links. I knew it used to be a problem, so, surprised, I followed the links, only to read in black and white that it still wasn’t possible. So I explained to o3 that it was wrong. Two messages later we were back at square one. One week later it hadn’t updated its knowledge. Months later it’s still the same.
But at things I have no idea about, like medicine, it feels very convincing. Am I at risk?
People don’t understand Dunning-Kruger. People are prone to biases and fallacies. Likely all LLMs are inept at objectivity.
My instructions to LLMs are always strictness, no false claims, Bayesian likelihoods on every claim. Some models ignore the instructions of their own accord, while others stick strictly to them. In the end it doesn’t matter when they insist on 99% confidence in refuted fantasies.
The problem is that all current mainstream LLMs are autoregressive decoder-only models, mostly but not exclusively transformers.
Their math can't apply modifiers like "this example/attempt is wrong due to X, Y, Z" to anything that came before the modifier clause in the prompt.
Despite how enticing these models are to train, these limitations are inherent. (For this specific situation people recommend going back to just before the wrong output and editing the message to reflect this understanding, because the confidently wrong output, with no advisory/correcting pre-clause, will "pollute the context": the model will look to the context for aspects encoded in high(er)-layer token embeddings, which inherently can't include the correct/wrong judgment because the "wrong" correction couldn't be applied to the confidently-wrong tokens; it thus retrieves the confidently-wrong tokens and subsequently spews even more BS.
Similar to how telling a GPT-2/GPT-3 model it's an expert on $topic made it actually perform better on said topic, this affirmation that the model made an error will prime the model to behave in a way that gets it yelled at again... sadly.)
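A minimal sketch of that "rewind and edit" recommendation, assuming an OpenAI-style role/content message list (the messages and the helper function here are made up for illustration):

```python
# Conversation so far, in the usual role/content chat format.
history = [
    {"role": "user", "content": "Is X still a problem in library Y?"},
    {"role": "assistant", "content": "No, it was fixed, because a, b, c..."},  # confidently wrong
]

def rewind_and_edit(history, corrected_prompt):
    """Drop the wrong assistant turn (and the user turn that led to it) and
    substitute an edited prompt, instead of appending a 'you are wrong' message.
    The confidently-wrong tokens never re-enter the context."""
    last_assistant = max(i for i, m in enumerate(history) if m["role"] == "assistant")
    trimmed = history[:last_assistant - 1]  # everything before the offending user turn
    trimmed.append({"role": "user", "content": corrected_prompt})
    return trimmed

history = rewind_and_edit(
    history,
    "X is still a problem in library Y (see its issue tracker). "
    "Given that constraint, what are my options?",
)
# `history` is what you send on the next request; since the wrong answer is gone,
# the model can't retrieve and reinforce it.
```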
If you’re on HN you’ve probably been around enough to know it’s never that simple. You implement the counter, now customer service needs to be able to provide documentation, users want to argue, async systems take hours to update, users complain about that, you move the batch accounting job to sync, queries that fail still end up counting, and on and on.
They should have an indicator, for sure. But I at least have been around the block enough to know that declaring “it would be easy” for someone else’s business and tech stack is usually naive.
> it’s an important enough resource that my lizard brain wants to hoard it.
I have zero doubt that this is working exactly as intended. We will keep all our users at 80% of what we sold them by keeping them anxious about how close they are to the limit.
I don't think this counts as a "dark pattern". The reality is that these services are resource constrained, so they are trying to build in resource limits that are as fair as possible and prevent people from gaming the system.
The dark pattern isn't the payment pattern, that's fine. The dark pattern is hiding how much you're using, thereby tricking the human lizard brain into irrationally fearing they are running out.
The human brain is stupid and remarkably exploitable. Just a teensy little bit of information hiding can elicit strange and self-destructive behavior from people.
You aren't cut off until you're cut off, then it's over completely. That's scary, because there's no recourse. So people are going to try to avoid that as much as possible. Since they don't know how much they're using, they're naturally going to err on the side of caution - paying for more than they need.
I'm only on the $20 Pro plan, and I'm a big user of the /clear command. I don't really use Claude Code that much either, so the $20 plan is perfect for me. However, a few times I've gotten the "approaching context being full, auto compact coming soon" thing, so I manually do /compact and run out of the 5hr usage window while compacting the context. It's extremely infuriating, because if I could have a view into how close I was to being rate limited in the 5 hour window, I might make a different choice as to whether to compact or finish the last little thing I was working on.
If I sit down for dinner at an all-you-can-eat buffet, I get to decide how much I’m having for dinner. I don’t mind if they don’t let me take leftovers, as it is already understood that they mean as much as I can eat in one sitting.
If they don’t want folks to take advantage of an advertised offer, then they should change their sales pitch. It’s explicitly not gaming any system to use what you’re paying for in full. That’s your right and privilege as that’s the bill of goods you bought and were sold.
I feel like using Claude Code overnight while you sleep or sharing your account with someone else is equivalent to taking home leftovers from an all-you-can-eat buffet.
I also find it hard to believe 5% of customers are doing that, though.
If that’s off-peak time, I’d argue the adjacent opposite point, that Anthropic et al could implement deferred and/or scheduled jobs natively so that folks can do what they’re going to do anyway in a way that comports with reasonable load management that all vendors must do.
For example, I don’t mind that Netflix pauses playback after playing continuously for a few episodes of a show, because the options they present me with acknowledge different use cases. The options are: stop playing, play now and ask me again later, and play now and don’t ask me again. These options are kind to the user because they don’t disable the power user option.
Is there really an off-peak time, though? I think Anthropic is running on AWS with the big investment from Amazon, right? I'm sure there are some peaks and valleys, but with the Americas, Europe and Asia being in different time zones I'd expect there'd be a somewhat "baseline" usage with peaks where the time zones overlap (European afternoons and American mornings, for example). I know in my case I get the most 503 overloaded errors in the European afternoon.
I use Claude Code with Opus four days a week for about 5 hours a day. I've only once hit the limit. Yet the tool others mentioned here (ccusage) indicates I used about $120 in API equivalents per day or about $1,800 to date this month on a $200 subscription. That has to be a loss leader for Anthropic that they now want to wind back.
I also wouldn't consider my usage extreme. I never use more than one instance, don't run overnight, etc.
I think this is just a bad analogy. I've definitely set Claude Code on a task and then wandered off to do something else, and come back an hour or so later to see if it's done. If I'd chosen to take a nap, would you say I'm "gaming the system"? That's silly. I'm using an LLM agent to free up my own time; it's up to me to decide what I do with that time.
No, this doesn't sound like gaming the system to me. However, if you were using a script to automatically queue up tasks so they run as soon as your 5-hour session expires, to ensure you're using Claude 24/7, that's a different story. A project like this was posted to HN relatively recently.
As I said, I have trouble believing this constitutes 5% of users, but it constitutes something and yeah, I feel Anthropic is justified in putting a cap on that.
I use Claude Code overnight almost exclusively, it's simply not worth my time during the day. It's just easier to prepare precise instructions, let it run and check the results in the morning. If it goes awry (it usually does), I can modify the instructions and start from scratch, without getting too attached to it.
Yes it is, in the way of "I'm gonna work on X thing that is now much easier thanks to ChatGPT" and then never working on it due to lack of time or motivation or something else.
I nervously hover over the VSCode Copilot icon, watching the premium requests slowly accumulate. It’s not an enjoyable experience (whether you know how much you've used or not :) )
I noticed that my productive usage of Copilot dropped like a brick after they introduced those limits. You feel constantly on the clock, and being forced to constantly change models gets tiresome very fast.
Unless you use "free" GPT 4.1 like MS wants you (not the same as Claude, even with Beast Mode). And how long is that going to be free, because it feels like a design to simply push you to a MS product (MS>OpenAI) instead of third party.
So what happens a year from now? Paid GPT 5.1? With 4.1 being removed? If it was not for the insane prices of actual large mem GPUs and the slowness of large models, i will be using LLMs at home. Right now MS/Antropic/OpenAI are right in that zone where its not too expensive yet to go full local LLM.
That's definitely not correct because I'm on the pro plan and make extensive use of o3-pro for coding. I've sent 100 messages in a single day with no limitation.
*edited to change “pro” to “plus”