This only works if your bucket size is much larger than your number of servers.
In the degenerate case, imagine a rate limit of 2 requests per minute load balanced across 2 servers, with enough traffic that each of my requests effectively hits a random server. 50% of the time my second request will be incorrectly rate limited, because each server has a bucket of 1 and my second request landed on the same server as my first.
I'm sure someone smarter than me (and better at probability) could come up with an equation where you input your rate limit & number of servers and it tells you the probability of a false positive for a user.
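For a rough sense of the numbers without deriving the closed form (it's essentially a balls-into-bins calculation), a quick Monte Carlo simulation gets you there. This is just an illustrative sketch: the function name and parameters are made up, and it assumes requests land on servers uniformly at random and each server gets an even split of the limit.

```python
import random

def false_positive_probability(limit, servers, bucket=None, trials=100_000):
    """Estimate the chance that a well-behaved client sending exactly `limit`
    requests in one window gets rejected, assuming each of `servers` enforces
    its own bucket (by default an even split of the limit) and each request
    lands on a uniformly random server."""
    if bucket is None:
        bucket = limit // servers
    rejections = 0
    for _ in range(trials):
        counts = [0] * servers
        for _ in range(limit):
            s = random.randrange(servers)
            counts[s] += 1
            if counts[s] > bucket:
                rejections += 1
                break
    return rejections / trials

# The degenerate case above: a limit of 2 split across 2 servers -> ~0.5.
print(false_positive_probability(2, 2))
# A less extreme shape, 100 requests/minute across 10 servers, still rejects
# a client almost every time it actually uses its full quota.
print(false_positive_probability(100, 10))
```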
Even with more servers, you'll still easily end up rate limiting someone too early. And that's really bad, because your clients, who know the rate limit and structure their code to stay under it, will start getting failures they shouldn't, and their only recourse is to intentionally stay well under the advertised rate limit.
So if you're really set on doing something like this, you need to set the actual rate limit significantly higher than the advertised one, so that it's extremely unlikely for a client to be rate limited too early.
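Building on the sketch above (and under the same assumptions), you can estimate how much headroom that actually requires; the 0.1% false-rejection target here is arbitrary:

```python
def required_bucket(advertised_limit, servers, target_fp=0.001, trials=50_000):
    """Smallest per-server bucket that keeps the estimated chance of falsely
    rejecting a compliant client below target_fp."""
    bucket = advertised_limit // servers
    while false_positive_probability(advertised_limit, servers, bucket, trials) > target_fp:
        bucket += 1
    return bucket

# For an advertised 100 requests/minute across 10 servers, each server ends up
# needing to allow roughly double its "fair share" of 10 before compliant
# clients stop seeing spurious rejections.
print(required_bucket(100, 10))
```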