I think rate limiting is the wrong idea. Say, for example, a client wants to re-fetch everything it has cached: it may send a burst of requests in a short amount of time, and some of those requests may be wrongly rejected due to rate limits. This is what happens when a browser refreshes a page, for example.
A better approach, I think, is delaying request handling based on resource allocation. If one client is disproportionately using more time than others, that client's requests get queued up to process later, while well-behaved clients get their requests handled quickly. I think this is a more realistic approach and imposes fewer arbitrary restrictions.
This is why I've suggested, and used, fair queuing for rate limiting.
Any source can make lots of requests, but they queue up by IP address. (In practice, you have queues organized by hashed IP address, which you service round-robin. That's how routers do fair queuing.) If one queue gets very long, then you start sending 429 errors for new adds to that queue. Otherwise, the requests from the heavy load source just take a little longer.
Unlike most rate limiters, this requires little or no tuning. Each source IP address gets an equal amount of service. Sources which generate heavy load compete with themselves, not others.
The main tuning parameter is how long a queue can get. It's useful to have both length and time limits, but only hostile sources should hit those, and those limits don't change with capacity or number of users. Brief load transients are handled normally, if slowly.
This won't stop a distributed denial of service attack from many IP addresses. But anything coming from a small number of sources will be handled without much trouble.
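A minimal sketch of the fair-queuing scheme described above. The bucket count, queue limit, and class names are my own assumptions, not taken from any particular router implementation:

```python
import hashlib
from collections import deque

NUM_QUEUES = 256      # buckets of hashed source IPs (assumed value)
MAX_QUEUE_LEN = 50    # beyond this, new requests get a 429 (assumed value)

class FairQueue:
    """Requests bucket by hashed source IP; buckets are serviced round-robin,
    so a heavy source competes with itself rather than with everyone else."""
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]
        self.cursor = 0

    def _bucket(self, ip: str) -> int:
        return int(hashlib.sha256(ip.encode()).hexdigest(), 16) % NUM_QUEUES

    def enqueue(self, ip: str, request) -> bool:
        """Returns False when the source's queue is full;
        the caller should respond with a 429."""
        q = self.queues[self._bucket(ip)]
        if len(q) >= MAX_QUEUE_LEN:
            return False
        q.append(request)
        return True

    def next_request(self):
        """Round-robin: take one request from the next non-empty bucket."""
        for _ in range(NUM_QUEUES):
            q = self.queues[self.cursor]
            self.cursor = (self.cursor + 1) % NUM_QUEUES
            if q:
                return q.popleft()
        return None
```

A real deployment would add the time limit mentioned above (drop queued requests older than some threshold) and service queues from worker threads rather than a polling loop.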
Once you start getting into quality-of-service, basic rate limiting like the one discussed is not enough. I think Stripe did a better job of covering such concerns in their rate limiter post. They specifically talk about being lenient with "bursty" traffic, as that was a legitimate use case for clients.
Yes, this is why you want to use a token bucket rate limiter (which for some reason was considered and rejected by the original post). We wrote it up here and have an open-source impl on github that's in production serving hundreds of millions of rate limits: https://medium.com/smyte/rate-limiter-df3408325846
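For illustration, a minimal token bucket along those lines (this is my own sketch, not the Smyte implementation; the capacity and refill numbers are arbitrary). The capacity bounds the burst while the refill rate sets sustained throughput, which is what lets it be lenient with bursty clients:

```python
import time

class TokenBucket:
    """capacity = max burst size; refill_rate = tokens added per second."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Lazily refill based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For example, `TokenBucket(capacity=10, refill_rate=1.0)` gives each client a burst of 10 but a sustained rate of 60/minute. A production version would keep one bucket per client key in a shared store rather than in process memory.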
I particularly liked a comment about a fairly elegant way to rate limit with a caching proxy[1] in that HN post. When your main application returns a rate limit reached request, cache that in your proxy for a certain amount of time, and once it's timed out of the cache, they'll hit your main app again.
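The same idea can be sketched in application code (the class name and TTL here are hypothetical; a real deployment would use the proxy's own response caching on the 429 instead):

```python
import time

class RateLimitCache:
    """Caches 'rate limited' verdicts at the proxy layer so the backend
    isn't hit again by that client until the TTL expires."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.blocked = {}   # client key -> expiry timestamp

    def is_blocked(self, client: str) -> bool:
        expiry = self.blocked.get(client)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self.blocked[client]    # entry timed out of the cache
            return False
        return True

    def block(self, client: str):
        """Call when the backend returns a 429 for this client."""
        self.blocked[client] = time.monotonic() + self.ttl
```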
Where do you queue those requests? If you do it anywhere under your own control, you will accumulate memory and open connections.
If you issue a 429, the client knows it needs to wait and retry, and you've pushed the backpressure all the way past the demarcation line of your own infrastructure.
One could limit concurrent connections per address. The idea is that an OPTIONS request which doesn't take much resources at all could be treated with a different weight than a POST which costs the server time to process.
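As a sketch of that weighting idea, assuming some per-method costs (the numbers and names here are made up for illustration):

```python
# Hypothetical per-method weights: cheap requests drain the budget slowly.
METHOD_COST = {"OPTIONS": 0.1, "GET": 1.0, "POST": 5.0}

class WeightedLimiter:
    """Charges each client against a per-window budget; expensive
    methods use it up faster than cheap ones."""
    def __init__(self, budget: float):
        self.budget = budget
        self.spent = {}     # client key -> cost consumed this window

    def allow(self, client: str, method: str) -> bool:
        cost = METHOD_COST.get(method, 1.0)   # unknown methods cost 1.0
        used = self.spent.get(client, 0.0)
        if used + cost > self.budget:
            return False
        self.spent[client] = used + cost
        return True

    def reset_window(self):
        """Call on a timer to start a new accounting window."""
        self.spent.clear()
```

With a budget of 10, a client could make two POSTs or a hundred OPTIONS requests per window, which captures the asymmetry described above.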
depending on the resources being thrown at you, just limiting concurrent connections per address could help. (there's the question of how to keep the per-address connection counts distributed across your load-balancing layer and consistent, or at least consistent enough to mostly do the right thing most of the time.)
maybe that doesn't help with ipv6, though. you'd run out of memory if you tried to keep track of every /128, and different ISPs hand out different blocks to customers (some give out a /64, maybe some give each customer their own /72, etc).
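one hedge (my assumption, not something established in this thread) is to key limits on a fixed prefix such as the /64, accepting that some ISPs allocate larger or smaller blocks:

```python
import ipaddress

def limiter_key(addr: str) -> str:
    """Collapse IPv6 sources to their /64 so one customer can't spread
    load across 2**64 addresses; IPv4 stays per-address. The /64 cutoff
    is an assumption about typical ISP allocations."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        net = ipaddress.ip_network((addr, 64), strict=False)
        return str(net)
    return addr
```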
> The idea is that an OPTIONS request which doesn't take much resources at all could be treated with a different weight than a POST which costs the server time to process.
I'm curious, what frameworks/libraries/whatever have you seen that use HTTP OPTIONS?
More sophisticated ways to reject connections rely on heuristics and may result in denying legitimate requests. I don't have all the answers, I'm just suggesting a way that doesn't consider all requests equal.
Every CORS pre-flight request uses the OPTIONS method. It is also used to advertise which methods are available on a route.
This depends on why you want to rate limit. If it's the classic case of stopping one customer consuming all resources because of an overenthusiastic script, it's fine. If you're trying to avoid paying for lots of calls to an expensive API, or competitors scraping data, you want to reject the requests entirely rather than just delaying them.
fwiw TokenBuckets are very happy to separate "burst" from "refill rate". ie It's easy to give each client 10 tokens, but then have their bucket refill at 60/minute.
Queueing sounds nice, but the web is synchronous. If somebody crushes you, do you really want to queue their requests and then process them 5 minutes later? After the connection has already terminated?