> ... a simple fail, back off, retry system actually increases overall load due to duplicate requests retrying.
Well, two things.
First, our clients didn't have a "simple" back-off and retry system. They tended to have more sophisticated back-off and retry systems that didn't increase pressure on overloaded services. There are a number of different techniques you can use to accomplish this, and you can wrap several of them up in a library, use that library in your client code, and make that client code (a fat client) the official API for your service.
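To make that concrete, here is a rough sketch of one such technique, capped exponential back-off with full jitter. This is not the library we actually used; the names `backoff_delays` and `call_with_retry` and all of the parameters are made up for illustration.

```python
import random


def backoff_delays(base=0.1, cap=30.0, attempts=6, rng=random.random):
    """Yield one randomized delay (in seconds) per retry attempt."""
    for attempt in range(attempts):
        # Exponential ceiling, capped so delays never grow without bound.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly below the ceiling so that many
        # clients retrying at once spread out instead of arriving together.
        yield rng() * ceiling


def call_with_retry(request, is_overloaded, sleep, **backoff_kwargs):
    """Call request() until the response is not an overload signal.

    request, is_overloaded and sleep are hypothetical callables supplied
    by the caller (pass time.sleep for real use).
    """
    for delay in backoff_delays(**backoff_kwargs):
        response = request()
        if not is_overloaded(response):
            return response
        sleep(delay)
    raise RuntimeError("service still overloaded after all retries")
```

The retry count is bounded and the waits are randomized, so a busy service sees a thinning trickle of retries rather than synchronized waves.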
Second, this also assumes that front-end load is significant to begin with. In our system, it definitely wasn't. Requests were super cheap. You could hammer the front end extremely hard and it would just respond with "try again later" until some capacity was available, and then it would let in some work. The actual request was quite small; it was the work represented by that request that was large.
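As an illustration only, the cheap-rejection behavior looks something like the sketch below. The `FrontEnd` class and its capacity counter are assumptions of mine, not our actual implementation; the point is that saying "try again later" costs a lock and a comparison, nothing more.

```python
import threading


class FrontEnd:
    """Hypothetical front end that admits work only while capacity remains."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._in_flight = 0
        self._lock = threading.Lock()

    def submit(self, work):
        # Rejection is cheap: no work is started, nothing is stored.
        with self._lock:
            if self._in_flight >= self._capacity:
                return "try again later"
            self._in_flight += 1
        return "accepted"

    def finish(self):
        # Called when a unit of work completes, freeing one slot.
        with self._lock:
            self._in_flight -= 1
```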
> OTOH, I'm sort of amazed by the people who reach for queues to solve loading issues without first assuring that the system works properly at full load without any queuing. Queues only serve to smooth out bursty behavior and come with their own problems (buffer bloat related latency).
Queues in our system were a necessary part of the design. The system would not operate efficiently without them, because work items needed to be batched in order to be processed efficiently. I don't think there's a way that you can indict a system for using queues unless you know something about the design requirements.
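A minimal sketch of what I mean by batching off a queue, using Python's standard `queue` module. The `drain_batch` helper, the batch size, and the timeout are hypothetical and only here to show the shape of it.

```python
import queue


def drain_batch(q, max_items, timeout=0.05):
    """Collect up to max_items from q so they can be processed together."""
    # Block until at least one item exists; an empty batch is pointless.
    batch = [q.get()]
    while len(batch) < max_items:
        try:
            # Wait briefly for more items so the batch can fill up.
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            # Nothing else arrived in time; process what we have.
            break
    return batch
```

Here the queue exists to accumulate enough items to make one pass over them worthwhile, which has nothing to do with absorbing bursts.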
There are absolutely reasons to use queues other than smoothing out bursty behavior.