On modern systems you have about 28k ephemeral ports available (the default Linux range is 32768-60999). 65,535 is the total number of ports (good luck trying to use them all). Either way, if you have more than 20k connections open to a single backend (remember Linux does connection tracking on the full 4-tuple, so you can reuse a source port toward different destinations) you are doing something seriously wrong and should hire competent network engineering folks.
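To make the numbers concrete, here's a small sketch. The 32768-60999 values are the modern Linux defaults for `net.ipv4.ip_local_port_range` (check your own system; yours may be tuned differently), and the 4-tuple pair below uses made-up addresses purely for illustration:

```python
# Usable ephemeral ports under the default Linux ip_local_port_range.
low, high = 32768, 60999        # Linux defaults (assumption: untuned system)
ephemeral_ports = high - low + 1
print(ephemeral_ports)          # 28232 -- the "about 28k" figure

# The same source port can be reused toward different destinations,
# because the kernel distinguishes connections by the full 4-tuple:
# (src_ip, src_port, dst_ip, dst_port).
conn_a = ("10.0.0.1", 40000, "192.168.1.10", 80)
conn_b = ("10.0.0.1", 40000, "192.168.1.11", 80)
assert conn_a != conn_b         # distinct connections despite sharing src port
```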
> Either way, if you have more than 20k connections open to backends you are doing something seriously wrong
I don't see how that is a fringe or rare case. With a load balancer (using no pipelining or multiplexing), the number of simultaneous outgoing http connections to backend systems is at least the number of simultaneous open incoming http connections. More than 28k simultaneous incoming http requests is not a lot for a busy load balancer.
Now with pipelining (or limiting to 28k outgoing connections), the load balancer has to queue requests and multiplex them onto the backends as connections become available. Pipelining suffers from head-of-line blocking, which further increases the latency the load balancer can add. In any case, queuing increases latency for the end user. If you use HTTP/2 multiplexing, you can go past those 28k incoming connections without queuing on the load balancer side.
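The head-of-line blocking effect can be shown with a toy model (hypothetical service times, purely illustrative; real latency also depends on bandwidth sharing and scheduling):

```python
# service[i] = backend service time (ms) for request i; request 0 is slow.
service = [100, 5, 5, 5]

# HTTP/1.1 pipelining: responses must come back in order, so each
# response waits for every earlier one to finish first.
pipelined = []
elapsed = 0
for t in service:
    elapsed += t
    pipelined.append(elapsed)   # completion times: [100, 105, 110, 115]

# HTTP/2 multiplexing: streams are independent, so each response
# completes after its own service time (ignoring bandwidth contention).
multiplexed = service[:]        # completion times: [100, 5, 5, 5]

# The slow request delays everything behind it only in the pipelined case:
print(pipelined[1], multiplexed[1])   # 105 vs 5 for the second request
```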
> the number of simultaneous outgoing http connections to backend systems is at least the number of simultaneous open incoming http connections
No it isn't. You establish a pool of long-lived connections per backend. The load balancer should be coalescing in-flight requests. At that traffic volume you should also be running basic in-memory caches to absorb things like favicon requests.
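A minimal sketch of the per-backend pool idea, using a hypothetical `ConnectionPool` with string placeholders for connections (a real load balancer would pool actual TCP/TLS sockets and health-check them; this is not any particular LB's implementation):

```python
import queue

class ConnectionPool:
    """Fixed pool of long-lived connections to one backend."""
    def __init__(self, backend, size):
        self.backend = backend
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(f"conn-{backend}-{i}")  # placeholder connection

    def acquire(self):
        # Blocks (i.e. queues the request) when all connections are busy,
        # instead of opening a new backend connection per incoming client.
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool("app1:8080", size=4)
c = pool.acquire()
# ... forward a request over c, read the response ...
pool.release(c)
```

The point is that the pool size, not the incoming connection count, bounds the number of backend connections.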
I am not going to respond further as this chain is getting quite off topic. There are plenty of good resources available from relevant Google searches, but if you really still have questions about how load balancers work my email is in my profile.
> You establish a pool of long lived connections per backend
Yes, and you would do the same with HTTP/2. You haven't addressed the head-of-line blocking problem caused by HTTP/1.1 pipelining, which HTTP/2 completely solves. Head-of-line blocking becomes an increasing issue when your HTTP connections are long-lived, such as with WebSockets, large media transfers, or streaming.
It's amazing how people who have visibly never dealt with high loads can instantly become vehement against those reporting a real issue.
The case where ports are quickly exhausted is with long connections, typically WebSocket. And with properly tuned servers, reaching the 64k-port limit per server happens very quickly. I've seen several times the case where admins had to add multiple IP addresses to their servers just to hack around the limit, declaring each of them in the LB as if they were distinct servers. Also, even though Linux is now smart enough to try to pick a random port that's valid for your tuple, once your ports are exhausted the connect() system call can cost quite a lot, because it performs multiple tries until it finds one that works. That's precisely what IP_BIND_ADDRESS_NO_PORT improves, by letting the port be chosen at the last moment.
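A small Linux-only sketch of that option: bind the source address early (e.g. to pick one of several LB-side IPs) while deferring the source-port choice to connect(), when the kernel knows the full 4-tuple. The fallback constant 24 is the Linux value, an assumption for Python builds that don't expose it:

```python
import socket

# Fall back to the raw Linux constant if this Python doesn't expose it.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
s.bind(("127.0.0.1", 0))   # source IP fixed now, port deferred
addr = s.getsockname()     # port is still 0: none allocated until connect()
s.close()
```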
H2 lets you work around all this more elegantly by simply multiplexing multiple client streams onto a single connection. And that's very welcome with WebSocket, since usually each stream has little traffic. The network also sees far fewer packets, since you can merge many small messages into a single packet. So there are cases where it's better.
Another often overlooked point is that cancelling a download over H1 means breaking the connection. Over H2 you keep the connection open, since you simply send an RST_STREAM frame for that stream within the connection. The difference matters on the frontend when clients abort downloads multiple times per browsing session (you avoid redoing the TLS setup), but it can also make a difference on the backend, because quite often an aborted transfer on the front will also abort an H1 connection on the back, and that's much less fun for your backend servers.
> It's amazing how people having visibly never dealt with high loads
I've built multiple systems at 1M+ r/s and Tb+ scale.
> The case where ports are quickly exhausted is with long connections, typically WebSocket
Yes, HTTP2 is great for websockets. I was never advocating against it. The comment I was replying to was under the false assumption that you needed an outbound backend connection for every incoming connection. All of his concerns are solved problems in any modern open source load balancer. See https://www.haproxy.com/blog/http-keep-alive-pipelining-mult... ;)
But it's the same for other long sessions such as slow downloads and git clones. Sites concerned about the number of source ports are not those dealing with just favicon.ico and bullet.png, but mainly those dealing with long transfers.
Also there's a cascade effect on large sites: as long as your servers respond fast, everything's OK. Then suddenly a database experiences a hiccup, everything saturates, and once the LB has all of its ports in use it can take a while to recover, because connect() gets much slower (I've observed delays of up to 50ms!). At that point there's no hope of recovering in a sane time, because excess connections are not even being served by the servers; they sit in the system's accept queue, keeping a port busy, which slows down connect(), which means even more connections are needed for other incoming requests. If the LB is not properly sized and tuned, you'd rather just kill it to get rid of all the connections at once, wait a second or two for the RST storm to calm down, and start again.
H2 can avoid that, at the expense of other issues I mentioned in another response above (i.e. don't multiplex too much toward the servers, 5-10 streams max, to avoid the risk of inter-client HoL). But H2 also comes with higher transfer costs than H1 for large objects, due to framing.
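That framing cost can be estimated with some rough arithmetic, assuming the default 16 KiB maximum DATA-frame payload and the fixed 9-byte H2 frame header (real overhead depends on the negotiated SETTINGS_MAX_FRAME_SIZE and how the sender fills frames):

```python
object_size = 1 << 30          # a 1 GiB response body
frame_payload = 16384          # default H2 max DATA frame payload (16 KiB)
frame_header = 9               # fixed H2 frame header size

frames = -(-object_size // frame_payload)   # ceiling division
h2_overhead = frames * frame_header
print(frames, h2_overhead)     # 65536 frames, 589824 bytes of headers
# i.e. roughly 0.05% extra on the wire that H1 wouldn't add for the body
```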