Interesting, I’m starting to think undocumented thresholds are quite common in GCP.
I experienced something similar with Cloud Run: inexplicable scaling events that didn't line up with CPU utilization or concurrent requests (the two metrics that regulate scaling according to their docs).
After a lot of back and forth with their (premium) support, it turns out there are additional criteria, something related to request duration, but of course nobody was able to explain it in detail.
Yes, we have also experienced undocumented limits for Cloud Run. For us it was an obscure quota for max network packets per second per instance. Really infuriating, and it took 6 months to track down what it was. I think it has been documented here now: https://cloud.google.com/run/quotas#cloud_run_bandwidth_limi...
Sorry, but the blame here was 100% on Mozilla. No matter which HTTP version, headers should always be treated as case-insensitive. Blaming anything on Google here is just stupid. The problem was NIH syndrome and ignoring the HTTP spec.
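To make the case-insensitivity point concrete, here is a minimal sketch of a header container that matches names regardless of case. The class name `CaseInsensitiveHeaders` and the sample header values are hypothetical and purely illustrative; this is not the code from the incident being discussed.

```python
class CaseInsensitiveHeaders:
    """Hypothetical header map that ignores the case of field names."""

    def __init__(self, pairs=()):
        # Keys are lowercased names; values keep the original spelling too.
        self._store = {}
        for name, value in pairs:
            self[name] = value

    def __setitem__(self, name, value):
        self._store[name.lower()] = (name, value)

    def __getitem__(self, name):
        return self._store[name.lower()][1]

    def __contains__(self, name):
        return name.lower() in self._store

    def get(self, name, default=None):
        entry = self._store.get(name.lower())
        return default if entry is None else entry[1]


# HTTP/1.1 clients often send "Content-Length"; HTTP/2 sends field names
# in lowercase on the wire. Both spellings should resolve to the same value.
h1_style = CaseInsensitiveHeaders([("Content-Length", "42")])
h2_style = CaseInsensitiveHeaders([("content-length", "42")])
```

With a lookup like this, `h1_style["content-length"]` and `h2_style["Content-Length"]` both return the same value, so a proxy switching the casing of header names does not change behavior.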
Mozilla are entirely clear that this was their bug.
However, GCP changing the default under their infrastructure without prior warning was still unacceptable.
Operations work should (IMO must) be conducted with the expectation that any major change like that will expose existing bugs in deployed code.
(I've done enough ops work in my life that I'd love to say 'will potentially expose', but in practice there's always -something- that breaks, and if I don't find it in the first 24h after a major change, I'm going to spend the next two weeks waiting for the other shoe to drop.)
GCP does send emails if you've subscribed to them. GCP is not to blame if they used auto. Heck, if your load balancer sends you lowercase headers with a new HTTP version, it should not result in a bug. GCP's change was fine. Their software had a bug that would've led to request smuggling.