Cloudflare is a good company and everyone has outages. IMHO the post-mortems they post are not only some of the best I've read from a big company, they are also produced quickly.
I only wish they could update cloudflarestatus.com more quickly. Shouldn't there be some mechanism to update that immediately when there is an incident? When the entire internet knows you're down and your status page says All Systems GO!, it reflects very poorly on them.
> Because of the way we securely connect to StatusPage.io from most locations where our team is based. The traffic got blackholed in ATL, keeping us from updating it.
Yep, I was joking with our rep that them using Cloudflare Access for their internal services sounds like a problem waiting to happen.
Guess I wasn't wrong. They might even have lost access to internal monitoring systems, which is pretty unfortunate in such a situation. If you ask them about Cloudflare Access, they will happily tell you that it was built for internal tool access and that they use it for everything; only later did they go on to sell it as a product.
When Google Cloud went down a few years ago, they were unable to access internal monitoring because the bad BGP change overloaded their networks, if I remember correctly.
While they could dogfood the product, status monitoring systems should be isolated from failures of your bread-and-butter product. If you are in the business of messing with BGP, then the BGP that controls the routes that let you report outages should not be the same one you are messing with regularly, or at least there should be redundancy.