Years ago, when I was playing with AWS in a course on building cloud-hosted services, it was well-known that all the AWS management was hosted out of a single zone, and there were several days we had to cancel class because us-east-1 had an outage. While technically all our VMs hosted out of other AZs were still running, all our attempts to manage them via the web UI or API were timing out or erroring out.
I understand this is long-since resolved (I haven't tried building a service on Amazon in a couple years, so this isn't personal experience), but centralized failure modes in decentralized systems can persist longer than you might expect.
(Work for Google, not on Cloud or anything related to this outage that I'm aware of, I have no knowledge other than reading the linked outage page.)
> it was well-known that all the AWS management was hosted out of a single zone, and there were several days we had to cancel class because us-east-1 had an outage
Maybe you mean region, because there is no way that AWS tools were ever hosted out of a single zone (of which there are 4 in us-east-1). In fact, as of a few years ago, the web interface wasn’t even a single tool, so it’s unlikely that there was a global outage for all the tools.
And if this was later than 2012, even more unlikely, since Amazon retail was running on EC2 among other services at that point. Any outage would be for a few hours, at most.
"Some services, such as IAM, do not support Regions; therefore, their endpoints do not include a Region."
There was a partial outage maybe a month and a half ago where our typical AWS Console links didn't work but the links for another region did. My understanding is that if that outage had been in us-east-1, then making changes to IAM roles wouldn't have worked.