This is the real problem. Even if you don't run anything in AWS directly, someth...

dexterdog · 2025-10-20T22:32:18 1760999538

This becomes the reason to run in us-east-1 if you're going to be single-region. When it's down, nobody is surprised that your service is affected. If you're all-in on some other region and it goes down, you look like you don't know what you're doing.

kelseydh · 2025-10-20T17:49:33 1760982573

This whole incident has been pretty uneventful down in Australia where everything AWS is on ap-southeast-2.

parliament32 · 2025-10-20T16:49:56 1760978996

> Even if you don't run anything in AWS directly, something you integrate with will.

Why would a third party be in your product's critical path? It's like the old business school thing about "don't build your business on the back of another."

caymanjim · 2025-10-20T18:00:59 1760983259

It's easy to say this, but in the real world, most of the critical path is heavily dependent on third-party integrations: user auth, storage, logging, etc. Even if you're somewhat resilient against failures (e.g. you can live without logging and your app doesn't hard fail), an outage is still potentially going to cripple your service. And even if your entire app is resilient and doesn't fail, there are still bound to be tons of integrations that will limit functionality or make the app appear broken in some way to users.

Third-party things are in the critical path because, most of the time, they are still more reliable than self-hosting everything; because they're cheaper than anything you can engineer in-house; and because no app is an island.

It's been decades since I worked on something that was completely isolated from external integrations. We do the best we can with redundancy, fault tolerance, auto-recovery, and balance that with cost and engineering time.

If you think this is bad, take a look at the uptime of complicated systems that are 100% self-hosted. Without a Fortune 500 level IT staff, you can't beat AWS's uptime.

fauigerzigerk · 2025-10-20T18:40:45 1760985645

Clearly these are non-trivial trade-offs, but I think using third parties is not an either/or question. Depending on the app and the type of third-party service, you may be able to make design choices that allow your systems to survive a third-party outage for a while.

E.g., a hospital could keep recent patient data on-site and sync it up with the central cloud service as and when that service becomes available. Not all systems need to be linked in real time. Sometimes it makes sense to create buffers.
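The buffering idea above is usually called store-and-forward (or an outbox). Below is a minimal sketch under stated assumptions: the upstream client is a hypothetical `send` callable that raises `ConnectionError` when the central service is unreachable, and `FakeCloud` plus the record fields are made up for illustration. Local writes always succeed; a flush pushes records upstream in order and stops at the first failure, keeping the rest for the next attempt.

```python
from collections import deque
from typing import Callable, Deque, Dict


class OutboxBuffer:
    """Store-and-forward buffer: local writes never depend on the upstream
    service; flush() syncs pending records in order when it is reachable."""

    def __init__(self, send: Callable[[Dict], None]) -> None:
        # Hypothetical upstream client; assumed to raise ConnectionError when down.
        self._send = send
        self._pending: Deque[Dict] = deque()

    def record(self, item: Dict) -> None:
        # Local write: succeeds even while the central service is unavailable.
        self._pending.append(item)

    def flush(self) -> int:
        """Try to sync pending records; return how many were delivered."""
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # upstream down: keep the remaining records for later
            self._pending.popleft()  # drop only after a successful send
            delivered += 1
        return delivered

    @property
    def backlog(self) -> int:
        return len(self._pending)


class FakeCloud:
    """Hypothetical stand-in for the central cloud service."""

    def __init__(self) -> None:
        self.up = False
        self.received = []

    def send(self, item: Dict) -> None:
        if not self.up:
            raise ConnectionError("upstream unavailable")
        self.received.append(item)


cloud = FakeCloud()
buf = OutboxBuffer(cloud.send)
buf.record({"id": 1, "note": "vitals taken"})
buf.record({"id": 2, "note": "medication given"})
print(buf.flush(), buf.backlog)  # 0 2  (cloud is down, nothing lost)
cloud.up = True
print(buf.flush(), buf.backlog)  # 2 0  (backlog drained once it recovers)
```

In a real deployment the buffer would need to be durable (on disk, not in memory), and if a send succeeds but its acknowledgement is lost, the record gets sent again on the next flush, so the upstream writes have to be idempotent. That is exactly the kind of extra complexity the next paragraph warns about.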

But the downside is that syncing things asynchronously creates complexity that can itself be the cause of outages or, worse, data corruption.

I guess it's a decision that can only be made on a case-by-case basis.

jen20 · 2025-10-20T21:18:51 1760995131

With the exception of Amazon, anyone in this situation already has a third-party product in their critical path - AWS itself.

chasd00 · 2025-10-20T19:33:13 1760988793

> Why would a third-party be in your product's critical path?

I bet only 1-2% of AI startups are running their own models and the rest are just bouncing off OpenAI, Azure, or some other API.

thinkindie · 2025-10-20T17:51:57 1760982717

Not necessarily our critical path, but today CircleCI was affected greatly, which also affected our capacity to deploy. Luckily, it was a Monday morning, so we didn't even have to deploy a hotfix.

pcdevils · 2025-10-20T18:03:09 1760983389

That's nearly every AI startup done for.

macintux · 2025-10-20T17:17:22 1760980642

No man is an island, entire of itself

unethical_ban · 2025-10-20T22:09:18 1760998158

* IAM / Okta
* Cloud VPN services
* Cloud Office (GSuite, Office365)

Good luck naming a large company, bank, even utility that doesn't have some kind of dependency like this somewhere, even if they have mostly on-prem services.

mlavrent · 2025-10-21T02:29:48 1761013788

The only ones I can really think of are the cloud providers themselves. I was at Microsoft, and absolutely everything was in-house (often to our detriment).

parliament32 · 2025-10-21T15:18:36 1761059916

I think you missed the "critical path" part. Why would your product stop functioning if your admins can't log in through IAM or connect over the VPN? Do you really need constant hands-on maintenance? Why would your product stop functioning if Office is down? Are you managing your ops in Excel or something?

"Some kind of dependency" is fine and unavoidable, but well-architected systems don't have hard downtime just because someone somewhere you have no control over fucked up.
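One common way to keep a failing dependency from turning into hard downtime is a circuit breaker with a fallback. This is a minimal sketch of that general pattern, not anything described in the thread: `flaky_recommendations` is a hypothetical non-critical third-party call, the empty-list fallback is an assumed degraded response, and the clock is injected so the behaviour can be shown without waiting.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """After `max_failures` consecutive errors, skip the dependency for
    `cooldown` seconds and serve the fallback; then allow one trial call."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown:
                return fallback()  # open: don't even try the dependency
            self._opened_at = None  # cooldown over: let one trial call through
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # (re)open the circuit
            return fallback()
        self._failures = 0  # success closes the circuit again
        return result


now = [0.0]          # fake clock value, adjustable by hand
calls: List[float] = []


def flaky_recommendations() -> list:
    # Hypothetical third-party call that is currently failing.
    calls.append(now[0])
    raise ConnectionError("third-party API down")


breaker = CircuitBreaker(max_failures=2, cooldown=10.0, clock=lambda: now[0])
for _ in range(4):
    print(breaker.call(flaky_recommendations, fallback=lambda: []))
print(len(calls))  # 2  (calls 3 and 4 were short-circuited)
```

The product keeps answering (with a degraded result) while the dependency is down, and it stops hammering the failing service instead of piling up timeouts. Whether an empty or cached fallback is acceptable is a per-feature product decision.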

unethical_ban · 2025-10-21T20:05:55 1761077155

Since 2020, for whatever reason, a lot of companies have had fully remote workforces. If the VPN or auth goes down and workers can't log in, that's a problem. Think banks, call center work, customer service.