Hacker News

Why is GitHub down so often? Why is it not possible to keep it up 100% of the time (not counting physical failures)? I haven't seen any downtime on my system (it has hundreds of thousands of users online) in the months since I completed the setup.


I truly doubt you are running at the scale of GitHub in terms of users, complexity, and amount of data.


It’s probably more a matter of dev velocity. If you aren’t changing anything, it’s easy to keep your system operational.


I'm sure your scale is similar.


Google search is basically never down.

AWS is basically never down.

WhatsApp is basically never down.

Time for GitHub to grow up?


> Google homepage is basically never down.

The Google search "app" (not counting the vast indexing infrastructure) and GitHub also differ vastly in complexity.

> AWS is basically never down.

Lol what? Have you used AWS?

> WhatsApp is basically never down.

Makes sense. WhatsApp has always had a huge focus on reliable infrastructure, since day 0. Pays off, I guess :)


I think you are nitpicking. My point is that companies (including Microsoft!) are capable of running large-scale infra with much higher uptimes than GitHub's. They want to put themselves at the center of our workflows (e.g. GitHub Actions), yet they are not delivering uptimes commensurate with that. What is their excuse?


Yeah, I agree with you, a bit nitpicky. I also agree that they shouldn't have an excuse, besides confessing that their engineering standards are not up to the level of their ambition. That's why I never make anything in my infrastructure depend on GitHub; for everything I use GitHub for, I have alternatives set up for the inevitable ill-timed downtime I know will happen.


Great question: Google's homepage revenue maps directly 1:1 to its uptime. Its user retention is also loosely tied to its uptime, since the value is mostly a replaceable commodity (is Bing worse? Sure, but it has results). This leads the organization to invest huge amounts of time and money in ensuring uptime. I can recall only a single outage in the past several years.

On the other hand, GitHub's revenue is mostly monthly/annual licensing, and they have great stickiness, as it's not trivial to migrate to an equivalent service provider (excluding minor projects that only use a couple of features). They can increase profits a lot more through feature development and cost savings than through uptime. Is there a limit to this? Of course.


Google loses money when search is down because it cannot serve ads. Does GitHub actually lose money when it is down? I think that because everyone is on a subscription, it doesn't lose money by the second; instead it loses reputation, and long term it could lose customers. But GitHub's income isn't as sensitive to downtime as Google's in general, hence less investment in DevOps by comparison.


An RO (read-only) system is generally easier to keep up than an RW (read-write) system that is constantly innovating.


I think he was referring to Google Search in general. I've never witnessed any Google Search downtime since Google went live in 1998. It has probably happened, but I can't remember it.


You don't notice when their indexers cannot write; performing a search is basically RO.


> performing a search is basically RO

You don't know this. Google results are not the same for all users. How do you know there isn't R/W going on, particularly when signed in to Google?

(Unless you work at Google on search, in which case I stand corrected!)


I am certain there are normally writes going on; they do run Analytics on their homepage. However, they get to defer, retry, and play lots of eventually-consistent tricks, or worst case just swallow the exceptions. The fact that they can make the service _seem_ fully working to the end user while being unable to write is a major factor in achieving their world-beating reliability.
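The defer/retry/swallow pattern described above can be sketched roughly as follows. This is a minimal illustration, not anything Google actually runs; `backend` is a hypothetical storage client, and the key property is that a write failure never propagates to the read path:

```python
import logging
import queue

log = logging.getLogger("analytics")


class BestEffortWriter:
    """Queue writes so a storage failure never breaks the serving path."""

    def __init__(self, backend, max_pending=10_000):
        self.backend = backend          # hypothetical storage client
        self.pending = queue.Queue(maxsize=max_pending)

    def record(self, event):
        # Defer: enqueue for a background flusher instead of writing inline.
        try:
            self.pending.put_nowait(event)
        except queue.Full:
            # Worst case: swallow the event rather than fail the request.
            log.warning("dropping analytics event")

    def flush(self):
        # Retry: a failed write goes back on the queue for the next flush,
        # so the store converges eventually while reads stay unaffected.
        while not self.pending.empty():
            event = self.pending.get_nowait()
            try:
                self.backend.write(event)
            except Exception:
                self.pending.put_nowait(event)
                break
```

The design choice is that `record` is the only call on the user-facing path, and it cannot raise; durability is traded for availability, which is exactly the "seem fully working while unable to write" effect.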


It is still a relatively complex multi-machine RO operation. It isn't like serving a static site.


Sure, but they can keep several copies of the index per datacenter and retry your query multiple times, possibly even in a different datacenter. New code and even updated indexes can be tried and then fall back to yesterday's version.
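That retry-with-fallback idea for a read-only query path can be sketched in a few lines. Everything here is a hypothetical stand-in (`replicas` as callable index copies, `stale_index` as yesterday's snapshot), not any real search API:

```python
def search(query, replicas, stale_index):
    """Try each index replica in turn (e.g. another copy in the same or a
    different datacenter); on total failure, serve results from a stale
    snapshot rather than returning an error to the user."""
    for replica in replicas:
        try:
            return replica(query)          # fresh results
        except Exception:
            continue                       # retry on the next copy
    return stale_index.get(query, [])      # degraded but still available
```

Because the operation is read-only, every retry is safe to repeat, which is what makes this kind of aggressive failover so much simpler than doing the same for writes.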


Google Search is still an RO system - you are mostly just retrieving information from a search index.


It is read-only, but its read systems are relatively complex and work at scale.


How do you know that?


I don't think GitHub's homepage has gone down at any point during this outage either.


Ummm, I guess the scale is similar: I am a single person vs. an organization; my Google-search knowledge vs. industry experts with years of experience.

My point was not about similar scale, though. How hard is it to keep a system up? AWS is a whole universe compared to GitHub, yet it doesn't go down as often as GitHub.


Only GitHub truly knows. But everyone here knows that since Microsoft acquired it, it has degraded to the point where it goes down every month.

It is so frequently down and unreliable that you might as well self-host at this point. You would likely have had better uptime than GitHub over the past three years since this prediction. [0]

[0] https://news.ycombinator.com/item?id=22867803


I find this strange too. GH seems to have more major incidents lately...


ChatGPT overloading it... (scanning repos)



