GitHub issue - resolved (githubstatus.com)
323 points by mactavish88 on March 27, 2023 | hide | past | favorite | 160 comments


HN should really have a “Down HN” category, as this is the most reliable place to get this information.


Every time something has degraded performance or is down for me:

1. check my internet connection

2. check HN

3. check official statuspage


Step #3 is unfortunately often pointless, given that either:

1. the status page is unavailable too

2. the status page reports the service as green/available even though it's red/down (maybe it's still accessible to the service pinging its health status, maybe it's "accessible" but not actually functional, maybe the engineers were too busy fixing the problem to click the button to update the status page, maybe not updating the status lets them pretend they're within SLAs or KPIs)


I suppose a static HTML page saying everything is fine, hosted on the same infrastructure as your service, would be an accurate indicator a good chunk of the time.


Yeah it gives you a bit of confidence that the problem may not be something on your end. I don't think there's a perfect way to handle this, though.


Most places don't have an automated status page because of the issues with automation showing outages when they don't exist. Someone manually goes and clicks something on the status page.


This doesn't seem like a good reason. I can't imagine anyone checks githubstatus.com before accessing github.com. People check it after they have issues.

I assume it's more due to contracts, marketing, and denying responsibility.


HN is de facto the fastest secondary source for any current event related to tech.


I also use it as a connectivity check. Most other sites have so many ads and cookie banners that they create long lags with an adblocker on some older systems I use. So I point at HN and get a really fast response confirming the web is working.


One time HN was itself down and that was a confusing experience


I check HN first because that also checks my internet connection :-)

That said, HN is also down a lot lately, but that's the kind of outage that actually makes me more productive!


Surely you can skip the first step, as the second combines them.


And if HN is down, you know it's a biggie


> check my internet connection

How do you do that?


I find downdetector to be quite reliable. You can pick up on a spike of reports very quickly.

HN though comes with all the commentary.


Except for ISPs. Whenever a major website goes down, people blame their ISPs. When their own provider goes down, they don't remember which one they use and blame every other one in their region as well.


Hahah yes. It’s useless for tracking my ISP. Luckily my ISP is super nerdy and responds within minutes on their dslreports forum.


it'd be cool if HN would check the status of Github periodically and turn into a "we're partying now" kind of UI when Github is down.


Now's a good opportunity to ask about alternative Git hosts. What other services do HNers use?

I've been unwilling to host any personal projects on GH after Copilot launched and it became clear that GH/MS doesn't really respect the authors of the code they host. Honestly, open source in general has gotten a little less compelling to me after Copilot. The recent security issue at GH has also turned me off even more on Git hosting services.

For closed source projects maybe it's just best to store encrypted backups off site and spin up a self hosted option whenever collaboration is needed. Seems pretty inefficient though.


My company uses a self-hosted Gerrit instance. A bit of a learning curve, but the code review experience is SO much better than PRs (one commit == one unit of review, clean intra-diffs between different revisions of a patch, stacking of reviews simply by having multiple commits on a branch, UI is very snappy and responsive...).

Self-hosting Gerrit is easy[1] because all its internal state is fully transactional, including reviews and configs, and is simply stored as normal Git commits inside the Git repos on the filesystem.

In fact, our instance had much better uptime over the past year than GitHub despite being migrated to another server once!

[1]: ... unless you need a complex high availability setup or replicas. But 99% of projects are fine with just a single instance and backups.


Gerrit looks ugly, and the admin story is a nightmare, but it's such a better experience for code reviews (plus you can actually enforce code review in a way Github cannot). I can't stand dealing with Github for code review after spending a few years using Gerrit.


(shameless plug alert)

If you're looking for a drop-in improvement to GitHub's code review experience, you may be interested in CodeApprove (https://codeapprove.com).

It's got a lot of the things that make Gerrit appealing, but with less of a drastic workflow shift (still branch-based PRs) and a much nicer UI.


Haven't tried this one, but I have tried Reviewable and Graphite. They're all very nice, and yours looks nice too. The problem with all these third-party SaaSes on top of GitHub:

- You have to trust a random (no offense meant!) SaaS company with full access to your repositories, and to not disappear in a year or two.

- GitHub API rate limits end up causing issues sooner or later. For instance, Reviewable would randomly break and ask you to add more admin users so it could load balance API requests across multiple accounts!

- Likewise, you are still forced into the PR model and things that are trivial in Gerrit, like stacked diffs, are still hard. spr helps[1], but at that point you are piling workarounds on top of workarounds, might as well use a tool that supports the workflow natively...

- It gets messy unless 100% of the team is using it because then you have to somehow sync comments and approvals back and forth... And getting 100% of the team to use it isn't much easier than convincing them to use Gerrit, with all of the downsides.

[1]: https://github.com/ejoffe/spr


All great points, and I appreciate them because I think CodeApprove's marketing materials should address them more head on.

The first two (trust with repos and GitHub API rate limits) don't really apply to CodeApprove in the same way they do to Reviewable. Because we're using GitHub's newer "apps" system and not OAuth, we only have very limited scopes (can't write code, etc.) and API rate limits go up as we get more users.

Your points about stacked diffs and PR adoption stand!


Gitea[1]. It's a self hosted Github alternative, and it's quite lightweight too.

[1] https://gitea.io


Good API too. With some PowerShell magic I added commit checks that get their status from TeamCity (you can also use Gitea Actions, but that doesn't work on Windows).

Then use Renovate and Google OSV scanner as a replacement for Dependabot and Github Advanced Security.


If you're into self hosting, soft-serve is a really cool CLI based git server from Charmbracelet that is about a half step above a bare git repo over SSH

https://github.com/charmbracelet/soft-serve


That is pretty slick. Not having to deal with file permissions when limiting collaborators' access sounds nice. Can't say I haven't messed that up before.


I have a vps with bare git repos, pushing via ssh. A selection of these has hooks set up to mirror all pushes onto github/gitlab/$platform, but these are write-only.


> I have a vps with bare git repos, pushing via ssh. A selection of these has hooks set up to mirror all pushes onto github/gitlab/$platform, but these are write-only.

I like it; self-hosting bare git repos has been pretty painless for me in the past on LANs. You could still add a hook to encrypt the repo and back it up whenever you merge to dev or something.

You pretty much only lose the hosted diff/review/ticketing tools, which I've never enjoyed much anyway.
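The mirror-on-push setup described above is usually just a post-receive hook in the bare repo. A minimal sketch - the remote names `github` and `gitlab` are hypothetical and would be configured with `git remote add` beforehand:

```shell
#!/bin/sh
# post-receive hook in the bare repo on the VPS.
# Mirrors every accepted push to the configured write-only remotes;
# a failed mirror push must not block the primary push, hence "|| echo".
for remote in github gitlab; do
    git push --mirror "$remote" || echo "warning: mirror to $remote failed" >&2
done
```

Drop it into `repo.git/hooks/post-receive` and mark it executable.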


I've been very happy with SourceHut (git.sr.ht). It's fast and slick. Does what you need, none of the cruft. Among other things, I really like: I can push to any URL under my username and it creates a repo automatically, and that it reminds me to put a license file on public repos.

I really like how the author is open about both the development and the business side.


At my previous work almost everything was self-hosted including Git and servers to run machine learning models. The only exception was Jira. The company owned the servers and rented the space in a datacenter. For code reviews we had Critic.

At the new work we use Bitbucket, but for reviews its UI is strictly worse than Critic's. And on top of that, it is strictly more expensive than the self-hosted setup, including paying for a competent sysadmin.

I understand that the cloud lets you offload a lot of headaches, but I really see no point in using cloud services for development. Even for a small company, a dedicated server (with another to spare in case of failure) will be cheaper, and its administration will be trivial.


I'm migrating from GitHub to CodeCommit. My project has pretty strict security guidelines, and GitHub doesn't have a high enough accreditation. GitHub tries to handwave this by saying good current practice means there's no personally identifiable information, etc. in Git repositories, but I'm not willing to entrust the code behind moderately sensitive infrastructure to a service that thinks they don't need to implement (and more importantly, prove they implement) better security standards and practices.

Recent developments have only reinforced my feelings on this matter.


I’ve been happy with SourceHut


We still use self-hosted GitLab. It's crazy expensive but we're completely locked in to their CI/CD pipelines plus we still enjoy using it ¯\_(ツ)_/¯


Gitlab is very good and feels almost the same.


This was done out of an abundance of caution, I'm sure.


For those who missed the context of this joke: https://news.ycombinator.com/item?id=35285390


Can’t get hacked if you’re offline.


Except for your bitcoins, if your cold wallet had a bug:

https://research.kudelskisecurity.com/2023/03/06/polynonce-a...


I feel like HN could just have a traffic monitor that adds a little icon to the main page. Like "I dunno what's up, but there's a LOT of y'all here right now, so something probably is".


I think you meant "I dunno what's down..."


Even better


Yup. Tried pushing and got "remote: fatal error in commit_refs". Was trying to understand what I was doing wrong.


Same here haha. StackOverflow informed me that it means GitHub is down, and indeed it was.

I have to wonder if Git could somehow report this better. I guess it depends on exactly how GitHub is down, but "fatal error in commit_refs" made me worry that my local repo was somehow hosed.


I think it can? whenever I can't connect to my company's git server I get a `failed to connect to git.company.com` message. or something like that


It reports what it has, if it manages to connect but fetching the metadata or whatever fails, that’s what it’s going to report.

If it can’t even connect it’ll tell you that, but I would assume on github the client will always manage to connect unless their entire network is down.


Same here! Was going crazy wondering what I was doing wrong. Renamed my branch to see if that was causing anything.


Small world. I assumed I had created a duplicate branch name or something, and renamed the branch as well before hitting up google.


For me it went down just as I was logging in, I thought I was banned or something :')


Same, I thought they had finally come for me because I refuse to change master to main.


I was rebasing and thought I had committed some kind of git sin that I wasn't aware of.



Seems better done than a lot of status pages: runs on separate infra, gets updated, has a way to subscribe, etc.

However, saying "degraded performance" when you know it's "down for everyone" is an industry phrasing thing that's irritating. AWS also has "elevated response times" when everyone is seeing 5xx errors, or infinite response times.


It's not down for everyone. I can browse just fine, but my pushes get rejected, so that qualifies as "degraded performance" for me.


It's got many named granular services marked as "degraded performance" or "degraded availability" that seemed to be down for everyone.


> However, saying "degraded performance" when you know it's "down for everyone" is an industry phrasing thing that's irritating. AWS also has "elevated response times" when everyone is seeing 5xx errors, or infinite response times.

Another popular one is "elevated API error rates" when the error rate is 1.


Having been on the other side a number of times at a site with huge amounts of traffic: very often things can be down for a huge percentage of users while our logs still show thousands of requests succeeding per minute. So it might be working, slowly, for some while not at all for many.


It seems accurate to me - after some time, several "degraded performance" flags have been changed to "major outage".


For what it's worth, it's powered by statuspage.io, which is relatively industry standard for status pages.


Because this needs to happen just as one prepares to push the day's last deploy. Back to backups.

Have backups for critical systems, people. In my case, it's building docker containers locally and luckily deploying to one server via ssh.


Because if you don't spend multiple days per year managing your own "backup" to GitHub, you might be left unable to push to GitHub for a few hours per year?

btw, I hope none of your CI systems rely on build steps that might include pulling code from GitHub or downloading packages from the GitHub Package registry. Often when GitHub is down, my CI system on GitLab is broken too.


I like to reduce the single point of failure count in my projects as much as I can, with common sense. My backup system is running a few cli commands instead of git push, so it's a no brainer. YMMV.


How sure are you that none of those commands has a dependency on GitHub infrastructure?


I understand the point you're making here, but I feel it is being made in an effort to prove hakanderyal technically wrong rather than to evaluate the practicality of single points of failure, which is what they're trying to promote. I think this conversation would be much more helpful and insightful if it were kept on that evaluation track rather than trying to have the final word.


I don't think "GitHub is down" threads are known for the quality of their conversation, but sure.

I also agree with reducing (increasing?) single points of failure. I'm not trying to be pedantic, but rather observing that in practice, it's not nearly as easy as spinning up a backup Git server (which is already hard enough).

Maintaining two classes of build infrastructure throughout all your dependencies is probably not a worthwhile problem to solve, unless you want to control for the improbable risk that GitHub will be down for weeks at a time. You'd be much better off ensuring that you are able to perform rollbacks without needing to pull from the external world, because this way the worst case scenario is you run a stale version for the time that GitHub is down, in the off chance you pushed a broken version right before the outage.


True, but there are other ways to go about it. Rather than trying to challenge someone's setup, which (I assume, apologies if incorrectly) you aren't familiar with, you can start by asking them about the setup and how they keep it independent from GH. Let them expose either the success of the endeavor or its shortcomings. Such an approach is a constructive one, whereas trying to challenge someone is an antagonistic approach. We're all in this together, so let's keep our discussions constructive, and focus on learning from each other rather than try to tear each other down.


Because I use GitHub only for git hosting and Actions (building & deploying containers). In this case, the code is already on my computer, and I can build, ssh to the server, and pull & run the new container.

"docker buildx build ... -push" "ssh ...@..." "docker pull ..." "docker compose up ..."


That assumes that none of your Docker images have a build step that includes interacting with GitHub, or if some do, that you have every affected layer already cached on your local computer.


Is there a history of GitHub's uptime anywhere?


There is the GitHub Status Page [0], which doesn't display aggregate stats, but you could scrape it and do the analysis.

I suspect what you're getting at is that the downtime might evaluate to multiple days over the course of the year. Maybe that's true, idk. I'd be curious, but you'd probably want to do the analysis separately for different services (e.g. Actions vs. Package Registry vs. Git outages all have different effects on build infrastructure downtime).

[0] https://www.githubstatus.com/history
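As a toy sketch of that kind of analysis - assuming you've already scraped incidents into a `service,start_epoch,end_epoch` CSV (the rows below are fabricated):

```shell
# fabricated sample data: one incident per line
cat > incidents.csv <<'EOF'
actions,1679900000,1679907200
git,1679900000,1679903600
actions,1680000000,1680001800
EOF
# sum (end - start) per service and report minutes of downtime
awk -F, '{ down[$1] += $3 - $2 }
         END { for (s in down) printf "%s %d min\n", s, down[s] / 60 }' incidents.csv
```

This separates the per-service totals the comment suggests (Actions vs. Git outages affect builds differently).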


Depends on the context. I need to push to a repo and QA needs to test the app, but since I can't push, they can't. No other practical way (other than setting up another origin on Bitbucket or GitLab, or emailing/zipping and sending it, etc.).
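For reference, the "another origin" fallback is a one-time setup; the example URLs below are hypothetical:

```shell
# add a second host and push the current branch to it
git remote add fallback git@gitlab.com:example/app.git
git push fallback HEAD

# or teach "origin" to push to both hosts with a single "git push":
git remote set-url --add --push origin git@github.com:example/app.git
git remote set-url --add --push origin git@gitlab.com:example/app.git
```

Note that the first `--add --push` replaces origin's implicit push URL, which is why the GitHub URL is re-added explicitly.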


Yea, heavily depends on the context. In my case, it's a single dev deploying to single server, so no biggie.


Git Docs... - https://git-scm.com/book/en/v2/Getting-Started-About-Version...

"...The next major issue that people encounter is that they need to collaborate with developers on other systems. To deal with this problem, Centralized Version Control Systems (CVCSs) were developed. These systems (such as CVS, Subversion, and Perforce) have a single server that contains all the versioned files, and a number of clients that check out files from that central place. For many years, this has been the standard for version control."

"...However, this setup also has some serious downsides. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or save versioned changes to anything they’re working on. If the hard disk the central database is on becomes corrupted, and proper backups haven’t been kept, you lose absolutely everything..."

"...This is where Distributed Version Control Systems (DVCSs) step in. In a DVCS (such as Git, Mercurial, Bazaar or Darcs), clients don’t just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history. Thus, if any server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it. Every clone is really a full backup of all the data...."


Git avoids the problem where the central service being down gets in the way of local development. Even in Git, that central service going down means degraded collaboration.

Normally a Git remote is just an ssh-accessible machine, and so pretty resilient. But GitHub is a lot more complex, so apparently that simple service went down, along with all the features they built on top of it.


You don't actually need the central service. In an emergency, a team could work entirely over email via diffs like Linus does.


Yeah, true. In this case, I'm sure most teams did what mine did, and waited for the outage to be resolved. The development workflow and CI/CD is so tightly coupled to the Git remote, it would have taken a while to create and switch to another remote.


I once used another alternative: each member of the team runs git-daemon on their desktop to export their local repository, and adds the repositories exported by the other members of the team as git remotes. You can merge a coworker's master branch to your master branch whenever you want to get their changes (and the changes they already merged), which makes for a chaotic but fun development experience.
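A rough sketch of that peer-to-peer setup - the host name `alice.local` is hypothetical, and the shared branch is assumed to be `master`; `git daemon` serves repos read-only over the git:// protocol:

```shell
# each teammate exports their repos (run on their own machine):
git daemon --base-path="$HOME/src" --export-all --reuseaddr --detach

# everyone else adds their peers as remotes and merges whenever they like:
git remote add alice git://alice.local/project.git
git fetch alice
git merge alice/master
```

Since git:// has no authentication or write access, this only distributes reads; pushes still happen by each peer pulling from the others.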


I’m not a big Git/GitHub user, but presumably distributed version control is still superior to systems like Subversion that don’t provide a full clone of the repo, as you can still work during GitHub downtime and then use the good branch-merging tools to merge back after the outage?


Hmm this is quite an incident, I haven't seen Github reject all pushes to all repos like this in quite some time.... hopefully not SSH key related


Also known as Monday.


It goes down every month, as I said before [0]. The last time this happened was 2 days ago [1], then weeks ago [2], and it is evident that it is falling apart in front of us.

First it was the RSA key leak [1][3], then the site's key expired, causing downtime again [2], and now this.

I don't think anyone can tell me with a straight face that GitHub got any more reliable or better after Microsoft acquired it. It is now worse off.

Nothing has changed except for more outages and downtime.

So so reliable. /s

[0] https://news.ycombinator.com/item?id=35004629

[1] https://news.ycombinator.com/item?id=35295216

[2] https://news.ycombinator.com/item?id=35003741

[3] https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-k...


NB: GitHub was purchased by MS 5 years ago. All these events are in the last month.

It's apparent that it wasn't without issues prior to the acquisition (e.g. a quick search for GitHub issues prior to 2018 gives this: https://techcrunch.com/2017/07/31/github-goes-down-and-takes...) - reporting issues in 2017, 2015, and 2012.

I don't have the data to comment on whether it was better before or after MS acquisition, but would suggest this isn't the best sample size to base any conclusions on.


Looks like it's back and the status page reports it as green, however GitHub Actions builds are not triggering on push as usual when this happens, so for me it doesn't seem to be fully recovered yet.


Suppose you start your day by downloading some code from GitHub to work on. This morning you would be stuck. Do you save code on your hard drive or company servers or GitLab to handle this risk?


"GitHub is down" probably wouldn't even make the top 10 stupidest reasons that I couldn't work. Is the idea here that you're downloading brand new code? I would imagine most people would have a stale from Friday copy of the repo locally.


You just clone the repo from one of your colleagues'.


One wonders if MSFT leadership will ever connect the dots between “hollowing out talent over the years,” “hiring freezes,” and “layoffs” to this outcome.

One wonders sometimes if that’s the goal.


I'd long assumed the opposite, but maybe Azure DevOps is the Git web interface they're betting on long-term.


Depends on if it’s at all related


With GitHub, only code is replicated locally, not everything else like the wiki, issues, etc.

That's something I love about Fossil.

Everything with Fossil (wiki, issues, code) is replicated as well.


You can keep issues within git itself: https://github.com/MichaelMure/git-bug


I wonder if it has anything to do with them updating their RSA SSH host key (https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-k...) 3 days ago.


I feel like events like these are the new "compiling".. https://xkcd.com/303/


Also seeing unicorns, mostly for "No server is currently available to service your request.". If I don't get a unicorn, the page takes a long old time to load.


I used to get my GitHub status updates in Slack via an RSS feed, and just searched for the feed again, but it's gone? Is there an alternative for this?


Clicking "Subscribe To Updates" top right on https://www.githubstatus.com/ gives me a whole load of options for getting updates, one of those is RSS


Is anyone self-hosting the enterprise version of github as a backup? Or using it as the primary source and then using the cloud version as backup?


> We’ve identified an infrastructure change that has been rolled back and we are monitoring recovery.

From github status


Back up for me


The site is up but git push still doesn’t work for me as of now.


Just found that out the hard way.


I have been getting an error when I try to git push. I thought the problem was me, but then I saw this thread.


Ever since Microsoft took over...


...Github has gotten significantly better.


Probably not a coincidence - changes are the #1 cause of outages. Moving faster does mean breaking more things.


...and Github is down...

https://www.githubstatus.com/


Did Github never go down before the acquisition? Would be interesting to see stats.


and up again.


...and down every single month, with no end to the outages...


One has to ask oneself... would this have happened without the Microsoft acquisition?


Does anyone have real, contemporary numbers on this? Your observation matches my intuition, and I've seen some stats around the first two years after the acquisition¹, but I don't know any good up-to-date analysis on the question.

My intuition aligns too closely with my known biases here for me to be satisfied with that alone.

1: https://statusgator.com/blog/has-github-been-down-more-since...


Seems it's fixed now?


Why is GitHub down so often? Why is it not possible to keep it up 100% of the time (not counting physical failures)? I haven't seen any downtime for my system (it has hundreds of thousands of users online) in the months since I completed the setup.


I truly doubt you are running at the scale of GitHub in terms of users, complexity, and amount of data.


It’s probably more a thing of dev velocity. If you aren’t changing anything it’s easy to keep your system operational.


I'm sure your scale is similar.


Google search is basically never down.

AWS is basically never down.

WhatsApp is basically never down.

Time for GitHub to grow up?


> Google homepage is basically never down.

The complexity difference between the Google search "app" (not counting the vast indexing infrastructure) and GitHub is also vast.

> AWS is basically never down.

Lol what? Have you used AWS?

> WhatsApp is basically never down.

Makes sense, Whatsapp always had a huge focus on reliable infrastructure, since day 0. Pays off I guess :)


I think you are nitpicking. My point is that companies (including Microsoft!) are capable of running large scale infra with much higher uptimes than GitHub. They want to put themselves at the center of our workflows (e.g. GitHub Actions) yet they are not delivering uptimes that are commensurate with that. What is their excuse?


Yeah, I agree with you, bit nitpicky. I also agree that they shouldn't have an excuse, besides confessing their engineering standards are not up to the level of their ambition. That's why I never make anything in my infrastructure depend on GitHub; for everything I use GitHub for, I have alternatives set up for the inevitable ill-timed downtime I know will happen.


Great question: Google's homepage revenue directly, 1:1, matches its uptime. Its user retention is also loosely tied to its uptime, as the value is mostly a replaceable commodity (is Bing worse? Sure, but it has results). This leads to the organization investing huge amounts of time and money in ensuring uptime. I can recall a single outage in the past several years.

On the other hand, GitHub's revenue is mostly monthly/annual licensing, and they have great stickiness, as it's not trivial to migrate to an equivalent service provider (excluding minor projects that only use a couple of features). They can increase profits through feature development and cost savings a lot more than through uptime. Is there a limit to this? Of course.


Google loses money when search is down because they cannot serve ads. Does GitHub actually lose money when they are down? I think that because everyone is on a subscription, they do not lose money by the second; instead they lose reputation, and long term they could lose customers. But GitHub's income isn't as sensitive to downtime as Google's in general, thus less investment in DevOps in comparison.


An RO (read-only) system is generally easier to keep up than an RW system that is constantly innovating.


I think he was referring to Google Search in general. I've never witnessed any Google Search downtime since Google went live in 1998. It probably happened, but I cannot remember it.


You don't notice when their indexers cannot write; performing a search is basically RO.


> performing a search is basically RO

You don't know this. Google results are not the same for all users. How do you know there isn't R/W going on, particularly when signed-in to Google?

(Unless you work at Google on search, in which case I stand corrected!)


I am certain there are normally writes going on; they do run Analytics on their homepage. However, they get to defer, retry, and play lots of eventually-consistent tricks - worst case, just swallow the exceptions. The fact that they can make the service _seem_ fully working to the end user while being unable to write is a major factor in achieving their world-beating reliability.


It is still a relatively complex multi-machine RO operation. It isn't like serving a static site.


Sure, but they can have several copies of the index per datacenter, and retry your query multiple times, possibly even in a different datacenter. New code and even updated indexes can be tried and then fall back to yesterday's version.


Google Search is still a RO system - you are mostly just retrieving information from a search index.


It is read-only, but its read systems are relatively complex and work at scale.


How do you know that?


I don't think GitHub's homepage has gone down at any point during this outage either.


Ummm, I guess the scale is similar: I am just a single person vs. an organization, my Google knowledge vs. industry experts with years of experience.

My point was not about similar scale though. How hard is it to keep a system up? AWS is a whole universe compared to GitHub, yet it doesn't go down as often as GitHub.


Only GitHub truly knows. But everyone here knows that since Microsoft acquired it, it has degraded to the point where it goes down every month.

It is so frequent and unreliable, you might as well self-host at this point. You would likely have had better uptime than GitHub over the past three years since this prediction. [0]

[0] https://news.ycombinator.com/item?id=22867803


I find this strange too. GH seems to have more major incidents lately...


ChatGPT overloading it... (scanning repos)


Maybe they are experimenting with GPT automating devops


Related to the Twitter source code leak, perchance?


Given that looking at repos is just about the only thing I can do, I'd say that seems unlikely.


I was more implying that this could be a result of an attack borne out of retribution.


Is the implication that someone affiliated with Twitter took down GitHub because of something someone hosted there? Does that honestly seem plausible to you?


I'm not saying "that happened", I'm asking about the possibility. Does it seem plausible that someone sympathetic to Twitter launched an attack on GitHub in retaliation? Absolutely.


Honestly, while it's certainly possible it doesn't seem plausible to me. I have a hard time seeing someone with the extreme emotional connection needed to lash out because of the leak also having the technical sophistication required to bring GitHub down.


OK, this makes sense. I was just trying to push.


Are there downtime safe GitHub alternatives?


You can always sync over local (or remote) ssh or fileshares, or even zip up smaller repos, sneakernet them over to those who need them, unpack them there, and pull from the unzipped folder.

The pros who maintain Git use email(!), but I think that would take more time than just waiting out the outage.
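The "zip up smaller repos" idea actually has first-class support in Git: a bundle is a single file containing refs and objects that can be emailed or sneakernetted, then cloned or fetched from directly. A minimal round trip:

```shell
# sender: pack the whole repo (all refs plus HEAD) into one file
git bundle create project.bundle --all HEAD

# receiver: a bundle file works anywhere a remote URL does
git clone project.bundle project
```

The receiver can later `git pull` from a newer bundle, so incremental sneakernet updates work too.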


Codespaces is not connecting anymore for me


Committed code straight from Copilot :)


Looks OK from here (Paris, France).


It seems like you can push again!


I can't push anymore! Anyone else?


Everything is green now!


Time for another coffee


I am getting unicorns!


It’s up now.


seems to work fine for me...?


read-only


Thanks for ruining my day



