Step #3 is unfortunately often pointless, because either:
1. the status page is unavailable too
2. the status page reports the service as green/available even though it's red/down (maybe it's still accessible to the service pinging its health status, maybe it's "accessible" but not actually functional, maybe the engineers were too busy fixing the problem to click the button to update the status page, maybe not updating the status lets them pretend they're within SLAs or KPIs)
I suppose a static html page saying everything is fine hosted on the same infrastructure as your service would be an accurate indicator a good chunk of the time.
Most places don't have an automated status page because of the issues with automation showing outages when they don't exist. Someone manually goes and clicks something on the status page.
This doesn't seem like a good reason. I can't imagine anyone checks githubstatus.com before accessing github.com. People check it after they have issues.
I assume it's more due to contracts, marketing, and denying responsibility.
I also use it as a connectivity check. Most other sites have so many ads and cookie banners that they cause long lags with an adblocker on some older systems I use, so I point to HN and get a really fast confirmation that the web is working.
Except for ISPs. Whenever a major website goes down, people blame their ISPs. When their own provider goes down, they don't remember which one they use and blame every other one in their region as well.
Now's a good opportunity to ask about alternative Git hosts. What other services do HNers use?
I've been unwilling to host any personal projects on GH since Copilot launched and it became clear that GH/MS doesn't really respect the authors of the code they host. Honestly, open source in general has gotten a little less compelling to me after Copilot. The recent security issue at GH has also soured me even more on hosted Git services.
For closed source projects maybe it's just best to store encrypted backups off site and spin up a self hosted option whenever collaboration is needed. Seems pretty inefficient though.
My company uses a self-hosted Gerrit instance. A bit of a learning curve, but the code review experience is SO much better than PRs (one commit == one unit of review, clean intra-diffs between different revisions of a patch, stacking of reviews simply by having multiple commits on a branch, UI is very snappy and responsive...).
Self-hosting Gerrit is easy[1] because all its internal state is fully transactional, including reviews and configs, and is simply stored as normal Git commits inside the Git repos on the filesystem.
In fact, our instance had much better uptime over the past year than GitHub despite being migrated to another server once!
[1]: ... unless you need a complex high availability setup or replicas. But 99% of projects are fine with just a single instance and backups.
Gerrit looks ugly, and the admin story is a nightmare, but it's such a better experience for code reviews (plus you can actually enforce code review in a way Github cannot). I can't stand dealing with Github for code review after spending a few years using Gerrit.
Haven't tried this one, but I've used Reviewable and Graphite. They're all very nice, and yours looks nice too. The problem with all these third-party SaaSes on top of GitHub:
- You have to trust a random (no offense meant!) SaaS company with full access to your repositories, and to not disappear in a year or two.
- GitHub API rate limits end up causing issues sooner or later. For instance, Reviewable would randomly break and ask you to add more admin users so it could load balance API requests across multiple accounts!
- Likewise, you are still forced into the PR model and things that are trivial in Gerrit, like stacked diffs, are still hard. spr helps[1], but at that point you are piling workarounds on top of workarounds, might as well use a tool that supports the workflow natively...
- It gets messy unless 100% of the team is using it because then you have to somehow sync comments and approvals back and forth... And getting 100% of the team to use it isn't much easier than convincing them to use Gerrit, with all of the downsides.
All great points, and I appreciate them because I think CodeApprove's marketing materials should address them more head on.
The first two (trust with repos and GitHub API rate limits) don't really apply to CodeApprove in the same way they do to Reviewable. Because we're using GitHub's newer "apps" system and not OAuth, we only have very limited scopes (can't write code, etc.) and API rate limits go up as we get more users.
Your points about stacked diffs and PR adoption stand!
Good API too. With some PowerShell magic I added commit checks that get their status from TeamCity (you can also use Gitea Actions, but that doesn't work on Windows).
Then use Renovate and Google OSV scanner as a replacement for Dependabot and Github Advanced Security.
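For anyone curious what wiring an external CI into Gitea's commit statuses looks like, here is a minimal sketch in Python (rather than PowerShell). The server URL, token, owner/repo names, and TeamCity status strings are assumptions for illustration, not a copy of the parent's actual setup:

```python
import json
import urllib.request

def build_status_payload(teamcity_status: str, build_url: str) -> dict:
    """Map a TeamCity build result onto the states Gitea's commit-status API accepts."""
    state = {"SUCCESS": "success", "FAILURE": "failure"}.get(teamcity_status, "pending")
    return {
        "state": state,
        "target_url": build_url,      # link back to the TeamCity build page
        "description": f"TeamCity: {teamcity_status}",
        "context": "ci/teamcity",     # groups statuses from the same checker
    }

def post_commit_status(gitea_url, owner, repo, sha, token, payload):
    # Gitea exposes POST /api/v1/repos/{owner}/{repo}/statuses/{sha};
    # all values here are placeholders, and auth uses an API token.
    req = urllib.request.Request(
        f"{gitea_url}/api/v1/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

A TeamCity build step (or webhook handler) would call `post_commit_status` with the commit SHA it just built.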
If you're into self hosting, soft-serve is a really cool CLI based git server from Charmbracelet that is about a half step above a bare git repo over SSH
That is pretty slick. Not having to deal with file permissions when limiting access for collaborators sounds nice. Can't say I haven't messed that up before.
I have a vps with bare git repos, pushing via ssh. A selection of these has hooks set up to mirror all pushes onto github/gitlab/$platform, but these are write-only.
> I have a vps with bare git repos, pushing via ssh. A selection of these has hooks set up to mirror all pushes onto github/gitlab/$platform, but these are write-only.
I like it, self-hosting bare git repos has been pretty painless for me in the past on LANs. You could still add a hook to encrypt the repo and backup whenever you merge to dev or something as well.
You pretty much only lose the hosted diff/review/ticketing tools, which I've never enjoyed much anyway.
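A post-receive hook along those lines could be sketched like this (in Python rather than shell; the remote names, the `dev` branch trigger, and the backup paths are all made up for illustration):

```python
#!/usr/bin/env python3
# post-receive hook sketch: mirror every push to external remotes, and
# trigger an encrypted backup when the 'dev' branch is updated.
import subprocess
import sys

MIRROR_REMOTES = ["github", "gitlab"]  # hypothetical remote names

def refs_updated(lines):
    # Git feeds the hook one "<old-sha> <new-sha> <refname>" line per ref on stdin.
    return [line.split()[2] for line in lines if line.strip()]

def handle_push(lines):
    refs = refs_updated(lines)
    for remote in MIRROR_REMOTES:
        # write-only mirrors, as in the parent comment
        subprocess.run(["git", "push", "--mirror", remote], check=False)
    if "refs/heads/dev" in refs:
        # e.g. bundle the repo and encrypt it for off-site backup
        subprocess.run(["git", "bundle", "create", "/backups/repo.bundle", "--all"], check=False)
        subprocess.run(["gpg", "--encrypt", "-r", "backup@example.com", "/backups/repo.bundle"], check=False)

if __name__ == "__main__":
    handle_push(sys.stdin.readlines())
```

Dropped into `hooks/post-receive` of a bare repo (and made executable), this runs after every accepted push.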
I've been very happy with SourceHut (git.sr.ht). It's fast and slick. Does what you need, none of the cruft. Among other things, I really like: I can push to any URL under my username and it creates a repo automatically, and that it reminds me to put a license file on public repos.
I really like how the author is open about both the development and the business side.
At my previous work almost everything was self-hosted including Git and servers to run machine learning models. The only exception was Jira. The company owned the servers and rented the space in a datacenter. For code reviews we had Critic.
At my new job we use Bitbucket, but its review UI is strictly worse than Critic's. And on top of that, it is strictly more expensive than the whole self-hosting setup, including paying for a competent sysadmin.
I understand that the cloud lets you offload a lot of headaches, but I really see no point in using cloud services for development. Even for a small company, a dedicated server with another to spare in case of failure will be cheaper, and its administration will be trivial.
I'm migrating from GitHub to CodeCommit. My project has pretty strict security guidelines, and GitHub doesn't have a high enough accreditation. GitHub tries to handwave this by saying good current practice means there's no personally identifiable information, etc. in Git repositories, but I'm not willing to entrust the code behind moderately sensitive infrastructure to a service that thinks they don't need to implement (and more importantly, prove they implement) better security standards and practices.
Recent developments have only reinforced my feelings on this matter.
I feel like HN could just have a traffic monitor that adds a little icon to the main page. Like "I dunno what's up, but there's a LOT of y'all here right now, so something probably is".
Same here haha. StackOverflow informed me that it means GitHub is down, and indeed it was.
I have to wonder if Git could somehow report this better. I guess it depends on exactly how GitHub is down, but "fatal error in commit_refs" made me worry that my local repo was somehow hosed.
It reports what it has: if it manages to connect but fetching the metadata or whatever fails, that's what it's going to report.
If it can’t even connect it’ll tell you that, but I would assume on github the client will always manage to connect unless their entire network is down.
Seems better done than a lot of status pages: it runs on separate infra, gets updated, has a way to subscribe, etc.
However, saying "degraded performance" when you know it's "down for everyone" is an industry phrasing thing that's irritating. AWS also has "elevated response times" when everyone is seeing 5xx errors, or infinite response times.
> However, saying "degraded performance" when you know it's "down for everyone" is an industry phrasing thing that's irritating. AWS also has "elevated response times" when everyone is seeing 5xx errors, or infinite response times.
Another popular one is "elevated API error rates" when the error rate is 1.
Having been on the other side a number of times at a site with huge amounts of traffic: very often things can be down for a huge percentage of users while our logs still show thousands of requests succeeding per minute. So it might be working, slowly, for some while not working at all for many.
Because if you don't spend multiple days per year managing your own "backup" to GitHub, you might be left unable to push to GitHub for a few hours per year?
btw, I hope none of your CI system relies on build steps that might include pulling code from GitHub or downloading packages from GitHub Package registry. Often when GitHub is down, my CI system on GitLab is broken too.
I like to reduce the single point of failure count in my projects as much as I can, with common sense. My backup system is running a few cli commands instead of git push, so it's a no brainer. YMMV.
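One way to make those "few CLI commands" nearly free (an assumption about the workflow, not necessarily what the parent does) is to give one remote several push URLs, so a single `git push` fans out to GitHub and a backup host:

```python
import subprocess

def push_mirror_cmds(remote, primary_url, backup_url):
    # Once the first explicit push URL is added, git stops using the fetch
    # URL for pushes, so the original URL must be re-added as well.
    return [
        ["git", "remote", "set-url", "--add", "--push", remote, primary_url],
        ["git", "remote", "set-url", "--add", "--push", remote, backup_url],
    ]

def apply_cmds(cmds):
    for cmd in cmds:
        subprocess.run(cmd, check=True)

# Hypothetical URLs:
# apply_cmds(push_mirror_cmds("origin",
#                             "git@github.com:me/project.git",
#                             "git@backup.example.com:me/project.git"))
```

After this, `git push origin` updates both hosts; a failure of either still fails loudly, which is what you want from a backup.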
I understand the point you're making here, but I feel it is being made in an effort to prove hakanderyal technically wrong rather than to evaluate the practicality of single points of failure, which is what they're trying to promote. I think this conversation would be much more helpful and insightful if it were kept on that evaluation track rather than trying to have the final word.
I don't think "GitHub is down" threads are known for the quality of their conversation, but sure.
I also agree with reducing (increasing?) single points of failure. I'm not trying to be pedantic, but rather observing that in practice, it's not nearly as easy as spinning up a backup Git server (which is already hard enough).
Maintaining two classes of build infrastructure throughout all your dependencies is probably not a worthwhile problem to solve, unless you want to control for the improbable risk that GitHub will be down for weeks at a time. You'd be much better off ensuring that you are able to perform rollbacks without needing to pull from the external world, because this way the worst case scenario is you run a stale version for the time that GitHub is down, in the off chance you pushed a broken version right before the outage.
True, but there are other ways to go about it. Rather than trying to challenge someone's setup, which (I assume, apologies if incorrectly) you aren't familiar with, you can start by asking them about the setup and how they keep it independent from GH. Let them expose either the success of the endeavor or its shortcomings. Such an approach is a constructive one, whereas trying to challenge someone is an antagonistic approach. We're all in this together, so let's keep our discussions constructive, and focus on learning from each other rather than try to tear each other down.
Because I use GitHub only for Git hosting and Actions (building & deploying containers). In that case, the code is already on my computer, and I can build locally, SSH to the server, and pull & run the new container.
That assumes that none of your Docker images have a build step that includes interacting with GitHub, or if some do, that you have every affected layer already cached on your local computer.
There is the GitHub Status Page [0], which doesn't display aggregate stats, but you could scrape it and do the analysis.
I suspect what you're getting at is that the downtime might evaluate to multiple days over the course of the year. Maybe that's true, idk. I'd be curious, but you'd probably want to do the analysis separately for different services (e.g. Actions vs. Package Registry vs. Git outages all have different effects on build infrastructure downtime).
Depends on the context. I need to push to a repo and QA needs to test the app, but since I can't push, they can't. There's no other practical way (other than setting up another origin on Bitbucket or GitLab, or zipping and emailing, etc.).
"...The next major issue that people encounter is that they need to collaborate with developers on other systems. To deal with this problem, Centralized Version Control Systems (CVCSs) were developed. These systems (such as CVS, Subversion, and Perforce) have a single server that contains all the versioned files, and a number of clients that check out files from that central place. For many years, this has been the standard for version control."
"...However, this setup also has some serious downsides. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or save versioned changes to anything they’re working on. If the hard disk the central database is on becomes corrupted, and proper backups haven’t been kept, you lose absolutely everything..."
"...This is where Distributed Version Control Systems (DVCSs) step in. In a DVCS (such as Git, Mercurial, Bazaar or Darcs), clients don’t just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history. Thus, if any server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it. Every clone is really a full backup of all the data...."
Git avoids the problem where the central service being down gets in the way of local development. Even in Git, that central service going down means degraded collaboration.
Normally a Git remote is just an ssh-accessible machine, and so pretty resilient.
But GitHub is a lot more complex, so apparently that simple service went down, along with all the features they built on top of it
Yeah, true. In this case, I'm sure most teams did what mine did, and waited for the outage to be resolved.
The development workflow and CI/CD is so tightly coupled to the Git remote, it would have taken a while to create and switch to another remote.
I once used another alternative: each member of the team runs git-daemon on their desktop to export their local repository, and adds the repositories exported by the other members of the team as git remotes. You can merge a coworker's master branch to your master branch whenever you want to get their changes (and the changes they already merged), which makes for a chaotic but fun development experience.
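The peer-to-peer setup described above can be scripted roughly as follows (hostnames and the repo path are invented for illustration; each team member would also run `git daemon --export-all --base-path=...` on their desktop to export their repo):

```python
import subprocess

TEAMMATES = {"alice": "alice-desktop.lan", "bob": "bob-desktop.lan"}  # hypothetical hosts
REPO_PATH = "/srv/project.git"

def remote_add_cmd(name, host, path=REPO_PATH):
    # git-daemon serves repos read-only over the git:// protocol by default,
    # so coworkers can fetch from each other but not push.
    return ["git", "remote", "add", name, f"git://{host}{path}"]

def add_peer_remotes():
    for name, host in TEAMMATES.items():
        subprocess.run(remote_add_cmd(name, host), check=False)
    # afterwards, pulling a coworker's work is just:
    #   git fetch alice && git merge alice/master
```

Chaotic, as the parent says, but every machine ends up holding the merged history.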
I’m not a big Git/GitHub user, but presumably distributed version control is still superior to systems like Subversion that don't provide a full clone of the repo: you can still work during GitHub downtime and then use the good branch-merging tools to merge back after the outage?
It goes down every month, like I said before [0]. The last time this happened was 2 days ago [1], and a few weeks before that [2]; it is evident that it is falling apart in front of us.
First there was the RSA key leak [1][3], then the site's key expired, causing downtime again [2], and now this.
I don't think anyone can tell me with a straight face that GitHub has been any more reliable or better since Microsoft acquired it. It is now worse off.
Nothing has changed except for more outages and downtime.
I don't have the data to comment on whether it was better before or after MS acquisition, but would suggest this isn't the best sample size to base any conclusions on.
Looks like it's back, and the status page reports it as green. However, GitHub Actions builds are not triggering on push (as usual when this happens), so for me it doesn't seem to be fully recovered yet.
Suppose you start your day by downloading some code from GitHub to work on. This morning you would be stuck. Do you save code on your hard drive or company servers or GitLab to handle this risk?
"GitHub is down" probably wouldn't even make the top 10 stupidest reasons that I couldn't work. Is the idea here that you're downloading brand-new code? I would imagine most people would have a copy of the repo locally, even if it's stale from Friday.
One wonders if MSFT leadership will ever connect the dots between “hollowing out talent over the years,” “hiring freezes,” and “layoffs” to this outcome.
Also seeing unicorns, mostly for "No server is currently available to service your request.". If I don't get a unicorn, the page takes a long old time to load.
Does anyone have real, contemporary numbers on this? Your observation matches my intuition, and I've seen some stats around the first two years after the acquisition¹, but I don't know any good up-to-date analysis on the question.
My intuition aligns too closely with my known biases here for me to be satisfied with that alone.
Why is GitHub down so often? Why is it not possible to keep it up 100% of the time (not counting physical failures)? I haven't seen any downtime for my system (which has hundreds of thousands of users online) in the months since I completed the setup.
I think you are nitpicking. My point is that companies (including Microsoft!) are capable of running large scale infra with much higher uptimes than GitHub. They want to put themselves at the center of our workflows (e.g. GitHub Actions) yet they are not delivering uptimes that are commensurate with that. What is their excuse?
Yeah, I agree with you, that was a bit nitpicky. I also agree that they shouldn't have an excuse, besides confessing that their engineering standards are not up to the level of their ambition. That's why I never make anything in my infrastructure depend on GitHub: for everything I use GitHub for, I have alternatives set up for the inevitable ill-timed downtime I know will happen.
Great question. Google's homepage revenue maps directly, 1:1, to its uptime. Its user retention is also loosely tied to its uptime, as the value is mostly a replaceable commodity (is Bing worse? sure, but it has results). This leads the organization to invest huge amounts of time and money in ensuring uptime. I can recall only a single outage in the past several years.
On the other hand, GitHub's revenue is mostly monthly/annual licensing, and they have great stickiness, as it's not trivial to migrate to an equivalent service provider (excluding minor projects that only use a couple of features). They can increase profits through feature development and cost saving a lot more than through uptime. Is there a limit to this? Of course.
Google loses money when search is down because they cannot serve ads. Does GitHub actually lose money when they are down? I think that because everyone is on a subscription, they do not lose money by the second; rather, they lose reputation, and long term they could lose customers. But GitHub's income isn't as sensitive to downtime as Google's in general, thus less investment in DevOps in comparison.
I think he was referring to Google Search in general. I've never witnessed any Google Search downtime since Google went live in 1998. It has probably happened, but I cannot remember it.
I am certain there are normally writes going on; they do run Analytics on their homepage. However, they get to defer, retry, and play lots of eventually-consistent tricks, or worst case just swallow the exceptions. The fact that they can make the service _seem_ fully working to the end user while being unable to write is a major factor in achieving their world-beating reliability.
Sure, but they can have several copies of the index per datacenter and retry your query multiple times, possibly even in a different datacenter. New code and even updated indexes can be tried and then fall back to yesterday's version.
Ummm, I guess the scale is similar? I am just a single person vs. an organization; my Googled-up knowledge vs. industry experts with years of experience.
My point was not about similar scale though. How hard is it to keep a system up? AWS is a whole universe compared to GitHub, yet it doesn't go down as often as GitHub.
Only GitHub truly knows. But everyone here knows that since Microsoft acquired it, it has degraded to the point where it goes down every month.
It is so frequent and unreliable, you might as well self-host at this point. You would likely have had better uptime than GitHub over the past three years since this prediction. [0]
Is the implication that someone affiliated with Twitter took down GitHub because of something someone hosted there? Does that honestly seem plausible to you?
I'm not saying "that happened", I'm asking about the possibility. Does it seem plausible that someone sympathetic to Twitter launched an attack on GitHub in retaliation? Absolutely.
Honestly, while it's certainly possible it doesn't seem plausible to me. I have a hard time seeing someone with the extreme emotional connection needed to lash out because of the leak also having the technical sophistication required to bring GitHub down.
You can always sync over local (or remote) ssh or fileshares or even zip up smaller repos, sneakernet them over to those who need them, unpack it there and pull from the unzipped folder.
The pros who maintain Git use email(!), but I think that would take more time than just waiting out the outage.
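For the zip-and-sneakernet route, `git bundle` does the packing natively: it writes refs and objects into a single file you can move over a fileshare or USB stick, and the receiver can verify and pull from the file directly. A sketch (paths and branch name are placeholders):

```python
import subprocess

def bundle_cmds(repo_dir, bundle_path, branch="master"):
    """Commands for the sender (create) and the receiver (verify, then pull)."""
    return {
        "create": ["git", "-C", repo_dir, "bundle", "create", bundle_path, "--all"],
        "verify": ["git", "-C", repo_dir, "bundle", "verify", bundle_path],
        "pull":   ["git", "-C", repo_dir, "pull", bundle_path, branch],
    }

def sneakernet_send(repo_dir, bundle_path):
    # Pack every ref into one file, then carry it to whoever needs it.
    subprocess.run(bundle_cmds(repo_dir, bundle_path)["create"], check=True)
```

Unlike a raw zip of the working tree, a bundle behaves like a remote, so normal fetch/merge machinery applies on the receiving end.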