Yes they do, because a healthcheck-style setup very rarely works in practice, at any real scale, for a user-facing dashboard. If you want a healthcheck, look at a Grafana dashboard, not a status page.
By the way, I don't know of a single place where this isn't the case: a human signs off on and updates the status page during large events, at least for the final decision. Some of it will be automated, sure, like red flags being raised to operators. But past a certain point it simply isn't possible to automate this to second-level accuracy or whatever; the system is rarely (if ever) in a binary state of "working perfectly" or "not working", but somewhere in between. You can't just fire off a big red error bar every time a blip occurs at a place like GitHub. The system is constantly "in motion". The logical conclusion would be to just expose your 50+ Grafana dashboards publicly to every user. Isn't that the most honest "overview" of what is happening with your product? Except that often can't tell users anything useful either.
People on here will also mumble about SLAs, but if a customer wants a credit or is seriously worried about events like these, they're generally talking to account managers, not posting on internet forums. That said, a lot of providers get weaselly about that stuff unless you're already negotiating prices with an AM in the first place...
When I started work at Amazon in 2001, we had a "gonefishing" page that a human had to manually flip the site over to during outages.
We actually stopped doing this a year or two later because reporters were monitoring for that page showing up and reporting outage statistics based on it. So we just left the site up in whatever degraded state it was in, which made the problem of measuring www.amazon.com uptime externally that little bit more difficult.
Probably requires manual updates. It seems like more and more places have moved to this paradigm now that status pages are tied to SLAs, which are tied to money. One might call it the politicization of status pages.
Politicization, yes; though I've never heard of SLAs being tied to status pages. It is like pulling teeth to get most cloud providers to credit the account when they miss an SLA, and one always has to ask for it; heaven forbid credits be paid out automatically when service wasn't rendered.
Or you get weasel-worded out of it. I had a cloud provider deny a service credit; the SLA stated that the service was only out of SLA if it didn't return 2xx. Well, the API returned "202 Accepted: your request is being processed", you could use the API to query the job, and the job … never finished or made any progress at all. But the API returned 2xx the entire time, so that was "within SLA".
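Roughly, the failure mode looks like this; a minimal Python sketch, with the provider URL, endpoints, and field names all made up for illustration:

    import time

    import requests  # assumes the requests library is installed

    BASE = "https://api.example-cloud.test/v1"  # hypothetical provider

    def provider_sla_probe() -> bool:
        # What the provider measures: did the call return 2xx?
        resp = requests.post(f"{BASE}/jobs", json={"task": "noop"})
        return 200 <= resp.status_code < 300  # a 202 Accepted passes here

    def job_actually_works(timeout_s: float = 60.0) -> bool:
        # What the customer cares about: did the job ever finish?
        resp = requests.post(f"{BASE}/jobs", json={"task": "noop"})
        if not (200 <= resp.status_code < 300):
            return False
        job_url = resp.json()["job_url"]
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if requests.get(job_url).json()["status"] == "done":
                return True
            time.sleep(1)
        # Every response above was 2xx, yet the job never made progress.
        return False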
Correct. SLAs aren't calculated off status pages; there are far better ways of calculating them (running a query over responses, for example). Most modern SLAs are customer-initiated anyway, so the customer writes in to request a credit rather than having one calculated automatically. The status page doesn't need to show anything for a customer to provide logs indicating a QoS lower than that promised in the SLA.
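For example, a claim usually boils down to a query like this sketch; the log format and the 5% per-minute error cutoff are just illustrative assumptions:

    from collections import defaultdict

    def availability(records, error_threshold=0.05):
        # records: iterable of (minute_bucket, status_code) pairs pulled
        # from the customer's own request logs.
        totals, errors = defaultdict(int), defaultdict(int)
        for minute, status in records:
            totals[minute] += 1
            if status >= 500:
                errors[minute] += 1
        # A minute counts as "down" if more than error_threshold of the
        # responses in it were 5xx.
        down = sum(1 for m in totals if errors[m] / totals[m] > error_threshold)
        return 1 - down / len(totals)

    # e.g. availability(my_logs) -> 0.9984, versus a promised 0.999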
I don't think it's politics (maybe AWS's is, but GCP's wasn't, IMO); it's really a function of "in large-scale software systems, things are constantly failing in all sorts of ways, and it's really hard to output a meaningful automated signal that things are broken." Sure, you can set up pingdom-type health checks on every endpoint, but even then you're not necessarily guaranteeing that things are working properly.
Source: worked at a few cloud providers, paid out a few SLA violations
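To make that concrete, here's a toy sketch of the gap between a ping and real traffic (all names invented for the example):

    def shallow_healthcheck() -> int:
        # What a pingdom-style monitor sees: the process is up and
        # answers requests.
        return 200

    def handle_user_request(db_alive: bool, cache_alive: bool) -> int:
        # What users actually hit: any broken dependency fails the request.
        if not db_alive:
            return 500
        if not cache_alive:
            return 503
        return 200

    # The monitor reports green while users see errors:
    assert shallow_healthcheck() == 200
    assert handle_user_request(db_alive=False, cache_alive=True) == 500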
I can't imagine the status page is mentioned anywhere in the contracts. I would think the only language included is that it's the customer's responsibility to track and notify the vendor (GitHub) about any downtime, and to do so within 48 hours of the downtime having occurred. Only then will the customer be eligible for compensation, probably in the form of free service credits.
My personal take is that this wording is the reason you see so few public status pages in general, especially ones with automatic, minute-by-minute status history. Better to put the burden on the customer to have accurate monitoring in place, which most people simply won't have.
It seemed like a good-faith effort... Between when I was told about it (which seemed like very early on) and when I saw the status page change (it still showed 'all clear' at first) was... 3 minutes? Yes, that's not 'real time', but they don't seem to be intentionally hiding anything for hours on end. There may also be consequences for reporting false positives too quickly.
The SLA isn't tied to the status page at all; a customer can request an SLA refund for any reason, provided they have proof of damages.
TL;DR: status pages are a helpful tool to let folks know something's wrong and that the team isn't asleep at the wheel, not a legally binding contract.
Not a lawyer, but my understanding of this issue is that it constitutes fraud if and only if the status page is a contractually- or legally-binding warrant as to the status of their services.
>why is it legal in the tech industry?
Tech has a history of abusing legal gray areas, or else simply ignoring laws it finds inconvenient, enabled by toothless and sluggish enforcement, and of using the profits from lawbreaking to fund lobbying campaigns to retroactively make things legal. For recent examples, see Uber, Airbnb, Clearview, et al.
github.com is having issues across the site; the API, including git operations (and the CLI), still works. The status page is manually updated, and we're working to get it updated.
EDIT: it's updated now.
EDIT EDIT: github.com is back up and running, apologies for the disruption :(
Not many places I have worked allowed this to be automatic. A lot of that was so they could provide a coherent explanation of where internal attention was currently directed, versus what everyone can plainly see.
I guess this makes sense, but I don't understand why you couldn't have it change status automatically and still allow a person to go in afterward and manually add an explanation.
Mostly because it's more work to manually remove a bunch of spurious robotically added statuses (which look bad, because you're either down more than customers are noticing or your detection is flawed) than it is to manually add and remove real ones.
I think it is so they can "hide" small outages that don't rise to the level of making the news sites, so they look better. A lot of sites do this sort of thing these days.
When I worked at GCP, it was all manually updated as well, so we could add a sentence about what was actually affected. In any sufficiently large system it's hard to indicate exactly what's broken or how to work around it, so it was just easier to `/status <system> <color> <reason for status>`.
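For the curious: not GCP's actual tooling, just a small Python sketch of the shape such a command handler might take:

    VALID_COLORS = {"green", "yellow", "red"}

    def handle_status_command(args: str) -> dict:
        # Parse "/status <system> <color> <reason for status>" into an update.
        system, color, *reason = args.split()
        if color not in VALID_COLORS:
            raise ValueError(f"color must be one of {VALID_COLORS}")
        return {
            "system": system,
            "color": color,
            "reason": " ".join(reason),  # the human-written sentence
        }

    # handle_status_command("compute yellow elevated error rates in us-east1")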
I just refreshed a GH page that I've had open since last night, and yup, 500. Of course I came to HN first, before even visiting their own status page, as HN always has an update faster than the official page.
Good luck to the engineers at GitHub; I know how stressful it can be. But I hope everyone else is enjoying a nice break and some socializing.