> This is a failed cert, which is a failure that can be reasonably easily handle...

tialaramex · on July 4, 2021

Yes, this kills Yeti 2022. There's a bug referenced which refers to an earlier incident where a bitflip happened in the logged certificate data. That was just fixed. Overwrite with the bit flipped back, and everything checks out from then onwards.

But in this case it's the hash record which was flipped, which unavoidably taints the log from that point on. Verifiers will forever say that Yeti 2022 is broken, and so it had to be locked read-only and taken out of service.

Fortunately, since modern logs are anyway sharded by year of expiry, Yeti 2023 already existed and is unaffected. DigiCert, as log operator, could decide to just change criteria for Yeti 2023 to be "also 2022 is fine" and I believe they may already have done so in fact.

Alternatively they could spin up a new mythical creature series. They have Yeti (a creature believed to live in the high mountains and maybe forests) and Nessie (a creature believed to live in a lake in Scotland) but there are plenty more I'm sure.

ajross · on July 4, 2021

It doesn't break anything that I can see (though I'm no expert on the particular protocol). Our ability to detect bad certs isn't compromised, precisely because this was noticed by human beings who can adjust the process going forward to work around this.

Really the bigger news here seems to be a software bug: the CT protocol wasn't tolerant of bad input data and was trusting actors that clearly can't be trusted fully. Here the "black hat" was a hardware glitch, but it's not hard to imagine a more nefarious trick.

zinekeller · on July 4, 2021

Your statement is, to be frank, non-sensical. The protocol itself isn't broken, at least for previous Yeti instances, certificate data are correctly parsed and rejected.* In this instance, it seems that the data is verified already pre-signing BUT was flipped mid-signing. This isn't the fault of how CT was designed but rather a hardware failure that requires correction there. (Or at least that's the likely explanation, it could be a software bug° but it will be a very consistent and obvious behaviour if it is indeed a software bug.)

On the issue of subsequent invalidation of all submitted certificates, this is prevented by submitting to at least 3 different entities (as of now, there's a discussion whether if this should be increased), so if a log is subsequently found to be corrupted, the operator can send a "operator error" signal to the browser, and any tampered logs are blacklisted from browsers. (Note that all operators of CT lists are members of CA/B forum, at least as of 2020. In standardisation phase, some individuals have operated their own servers but this is no longer true.)

* Note that if the cert details are nonsensical but technically valid, it is still accepted by design, because all pre-certificates are countersigned by the intermediate signer (which the CT log operator checks from known roots). If the intermediate is compromised, then the correct response is obviously a revocation and possibly distrust.

° At least the human-induced variety, you could say that this incident is technically a software bug that occurred due to a hardware fault.

caf · on July 4, 2021

Presumably it's possible to code defensively against this sort of thing, by eg. running the entire operation twice and checking the result is the same before committing it to the published log?

makomk · on July 4, 2021

Big tech companies like Google and Facebook have encountered problems where running the same crypto operation twice on the same processor deterministically or semi-deterministically gives the same incorrect result... so the check needs to be on done on separate hardware as well.

caf · on July 4, 2021

I don't think that matters in this case, because the entire point of the log machine is to run crypto operations. If it has such a faulty processor it is basically unusable for the task anyway.

ajross · on July 4, 2021

So I'm learning about Yeti for the first time, but I don't buy that argument. Corrupt tranmitted data has been a known failure mode for all digital systems since they were invented. If your file download in 1982 produced a corrupt binary that wiped your floppy drive, the response would have been "Why didn't you use a checksumming protocol?" and not "The hardware should have handled it".

If Yeti can't handle corrupt data and falls down like this, Yeti seems pretty broken to me.

tptacek · on July 4, 2021

Not handling corrupted data is kind of the point of cryptographic authentication systems. Informally and generally, the first test of a MAC or a signature of any sort is to see if it fails on arbitrary random single bit flips and shifts.

The protocol here seems to have done what it was designed to do. The corrupted shard has simply been removed from service, and would be replaced if there was any need. The ecosystem of CT logs foresaw this and designed for it.

ajross · on July 5, 2021

So... Yeti isn't broken then? Seems like the protocol does handle it? Seems like there's some confusion on this point in this thread.

detaro · on July 5, 2021

The Yeti2022 log is corrupted due to the random event. This has been correctly detected, and is by design and policy not fixable, since logs are not allowed to rewrite their history (ensuring that they don't is very much the point of CT). That the log broke is annoying but not critical, and the consequences are very much CT working as intended.

You can argue if the software running the log should have verified that it calculated the correct thing before publishing it, but that's not a protocol concern.

tptacek · on July 5, 2021

I think Yeti2022 is just the name for one instance of the global CT log? Nick Lamb could probably say more about this; I understand CT mostly in the abstract, and as the total feed of CT records you surveil for stuff.

detaro · on July 5, 2021

Yeti is one of the CT logs (and Yeti2022 the 2022 shard of it, containing certs that expire in 2022). CT logs are independent of each other, there is not really a "global log", although the monitoring/search sites aggregate the data from all logs. Each certificate is added to multiple logs, so the loss of one doesn't cause a problem for the certs in it. (Maybe it's also possible to still trust Yeti2022 for the parts of the log that are well-formed, which would decrease the number of impacted certs even more, not familiar enough with the implementations for that)