> when they should be able to kick the app over the wall and let infra ensure that the app is deployed in two separate PCR zones with the failover plan etc, which should itself be mostly automated
Not entirely - the developers should actively participate in designing the actual failover scenario and making sure the application can handle that (anything from being okay with some downtime due to the failover happening to designing an actual multi-region multi-master application). Making assumptions like 'infra will handle it' is a great way to not only get unexpected outages (because the developers assumed there would be no downtime because failover is magic, or that writes will never be lost) but to also introduce tensions between teams (because you now have an outside team having to wrangle an application into reliability when the original authors don't give a crap about it).
I get and agree with your point, the tooling and processes should definitely be simplified/automated when possible, and developers deserve a platform that just works. The whole point of a platform team is to abstract away the mundane to let people do their job. But reliability is everyone's job, not just the infra team's, and developers must understand the tradeoffs and technology involved in order to not design broken systems.
A) It's doing a horrible job conveying it. A dev does need to be concerned with how to handle failover, but only at a certain abstraction level. They should be required to specify something in the form "given server A fails and has to pass to B, what do you do?" That does not require knowing the terminology around PCRs, how to make decisions about which cells (or whatever) to pick on deployment, or avoiding the "gotcha" of making sure the two servers are in different PCR zones.
At that point, it's just following a checklist that needs no knowledge of the specifics of the app, and, to the extent that it's accurately representing how Google was, is indicative of bad processes.
B) Many things should be infra's job, as they're cleanly orthogonal to what devs are doing. For example, how to apply a security patch to a DB. That's unrelated to the operation of the app.
I do get your point though, and I wouldn't say something like this about e.g. testing (which was the short, "reasonable" part of the video!) -- the devs have intimate knowledge of what counts as passing and failing and should be writing tests, not hand 100% of it over to QA. But that's precisely because such concerns are deeply tied to the thing they are responsible for. "SQL 3.4.1 vs 3.4.2" is not.
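To make the abstraction-level point concrete, here's a minimal sketch (all names hypothetical, not any real platform's API) of the split I mean: the dev declares failover *behavior*, and the platform turns that into placement decisions like "two distinct failure domains" without the dev ever learning PCR terminology:

```python
from dataclasses import dataclass

@dataclass
class FailoverSpec:
    """What the developer declares: behavior, not placement."""
    max_downtime_seconds: int    # 0 means no downtime tolerated at all
    tolerate_lost_writes: bool   # may in-flight writes be dropped on failover?

def plan_deployment(spec: FailoverSpec) -> dict:
    """Hypothetical platform side: maps declared behavior to a topology.
    Zone/cell selection stays the platform's job, not a dev 'gotcha'."""
    if spec.max_downtime_seconds == 0:
        topology = "multi-region multi-master"
    else:
        topology = "active-passive with automated failover"
    return {
        "topology": topology,
        # Placing replicas in distinct failure domains is enforced here,
        # so no checklist item about "different PCR zones" is needed.
        "distinct_failure_domains": 2,
    }

print(plan_deployment(FailoverSpec(max_downtime_seconds=0,
                                   tolerate_lost_writes=False)))
```

The dev still had to answer the hard question (can we lose writes? how much downtime is okay?), which is exactly the part that can't be outsourced.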