That simple script is still someone's bad day

throwaway928473 · on April 6, 2022

This post is rather cryptic, but I assume Rachel is talking about the recent Cloudflare post that describes an outage caused by a failing shell pipeline:

https://blog.cloudflare.com/pipefail-how-a-missing-shell-opt...
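To make the failure mode concrete, here is a minimal Python sketch. It assumes `bash` is installed and on `PATH`, and `pipeline_status` is a hypothetical helper. It runs the same two-command pipeline with and without `set -o pipefail`. By default a pipeline's exit status is that of its last command, so an upstream failure can vanish.

```python
import subprocess

def pipeline_status(script: str) -> int:
    """Run `script` under bash (assumed to be on PATH) and return its exit status."""
    return subprocess.run(["bash", "-c", script]).returncode

# Without pipefail the pipeline reports the status of the last command (`cat`),
# so the failure of `false` is silently discarded.
print(pipeline_status("false | cat"))                   # 0

# With pipefail the pipeline fails if any command in it fails.
print(pipeline_status("set -o pipefail; false | cat"))  # 1
```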

“The fact we're talking about shell scripts for something critical means that the battle for reliability was lost a long time ago.”

This is a stretch. While I’ll readily advocate writing anything more than the most trivial script in something better than shell, it doesn’t follow that using shell at all means reliability is not a concern. One could just as easily forget to check the exit code of the first program if the shell script were rewritten in C++, for instance. People are always going to make mistakes, and saying “never use shell”, especially for a basic two-command pipeline, isn’t going to move the needle on reliability anywhere.
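A minimal sketch of that point, in Python rather than C++ for brevity: the same two-command pipeline built with `subprocess.Popen` and no shell. The helper name `run_pipeline` is hypothetical. Nothing forces the caller to check the producer's exit status; leaving out the `p1.wait()` check below reproduces exactly the bug that `pipefail` guards against in shell.

```python
import subprocess

def run_pipeline(producer: list[str], consumer: list[str]) -> bytes:
    """Run `producer | consumer` without a shell; raise if either command fails."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the read end so the producer gets SIGPIPE
    # if the consumer exits early.
    p1.stdout.close()
    out, _ = p2.communicate()
    # The easy-to-forget step: wait for the producer and check its status too.
    p1.wait()
    if p1.returncode != 0:
        raise subprocess.CalledProcessError(p1.returncode, producer)
    if p2.returncode != 0:
        raise subprocess.CalledProcessError(p2.returncode, consumer, output=out)
    return out
```

For example, `run_pipeline(["false"], ["cat"])` raises `CalledProcessError`, whereas a version that only checked `p2.returncode` would quietly return `b""`.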

trasz · on April 5, 2022

“The fact we're talking about shell scripts for something critical means that the battle for reliability was lost a long time ago.”

Except that when replaced, the new thing is usually even worse for reliability. Case in point: Jenkins. Or anything “enterprise”.

patrick451 · on April 6, 2022

Strange example. We use Jenkins at work with somewhere around 100 workers. I can't remember the last time Jenkins itself failed. Nodes in a bad state because they are too much like pets? Sure. But the Jenkins master is rock solid.

trasz · on April 6, 2022

With enough thrust, pigs will fly. Sure, you can work around Jenkins' deficiencies, but in general it's optimized for shooting yourself in the foot. Everything it does can be done with a small shell script, which will be easier to understand, maintain, and fix if it ever goes wrong.

And if you use Jenkins in a maintainable way, you're writing scripts anyway, except that you have to use a made-up language instead of a standard one.

patrick451 · on April 7, 2022

Some examples to back up this hyperbole would be helpful. We aren't doing anything in particular to work around whatever deficiencies you're talking about. It just works.

Maybe our definitions of a "small" shell script are different, but to me the following very useful features sound like a monstrously complex shell script:

- Load-balance builds over scores of distributed build servers

- Present a UI where I can glance over recent CI results

- Expose a REST API that my ETL pipeline can query

- Let devs manually run integration tests against their branch without giving them SSH access