


I was nodding along to this, but this gave me pause:

> Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition

> ... Because assertions are side-effect free, they can be selectively disabled after testing in performance-critical code.

Why would you put recovery logic in if you're just going to disable it?

For example, let's say that I've got a new algorithm for calculating the cosine. Just to be safe, I add an assertion that the return value is in [-1, 1], and return an error if it isn't.

Now the clients of my code (say, the guidance computer) deal with the error somehow: if cos(x) returns an error, show an error to the pilot and maintain course or something.

If that assertion were stripped out in production code, then there was no point in writing the recovery logic. I've never written safety-critical code - am I missing something?
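
To make that concrete, here's roughly what I have in mind - just a sketch with made-up names (my_new_cos, COS_RANGE_ERROR), not real flight code:

    /* hypothetical names for illustration only */
    #define COS_OK          0
    #define COS_RANGE_ERROR 1

    double my_new_cos(double x);   /* the new algorithm under test */

    int safe_cos(double x, double *out)
    {
        double r = my_new_cos(x);
    #ifndef NDEBUG
        /* side-effect-free Boolean test; the "recovery" is just
           returning an error condition to the caller */
        if (!(r >= -1.0 && r <= 1.0))
            return COS_RANGE_ERROR;
    #endif
        *out = r;
        return COS_OK;
    }

Once something like that #ifndef strips the check out of a release build, the caller's handling of COS_RANGE_ERROR is dead code - which is exactly what confuses me.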


You might be misled by the use of the word “recovery”. In many areas of performance-critical code, returning an error condition while running in production just results in loss of the vehicle; there’s no coasting in a rocket launch, and bugs usually mean explosions.

There are also many time critical code paths where microseconds matter.

Think of assertions in lots of places more like unit tests: important for verifying your code, but unhelpful in production.

Failure recovery in production can involve entire dedicated systems, which are very different from assertions.


> For example, let's say that I've got a new algorithm for calculating the cosine. Just to be safe, I add an assertion that the return value is in [-1, 1], and return an error if it isn't.

I believe the idea here is that, if you've tested your code correctly, the assertions are never triggered and therefore your constraints are met and you don't need the checks anymore.

Somewhat interestingly, the Chromium codebase does something similar. There are `DCHECK` macros everywhere: assertions that crash if the condition is false (for instance, if a variable is null or some such), but they're disabled in production builds.
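
The pattern is roughly this (a simplified sketch of the idea, not Chromium's actual macro, which does a lot more):

    #include <stdlib.h>

    /* simplified sketch of a DCHECK-style macro */
    #ifdef NDEBUG
    #define DCHECK(cond) ((void)0)                 /* compiled out in release builds */
    #else
    #define DCHECK(cond) \
        do { if (!(cond)) abort(); } while (0)     /* crash loudly in debug builds */
    #endif

    void use(const int *ptr)
    {
        DCHECK(ptr != NULL);   /* documents and enforces the precondition in debug */
        /* ... */
    }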


Are you sure they aren't turned off for performance reasons?


Also yes


If your code is swinging a multi-million-dollar robot arm near a multi-million-dollar mirror assembly, for example, it’s probably not seen as excessive to add recovery code even if some of those assertions will only be active during testing phases.


Seems odd also. My understanding of assertions is that if they fail, there really can be no recovery, because you've reached an impossible state (impossible = this should never have happened).


Not exactly. For example, during an automated download, if an HTTP status code is not 200, it's perfectly fine to yield, delete the incomplete data from the failed download, and try the download again. Same thing with checking data integrity: if that check fails, one can simply delete, redownload, rehash, and set a configurable upper bound on the number of retries (see the sketch below).

Of course, nothing in space that's keeping astronauts alive should be using the web.
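
Roughly this shape, say (the helper names are hypothetical; download() is assumed to check the HTTP status itself):

    #include <stdbool.h>
    #include <stddef.h>

    enum { MAX_RETRIES = 3 };   /* upper bound on retries */

    /* hypothetical helpers */
    bool download(const char *url, char *buf, size_t len);
    bool checksum_ok(const char *buf, size_t len);
    void discard(char *buf, size_t len);

    bool fetch_with_retries(const char *url, char *buf, size_t len)
    {
        for (int i = 0; i < MAX_RETRIES; i++) {
            if (download(url, buf, len) && checksum_ok(buf, len))
                return true;               /* good, verified data */
            discard(buf, len);             /* delete the incomplete/corrupt data */
        }
        return false;                      /* give up after the bounded retries */
    }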


Assertions are not errors you’re supposed to recover from. If you ended up in the state that triggers an assertion, it means you’re down the rabbit hole and the Cheshire Cat is speaking to you. Also, you can divide by zero, the halting problem is solvable in finite time, and pi is not transcendental.


I would hope they might try to recover from it if I'm in orbit and it's the system I need to get back down to Earth. At least give it the old college try instead of just saying "welp, you're dead."


Assertions are usually for things like: “I have just added a key to a dictionary, and now I look it up again to get its owned-by-dictionary address - what do I do if it doesn’t exist?”

In such scenarios you can only kill the process - there’s no way to recover from such an error, and handling it in any way other than disabling the subsystem (or killing the process) is impractical.
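
Something like this, with hypothetical dict_put/dict_get helpers:

    #include <assert.h>

    /* hypothetical dictionary API */
    void *dict_put(void *dict, const char *key, void *value);
    void *dict_get(void *dict, const char *key);

    void store(void *dict, void *value)
    {
        void *stored = dict_put(dict, "some-key", value);
        /* if this ever fires, the dictionary itself is broken; there's no
           sensible local recovery, so aborting the process is the point */
        assert(dict_get(dict, "some-key") == stored);
    }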


You mean astronauts shouldn’t have iPads they need to put in “airplane” mode when the cached data gets corrupted? cough cough


I think the keyword here is

>performance-critical code.


Rule number one includes no recursion, which I get, but it's also really interesting to me. I feel like I could turn any recursive algorithm I'd use in production into a loop, but I suspect it would trip me up.
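
Even a toy case shows the shape of the conversion (ignoring overflow):

    /* recursive version */
    unsigned long fact_recursive(unsigned int n)
    {
        return n <= 1 ? 1 : n * fact_recursive(n - 1);
    }

    /* same thing as a loop - no call-stack growth, and the bound is explicit */
    unsigned long fact_loop(unsigned int n)
    {
        unsigned long r = 1;
        for (unsigned int i = 2; i <= n; i++)
            r *= i;
        return r;
    }

The tricky ones are the tree- and graph-shaped recursions, where you end up managing an explicit stack yourself.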


Add rule 2 (fixed upper bounds on loops), and you see that you're restricted to a certain type of algorithm. You can't build something that, say, walks an arbitrary-sized binary tree - not and be within the rules. But then, you can't create an arbitrary-sized binary tree, because you can't allocate memory after startup.

These rules are for guaranteed-response-time algorithms that take data from a fixed number of places, do computations on it using fixed-size buffers, and write the data to a fixed number of places. They aren't for writing general Turing-complete computations in all their variety.
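
To make that concrete, here's a sketch of what a rules-compliant traversal ends up looking like: the recursion becomes an explicit stack with a fixed capacity, the loop gets a fixed bound, and blowing either limit is reported as an error instead of handled dynamically (the limits are made up for illustration):

    #include <stdbool.h>
    #include <stddef.h>

    enum { MAX_DEPTH = 32, MAX_NODES = 1024 };   /* fixed limits chosen up front */

    struct node { int value; struct node *left, *right; };

    /* in-order visit; returns false if the tree exceeds the fixed limits */
    bool visit_in_order(const struct node *root, void (*visit)(int))
    {
        const struct node *stack[MAX_DEPTH];
        size_t top = 0;
        const struct node *cur = root;

        /* each node costs one push step and one pop step, plus one final check */
        for (size_t steps = 0; steps < 2 * MAX_NODES + 1; steps++) {
            if (cur != NULL) {
                if (top == MAX_DEPTH)
                    return false;              /* deeper than we planned for */
                stack[top++] = cur;
                cur = cur->left;
            } else if (top > 0) {
                cur = stack[--top];
                visit(cur->value);
                cur = cur->right;
            } else {
                return true;                   /* done within the fixed bounds */
            }
        }
        return false;                          /* more nodes than we planned for */
    }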


In addition to what AnimalMuppet said (fixed bounds on loops), you also have to be careful to avoid blowing the stack with limited memory + lots of running processes.


Often you have a limited stack, and recursion can easily blow it unless you're very, very careful, particularly in embedded situations. Loops are a bit better for that.



