Undefined behavior is not the only kind of bug in C programs, and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it. Consider that one of the worst security bugs in history was a result of Kurt Roeckx eliminating an undefined-behavior bug from OpenSSL.
It's pretty much the only kind of bug that Rust can prevent, too.
> and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it
It's pretty clear to me.
> Consider that one of the worst security bugs in history
I'm not sure what bug you're referring to, but if it's Heartbleed, then that was an undefined-behavior bug. Of course, a functional bug can be introduced at any time, including during a rewrite in another language.
No, I'm talking about the Debian OpenSSL bug Luciano Bello discovered. It was a lot worse than Heartbleed. Kurt Roeckx didn't introduce Heartbleed, and Heartbleed wasn't introduced by removing undefined behavior, so there is no plausible reason for you to infer that I was talking about Heartbleed.
As for the cost of rewrites, there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it; see Glass's Facts and Fallacies of Software Engineering for details and references. Also, though, it should be intuitively apparent (though perhaps nonobvious) that this is a consequence of the undecidability of the Halting Problem and Rice's Theorem — it's impossible to tell what a given piece of software will do, which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
> so there is no plausible reason for you to infer that I was talking about Heartbleed.
Except that Heartbleed is the only OpenSSL bug I've heard of :) Also, I don't know who Kurt Roeckx is.
> there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it
But we're not talking about arbitrary modification; we're talking about, at worst, fixing undefined behavior, which requires only local modifications (or Rust wouldn't be able to prevent it either). As an ultimate reduction, you could choose to rewrite the software in C and still use sound static analysis to show the absence of undefined behavior.
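To make "only local" concrete, here's a minimal sketch (the function and its names are mine, purely illustrative) of the kind of fix I mean: the signed-overflow UB is removed by changes confined to a single function, without touching anything else in the codebase.

    #include <limits.h>
    #include <stdbool.h>

    /* Before the fix, returning `a + b` directly was undefined
       behavior on signed overflow. The repair is entirely local. */
    bool checked_add(int a, int b, int *out)
    {
        if ((b > 0 && a > INT_MAX - b) ||
            (b < 0 && a < INT_MIN - b))
            return false;          /* would overflow; report instead */
        *out = a + b;
        return true;
    }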
> which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
Yes, but that still doesn't mean that a rewrite is cheaper. Also, while your conclusion is correct, your statement of Rice's theorem is inaccurate: there is no general procedure that can tell what every piece of software will do. It's certainly possible to tell what some software will do, at least in some cases, or writing software would be impossible to begin with.
I appreciate your clarification! Indeed, I didn't mean it was impossible to tell what any software would do in any situation, only some software (in practice, nearly all) in some situations. The contrary would imply that not only writing software but also running it would be impossible.
Your blog post looks very interesting indeed! I will read it with care.
I do think there's a subtle point about modifying software. Not just any modification that removes the undefined behavior will do; we want a modification that preserves the important aspects of the original software's behavior. Not only is this easy to get wrong (as shown spectacularly by the OpenSSL bug, which you've presumably looked up by now, and also, for example, by the destruction of the first Ariane 5), but there is no guarantee that it can be done with purely local modifications, even if the final safety property you wanted to establish can be established with chains of local reasoning.
I do agree that sound static analysis of C that is written to make that analysis tractable is just as effective as rewriting in Rust. Not only can such analysis show the absence of undefined behavior, it can show arbitrary correctness properties, including those beyond the reach of Rust's type system. Probably the strongest example of this kind of analysis is seL4, although now its proofs verify not only the C but also the machine code, thus eliminating the compiler from the trusted computing base (TCB).
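For a taste of what "C written to make the analysis tractable" can look like, here is a sketch using an ACSL contract of the kind Frama-C's WP plugin checks. The annotation syntax is real ACSL, but the function itself is just an illustration I made up; the point is that the analyzer doesn't merely flag undefined behavior, it proves its absence (and the functional property in the ensures clause) for every input satisfying the precondition.

    #include <stddef.h>

    /*@ requires n > 0 && \valid_read(arr + (0 .. n-1));
        assigns \nothing;
        ensures \forall integer i; 0 <= i < n ==> \result >= arr[i];
    */
    int max_of(const int *arr, size_t n)
    {
        int m = arr[0];
        /*@ loop invariant 1 <= i <= n;
            loop invariant \forall integer j; 0 <= j < i ==> m >= arr[j];
            loop assigns i, m;
            loop variant n - i;
        */
        for (size_t i = 1; i < n; i++)
            if (arr[i] > m)
                m = arr[i];
        return m;
    }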
Yes, I looked up the OpenSSL bug you referred to, and I think it's quite unusual. I'm not sure what those lines were exactly, but from the description it seems they were intended to read uninitialized memory, something that (safe) Rust won't let you do either. Reading uninitialized memory is probably undefined even in C, but it happened to work. So yeah, touching code in any way is not always 100% safe, but my point was just that sound static analysis is still cheaper than a rewrite, as it requires far less modification.
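Something like the pattern below, as far as I can tell from the write-ups. This is a purely illustrative sketch, not the actual md_rand.c: the real code had two MD_Update call sites, one reading the uninitialized buffer for "bonus" entropy and one mixing in the caller's entropy, and the Debian patch commented out both.

    #include <string.h>
    #include <unistd.h>

    static unsigned char pool[64];

    /* Stand-in for OpenSSL's MD_Update()-based pool mixing. */
    static void mix_into_pool(const unsigned char *data, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            pool[i % sizeof pool] ^= data[i];
    }

    void seed_prng(const unsigned char *entropy, size_t len)
    {
        unsigned char buf[32];    /* deliberately left uninitialized */
        if (len > sizeof buf)
            len = sizeof buf;
        memcpy(buf, entropy, len);

        /* Reads uninitialized bytes as extra entropy: technically UB,
           and the read Valgrind flagged. Removing this call also
           removed the only path by which `entropy` reached the pool. */
        mix_into_pool(buf, sizeof buf);

        /* After the patch, essentially all that remained was: */
        pid_t pid = getpid();
        mix_into_pool((unsigned char *)&pid, sizeof pid);
    }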
As to seL4, it isn't exactly comparable to sound static analysis, as the work was extremely costly. All of seL4 is about 1/5 the size of jQuery, and it took years of work. But it also includes functional verification, not just memory safety. In fact, it is among the largest programs ever functionally verified to that extent, and yet it is about 3 orders of magnitude smaller than ordinary business software, roughly the same verification gap we've had for decades. We don't yet know how to functionally verify, end-to-end as with seL4, any software that isn't very small.
Anyway, Rust offers a much more limited form of assurance, and sound static analysis tools offer the same, and at a lower cost for existing codebases.
It's not that significant. We can tell what the vast majority of existing software will do in an automated way. Compiling a program is the equivalent of encoding its semantics in another language, which implies knowing what it will do - at least that's one way of 'knowing what it will do'.
You can write down the physical laws that apply to a given system, but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output. The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant. You can't solve the halting problem with compilers in the same way that Newton's laws don't solve the three-body problem.
Also, did you catch the part where the point is about how expensive it is?
> but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output.
Who is 'we'? And yes, we can predict exactly what the output of a given program for a given input is, for the vast majority of cases. All you have to do is run the program.
> The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant.
You think static analysis, type checking, intermediate representation, optimization, the translation of the program with exact semantics into another language, etc. - is 'superficially relevant' to understanding a program?
> You can't solve the halting problem with compilers
Now that's pretty irrelevant.
> Also, did you catch the part where the point is about how expensive it is?
Did you catch the part where I was only commenting on a specific part of the comment? But tell me, how expensive is it?
It would be hard to overstate how incorrect this statement is, if it is read with the implicit qualifier "for all possible inputs", without which my comment above would be obvious nonsense. Of course we can tell what most programs will do for some inputs—we can just run them!
Yup, we can tell what most existing software will do for all inputs. Rice's theorem states that we can't tell what all software will do, not that it's impossible to tell what a given piece of software will do.
The fact that we can determine what a piece of software will do doesn't mean we always do that kind of analysis, or that the programmer fully understands his own code. That's why we have type systems, constraints, verification tools, etc.