We seem to be talking past each other. I have no doubt that signed integer overflow is undefined behavior as defined in ISO C and should be avoided in any case, but typical compiler behavior on UB is reasonably well understood, so I think its immediate effects are overstated.
That said, I should have said "to be actually exploitable" instead of "to be actually vulnerable", because vulnerabilities do not always turn into threats or risks. My bad. If you actually wanted to point this out, thank you.
> In some cases, the compiled code behaves in ways that have nothing to do with how the program was written.
Of course. Moreover, completely safe code without any UB can exacerbate existing security bugs (e.g. by providing ROP gadgets). Should we then say that such code is also vulnerable or exploitable? That would be quite a stretch IMHO. Exploits need one or more vulnerabilities to be feasible, but they can also leverage any other code at their disposal. And some vulnerabilities are weak enough on their own that other vulnerabilities are needed to exploit them. I meant that signed integer overflows are themselves such vulnerabilities.
There is also a weak consensus on this separation in the form of ISO C11 Annex L (Analyzability). Analyzability itself is apparently too strong and too weak at the same time [1] and not commonly implemented AFAIK, but it does define a normative list of "critical" UBs, which does not include signed integer overflow.
> But that still doesn't make the assertions written by the OP correct.
I expect Daniel Stenberg to write something akin to my comments when he disputes the CVE.
> but typical compiler behavior on UB is reasonably well understood, so I think its immediate effects are overstated
I don't dispute that usually the compiler and the compiled code behave like we expect. That is not under dispute.
What I am arguing is that sometimes, they don't behave like we expect. Instead, they behave in wildly different, unexpected, and unintuitive ways.
Since sometimes they don't behave like we expect, there is no way of knowing for sure what the effects of UB are without looking at the compiled binary code.
> Should we then say that such code is also vulnerable or exploitable? That would be quite a stretch IMHO.
Sure, but in the case you are describing, the vulnerability is not in the safe code, because if the rest of the code was safe, then the safe code would not exacerbate anything.
In the OP's case, this is not true. The rest of curl could be 100% safe and yet, the bug described in the article could still be the cause of an exploitable security vulnerability.
And since the bug causes UB, there is no way to know for sure if the bug causes an exploitable security vulnerability or not just by looking at curl's source code, unlike what Daniel seems to be claiming.
> I expect Daniel Stenberg to write something akin to my comments when he disputes the CVE.
Sure. He should dispute the CVE, and there is no reason to believe that the bug he is describing causes a security vulnerability.
But once again, his claim that the bug does not cause a security vulnerability is unsubstantiated.
The correct thing to say would be something like: "at this point, there is no reason to believe that the bug causes a security vulnerability".
He could even say that it's extremely unlikely for there to be one. But he cannot say for sure that there isn't one, just by looking at the source code.
The claims he is making help proliferate misconceptions about the C language, C compilers, and the security of C code, which unfortunately are all too common.
I haven't read the Annex yet, but I suspect that actually achieving the goal of bounding the effects of UB is much, much harder than it appears at first sight.
Right now, it is difficult or impossible to reason about the effects of UB in all cases, because the tiniest assumption about the impossibility of a signed integer overflow can lead to an unbounded number of arbitrary, cascading side effects.
Furthermore, I don't think this is something that can be easily fixed. The main reason for that is that exploiting these assumptions allows compilers to perform significant performance optimizations, and these performance optimizations are correlated with substantial transformations in the IR / compiled code, which sometimes leave the final compiled code unrecognizable compared to the source code.
If you prevented the compiler from performing even a single one of these optimizations, many significant real-world software projects and companies would immediately object, because it would have a significant performance impact on many important real-world code segments.
I'm talking about projects where a 5% performance degradation in some code segments can be considered a very significant regression.
In the end, the main way to stop UB from affecting compiled code in unexpected ways (in some cases, in extreme ways) is to actually define what happens on an integer overflow, or at least, leave it implementation-defined.
You cannot easily say "you can do whatever you want on integer overflows" while also saying "except the program may not crash just because of the overflow".
The latter is almost equivalent to saying "signed integer overflows must perform two's complement arithmetic", which is essentially what Rust does in release builds (debug builds panic on overflow instead).
Because any other behavior would basically be just as unintuitive as UB in some cases, and in fact, for you to preserve the existing optimizations, in the worst case the compiled code would have to behave differently depending on not only the surrounding source code, but also any arbitrary piece of code in the entire program (due to link-time optimizations).
The email you linked to even alludes to the Annex's naivety in terms of bounding the effects of UB:
> the Annex completely fails to be useful for the purpose you intend
> That really doesn't get you much of anything.
But really, all of this is completely beside the point. I was only trying to dispute a couple of Daniel's claims, which are too strongly-worded, without actually disagreeing with anything else that Daniel is saying.
I wouldn't even disagree with those claims if they were worded a bit less strongly.
[1] E.g. https://marc.info/?l=llvm-dev&m=143589591927876&w=4