
These things originate in hardware variations. Apparently all architectures represent unsigned integers the same way, so unsigned overflow just wraps modulo 2^N (the carry out of the top bit is discarded). But signed integers aren't always two's complement, so there is variation.
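To make that concrete, here's a minimal C sketch (assuming nothing beyond a typical hosted compiler): the unsigned case is defined to wrap modulo 2^N, while the signed case is simply left undefined by the standard.

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
      unsigned int u = UINT_MAX;
      u = u + 1;          /* well-defined: wraps modulo 2^N, so u == 0 */
      printf("%u\n", u);

      int s = INT_MAX;
      /* s = s + 1; */    /* undefined behavior: the standard makes no promise,
                             even on two's-complement hardware */
      printf("%d\n", s);
      return 0;
  }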

See, no tedious rules to remember, you just have to understand how computers work. But the C standards call such things "undefined behaviour" rather than "platform-specific" behaviour, and then try to pretend that the compiler can abstract the underlying machine away. That is, they pretend programmers can understand the standard and don't need to know how computers work.

The result is a maze of arbitrary - but historically rational - rules about when the compiler has to do something sane, and when it is allowed to do whatever the hell it likes to squeeze some micro-improvement from a benchmark.



> The result is a maze of arbitrary - but historically rational - rules about when the compiler has to do something sane, and when it is allowed to do whatever the hell it likes to squeeze some micro-improvement from a benchmark.

Which is a nice way of saying that sometimes they'll decide to elide chunks of code that are only reachable through undefined behavior, and if that elided code happened to be specifically checking for and handling that undefined case as an error, too bad.[1] :/

1: https://news.ycombinator.com/item?id=14163111
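For a concrete instance of the pattern (whether a given compiler actually drops the branch depends on version and optimization flags, and the function names here are just for illustration):

  #include <limits.h>

  /* Intended as a post-hoc overflow check, but `a + 100` has already
     overflowed whenever a > INT_MAX - 100, so the compiler may treat the
     branch as unreachable and delete it (unless e.g. -fwrapv is in effect). */
  int will_overflow_broken(int a)
  {
      return a + 100 < a;
  }

  /* Portable version: test before doing the arithmetic. */
  int will_overflow_safe(int a)
  {
      return a > INT_MAX - 100;
  }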


The C standard actually has many alternatives to undefined behavior: implementation-defined behavior, like whether the high-order bit propagates when a negative signed integer is shifted right, and unspecified behavior, like the order in which the arguments to a function are evaluated. Related to those are implementation-defined, unspecified, and indeterminate (unspecified or trap) values.
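Two tiny illustrations of those categories (what you actually get is, by definition, up to the implementation):

  #include <stdio.h>

  static int noisy(const char *name) { printf("%s ", name); return 0; }
  static int sum(int a, int b) { return a + b; }

  int main(void)
  {
      int x = -8;
      int y = x >> 1;   /* implementation-defined: -4 where the shift is
                           arithmetic, but the implementation gets to
                           document whatever it does */
      printf("%d\n", y);

      /* unspecified: the two arguments may be evaluated in either order,
         so this may print "a b" or "b a" before the sum */
      printf("%d\n", sum(noisy("a"), noisy("b")));
      return 0;
  }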

Unspecified values are tricky, though. They can propagate their unspecifiedness and can be different each time you look at them: x == x can come out either true or false, and is itself unspecified. If your compiler exploits this, things can get insane. I think the C standard's definitions would need to be tightened a bit here.

Also I don't see why signed integer overflow cannot be implementation-defined behavior.



Unless you are paid by the line, that should either work as "expected" (i.e. wrap) or produce an error about a meaningless comparison.

Is it worth a few developer-days of work to track down a hard-to-repro bug that only shows up with hard-to-debug optimizations enabled?


No, which is one reason why I almost always use unsigned types in my C code, particularly in the context of data structure management, where negative values are unnecessary and usually nonsensical.

GCC supports -fwrapv and -fno-strict-overflow; and I think clang supports both, too. I've never cared to use them because I only rarely use signed types. But some projects and programmers use those options habitually.

AFAIU, Rust panics by default on signed overflow. And even if it wraps, that's not unequivocally better. Unlike with enforced buffer boundary constraints, neither is clearly better than what C does. Arithmetic overflow is a common and serious issue in just about every language. Short of a compile-time constraint or diagnostic that triggers if the compiler cannot prove overflow is either explicitly checked or benign (that is, a negative number is no worse than a large positive number in the context of how the value is used), there's no obvious solution that really forecloses most exploit opportunities across the board.

Because so much code, regardless of language, has some unchecked signed integer overflow bug, if you panic you make it easy to DoS an application. And a DoS can sometimes turn into an exploit when you're dealing with cooperating processes. For example, you occasionally see bugs where an authentication routine fails open instead of failing closed when the authenticator is unreachable.

If you silently wrap signed overflow, all of a sudden the value is in a set (negative numbers) that might be completely unexpected. Even in so-called memory-safe languages, negative indices can leak sensitive information or erroneously select privileged state. For example, in some languages -1 selects the last element of an array. You can check for negative values explicitly, but multiplicative overflow can wrap around to positive numbers, which is no better than using an unsigned type; a check for negative values is typically redundant work which adds unnecessary complexity--and unnecessary opportunity for mistakes--relative to sticking to unsigned types.
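To put numbers on the multiplicative case (the example uses unsigned intermediates so it is itself well-defined, and assumes 32-bit arithmetic): 70000 * 70000 = 4,900,000,000, which wraps to 605,032,704: positive, so a negativity check never fires.

  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
      int32_t n = 70000;
      /* The true product 4,900,000,000 doesn't fit in 32 bits; reduced
         modulo 2^32 it becomes 605,032,704, which is still positive. */
      int32_t bytes = (int32_t)((uint32_t)n * (uint32_t)n);
      if (bytes < 0)
          puts("caught");             /* never reached */
      printf("%" PRId32 "\n", bytes); /* prints 605032704 */
      return 0;
  }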

IMO, silently wrapping signed overflow is the worst option. I just don't see the point. The only three options I like for avoiding arithmetic overflow bugs, depending on language and context, are

1) Check for overflow explicitly (independently from array boundary constraints) and bubble up an error;

2) Carefully rely on unsigned modulo arithmetic;

3) Carefully rely on saturation arithmetic. (Options 1 and 3 are sketched below.)
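A rough C sketch of options 1 and 3, done in unsigned arithmetic so the checks themselves can't overflow (the function names are just for illustration):

  #include <limits.h>
  #include <stdbool.h>

  /* Option 1: detect overflow and bubble up an error. */
  static bool add_checked(unsigned a, unsigned b, unsigned *out)
  {
      if (a > UINT_MAX - b)
          return false;    /* caller must handle the failure */
      *out = a + b;
      return true;
  }

  /* Option 3: saturate at the top of the range instead of wrapping. */
  static unsigned add_saturating(unsigned a, unsigned b)
  {
      return (a > UINT_MAX - b) ? UINT_MAX : a + b;
  }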

IMO the C standard's fault isn't in its refusal to make signed overflow defined or implementation-defined, but in providing no standard API for overflow detection, no construct for saturation semantics on integer types, and no compilation mode to warn about unchecked signed overflow (e.g. something at least as useful as -Wstrict-overflow in GCC).

Fortunately both GCC and clang have agreed on a standard API for overflow detection. That's something. But unfortunately it'll be years before you can consistently rely on those APIs without worrying about backward compatibility.
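The API in question is the __builtin_*_overflow family; a small usage sketch (assuming GCC 5+ or a reasonably recent clang):

  #include <stdio.h>

  int main(void)
  {
      long long a = 1LL << 62, b = 1LL << 62, result;

      /* Computes a + b as if with infinite precision and returns true
         (nonzero) if the mathematical result didn't fit in `result`. */
      if (__builtin_add_overflow(a, b, &result)) {
          fprintf(stderr, "addition overflowed\n");
          return 1;
      }
      printf("%lld\n", result);
      return 0;
  }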


> AFAIU, Rust panics by default on signed overflow

Overflow of any integer type is considered a "program error", not undefined behavior. In debug builds, this is required to panic. In builds where it doesn't panic, it's well-defined as two's complement wrapping.

You can also explicitly request wrapping, saturating, etc. behavior.


Thinking about hardware is definitely the move when writing C.

This is the major struggle with abstraction. We want to remove the burden of knowing the ins and outs of the target architecture. Inevitably, we create trouble and fall on our faces when it turns out that the hardware is still in fact there and doesn't like when we ignore it.

It's really an impossible problem. One can't account for every architecture when designing a language. Likewise, one can't feasibly remember the details of every architecture while programming. Honestly I'd be interested to see some tools that approach the problem from a direction other than maximum portability. Not that I think they'd be popular or "good".


Easy: check the family of Algol, Xerox PARC, and Wirth languages.

Where safety is more relevant than maximum portability.

Everything that isn't really portable is marked as explicit language extension or unsafe construct.

One might complain that it leads to language dialects, but the same is true for C, where certain semantics depend on the compiler and even change between versions.


At least one extant (or recently extant) system has to emulate unsigned modulo arithmetic. This can be handled by the C compiler transparently, however. From the C compiler documentation:

  | Type          | Bits | sizeof | Range                                  |
  +---------------+------+--------+----------------------------------------+
  | ...                                                                    |
  | unsigned long | 36   | 4      | 0 to (2^36)-2 (see the following note) |

  ...
  Note: If the CONFORMANCE/TWOSARITH or CONFORMANCE/FULL
  compiler keywords are used, the range will be 0 to (2^36)-1.
  See the C Compiler Programming Reference Manual Volume 2 for
  more information.

  -- Section 4.5. Size and Range of C Variables of the Unisys
  C Compiler Programming Reference Manual Volume 1.
  https://public.support.unisys.com/2200/docs/cp16.0/pdf/78310422-012.pdf


A range of 0 to (2^36)-2 implies that there's one bit combination not mentioned here (that range has only 2^36 - 1 values; 36 bits can store 2^36). What's the last combination used for?


I don't know off-hand. AFAIU the Unisys machines use ones' complement representation. My guess is that the native unsigned set of values includes the representation for both positive and negative 0. Or there could be a trap representation that is hidden in unsigned mode, which presumably would also make these machines examples of hardware that traps on signed overflow.


Wow. 36-bit words, 9-bit bytes, and ones'-complement arithmetic.

I wonder if it still does end-around carry...
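For anyone who hasn't run into it: end-around carry is what makes ones'-complement addition come out right: the carry out of the top bit gets added back in at the bottom. A toy simulation (16 bits for brevity rather than the 2200's 36):

  #include <stdint.h>
  #include <stdio.h>

  static uint16_t ones_complement_add(uint16_t a, uint16_t b)
  {
      uint32_t sum = (uint32_t)a + (uint32_t)b;
      if (sum > 0xFFFF)
          sum = (sum & 0xFFFF) + 1;   /* end-around carry */
      return (uint16_t)sum;
  }

  int main(void)
  {
      /* In ones' complement, -2 is the bitwise complement of 2: 0xFFFD.
         -2 + 3 should be 1, and the end-around carry makes it so. */
      uint16_t neg_two = (uint16_t)(~2u & 0xFFFFu);
      printf("0x%04X\n", (unsigned)ones_complement_add(neg_two, 3)); /* 0x0001 */
      return 0;
  }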


I wonder how many processors there are in use these days that don't use two's complement. I don't think I've ever seen one.


Certainly not general CPUs, but there are probably domain-specific processors out there that use something else. Why would you want to design a domain-specific processor and still use C? Beats me.


Even if you don't really want C on such a processor, there will be an emergent and unholy alliance between (1) a pointy-haired impulse within the manufacturer to have "programmable in C" on the feature list and (2) an empire-building impulse within the C standards-writing ecosystem that wants to encompass every chip under the sun.


Those forces are so strong people are still pushing for C on FPGAs.


The bigger reason it's still undefined is to enable this type of optimization: https://news.ycombinator.com/item?id=14857316


The comment thread seems to suggest that even if you define the behavior you can still optimize that case, and in fact Clang does.


> you just have to understand how computers work.

rude.



