
> nothing in the standard permits floating point types to be used this way. NaN boxing fundamentally relies upon undefined behavior.

What do you mean? memcpy from bytes to a NaN should work fine, and there's also a "nan" function that can take a number to insert into the NaN payload.

There are certainly implementation-defined aspects, but all of floating point is implementation defined. As long as you can get a number back out, you can make it into an integer and cast back to a pointer on 98% of platforms.
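
To make that concrete, here is a minimal sketch of boxing a pointer into a quiet NaN with memcpy. It assumes IEEE 754 binary64 doubles, the common encoding where a set top mantissa bit means "quiet", and pointers that fit in the low 51 bits. The names `box_pointer` and `unbox_pointer` are made up for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: exponent all ones + top mantissa bit = quiet NaN. */
#define QNAN_BITS    UINT64_C(0x7ff8000000000000)
#define PAYLOAD_MASK UINT64_C(0x0007ffffffffffff) /* low 51 bits */

/* Hypothetical helper: hide a pointer in the payload of a quiet NaN. */
static double box_pointer(void *p) {
    uint64_t payload = (uint64_t)(uintptr_t)p;
    assert(payload <= PAYLOAD_MASK); /* assumption: pointer fits in 51 bits */
    uint64_t bits = QNAN_BITS | payload;
    double d;
    memcpy(&d, &bits, sizeof d);     /* copy the bytes, no float conversion */
    return d;
}

/* Hypothetical helper: recover the pointer from the NaN's bit pattern. */
static void *unbox_pointer(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (void *)(uintptr_t)(bits & PAYLOAD_MASK);
}
```

Calling `unbox_pointer(box_pointer(&x))` is expected to give back `&x` on typical 64-bit desktop targets, but only as long as nothing in between rewrites the NaN's bits.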



> memcpy from bytes to a NaN should work fine

Signaling NaNs are explicitly left undefined by C11 F.2.1: "This specification does not define the behavior of signaling NaNs." In practice they may be "quieted" by conversion to quiet NaNs, which changes their bit patterns, or they may trigger actual signals. Fast math optimization flags will also break the hell out of your code by assuming NaNs are impossible. I want to say there are more circumstances where optimizers and compiler-generated code can butcher your NaN payloads, but I'd be working off recollected hearsay and I can't find a source, so don't quote me on that.

NaN boxing is common enough that, if you take the right precautions, a modern compiler should probably support it, maybe. NaN boxing is uncommon enough that, if your codebase needs to be sufficiently portable, you need an opt out for when it breaks. Let's review duktape's scars:

https://github.com/svaarala/duktape/blob/123d9426d5e5b36d5da...

https://github.com/svaarala/duktape/blob/5252b7a50611a3cb8bf...

https://github.com/svaarala/duktape/blob/224a0b89ca08a36e37e...

(for context, a "packed" duk_tval involves NaN boxing)

Note that "the right precautions" involve unions and proper integer types to avoid optimizer-invoked rewrites of the value and debugging when things go wrong, not simply YOLOing bytes into a double via memcpy. Note that debugging when it all goes terribly wrong can be quite painful. I've personally had the misfortune of being forced to debug duktape being built with fast math optimizations enabled on one "rare" platform + build configuration that wasn't caught by duktape's #if defined(__FAST_MATH__) checks linked above (the compiler was neither Clang nor GCC, so go figure it didn't make the same #define).
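
For illustration, a rough sketch of that kind of guard, with a runtime fallback for compilers that don't set the macro. GCC and Clang define `__FAST_MATH__` under `-ffast-math`. Everything else here (`USE_PACKED_VALUES`, `nan_checks_work`, `packed_values_usable`, `require_packed_values`) is a hypothetical name, not duktape's actual code.

```c
#include <assert.h>
#include <math.h>

/* Compile-time opt-out: only GCC-compatible compilers are known to set this. */
#if defined(__FAST_MATH__)
#define USE_PACKED_VALUES 0
#else
#define USE_PACKED_VALUES 1
#endif

/* Runtime probe for toolchains that enable fast math without the macro.
 * volatile keeps the compiler from folding the division at compile time. */
static int nan_checks_work(void) {
    volatile double zero = 0.0;
    volatile double n = zero / zero; /* NaN produced at run time */
    double a = n, b = n;
    return isnan(a) && (a != b);     /* a real NaN compares unequal to itself */
}

/* Hypothetical policy: box values only if both checks pass. */
static int packed_values_usable(void) {
    return USE_PACKED_VALUES && nan_checks_work();
}

/* Hypothetical startup check: fail loudly instead of corrupting values. */
static void require_packed_values(void) {
    assert(packed_values_usable());
}
```

A probe like this can only catch builds where NaN tests are visibly broken. It is not proof that payloads survive every optimization.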


Well half of NaNs are quiet, so that's easy to deal with. And the fast optimization flags are themselves violating the standard so those don't count.

> Note that "the right precautions" involve unions and proper integer types to avoid optimizer-invoked rewrites of the value and debugging when things go wrong, not simply YOLOing bytes into a double via memcpy.

Using a union is iffy behavior that has been argued about in the past and is even less safe in C++.

I suggested memcpy because it's the safe way. It's the opposite of YOLO. You memcpy from one type into a char buffer, then memcpy that char buffer into the other type.
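
As a sketch of that two-step copy, assuming 64-bit doubles (the helper names `double_to_bits` and `bits_to_double` are made up for this example):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

/* double -> char buffer -> integer: no aliasing or union questions involved. */
static uint64_t double_to_bits(double d) {
    unsigned char buf[sizeof d];
    uint64_t bits;
    memcpy(buf, &d, sizeof buf);
    memcpy(&bits, buf, sizeof bits);
    return bits;
}

/* integer -> char buffer -> double: the same trip in reverse. */
static double bits_to_double(uint64_t bits) {
    unsigned char buf[sizeof bits];
    double d;
    memcpy(buf, &bits, sizeof buf);
    memcpy(&d, buf, sizeof d);
    return d;
}
```

In practice a single memcpy between the two objects does the same job, and compilers typically turn either form into a plain register move.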


> Well half of NaNs are quiet, so that's easy to deal with

Which half varies by architecture (I'm looking at you, MIPS - and apparently RISC-V at one point was going to go the MIPS route with an all-1s payload for canonical qnans?) - so platform specific spaghet and requirements testing is in the mix.
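
One possible way to probe which convention a target uses at run time, rather than hardcoding it, is to look at the NaN the hardware itself produces for 0.0/0.0. This is only a sketch: it assumes 64-bit IEEE-style doubles and that the invalid operation yields the platform's default quiet NaN, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

#define EXP_MASK     UINT64_C(0x7ff0000000000000)
#define FRAC_MASK    UINT64_C(0x000fffffffffffff)
#define TOP_FRAC_BIT UINT64_C(0x0008000000000000)

/* Bits of the quiet NaN the hardware generates for an invalid operation. */
static uint64_t default_nan_bits(void) {
    volatile double zero = 0.0; /* volatile: force a run-time division */
    double q = zero / zero;
    uint64_t bits;
    memcpy(&bits, &q, sizeof bits);
    return bits;
}

/* 1 on targets where a set top mantissa bit marks a quiet NaN, 0 where
 * the meaning is inverted (the MIPS-style convention mentioned above). */
static int top_bit_set_means_quiet(void) {
    return (default_nan_bits() & TOP_FRAC_BIT) != 0;
}

/* Classify a raw bit pattern using the detected convention. */
static int is_quiet_nan_bits(uint64_t bits) {
    int is_nan = ((bits & EXP_MASK) == EXP_MASK) && ((bits & FRAC_MASK) != 0);
    int top_set = (bits & TOP_FRAC_BIT) != 0;
    return is_nan && (top_set == top_bit_set_means_quiet());
}
```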

> And the fast optimization flags are themselves violating the standard so those don't count.

Some jerk will enable them, standards-violating or not. While it's valid for the solution to be disabling the optimization, in practice you will need to debug and write defensive code when this happens. I know of at least one platform which enables such optimizations by default for its "release" builds, and while I'm angry at them for doing so, I'm unfortunately relegated to existing in the same reality as them.

Perhaps you're lucky enough to exist in a different reality?

> I suggested memcpy because it's the safe way.

More generally, yes, but in the specific context of preserving a NaN payload I would not trust the optimizer to keep my NaN payloads untouched when stored as a floating point value. LLVM developers appear to agree that NaN payload preservation is not guaranteed - I guess you can quote me on my earlier "optimizers can butcher your NaN payloads", presumably even without fast math optimizations:

https://lists.llvm.org/pipermail/llvm-dev/2018-November/1276...

Which causes a good bit of awkwardness for Rust:

https://github.com/rust-lang/rust/issues/73328

The solution is to not attempt to store a payload-laden NaN as any kind of floating point value, even via memcpy. A union is acceptable in the sense that at least, then, you're supposedly storing an integer, and the bit pattern of that would be preserved. A memcpy to a temporary float immediately before NaN testing / floating point usage - and never back to integer in a naive attempt to extract the possibly discarded payload - would work, but is a hell of a caveat to omit when saying "memcpy from bytes to a NaN should work fine", especially when mentioning `nan` with its payload argument, which is unextractable without doing the naive "back to integer" extraction which, if the above is to be believed, is unreliable at best.
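
Roughly what I mean, as a sketch rather than anyone's production code: the integer member is the only thing ever written, and a double only exists as a temporary on the way out. The tag layout, the 50-bit payload and all names (`value`, `from_number`, `from_payload`, `is_boxed`, `payload_of`, `number_of`) are assumptions made up for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: sign + exponent + top two mantissa bits mark a boxed value. */
#define BOX_TAG       UINT64_C(0xfffc000000000000)
#define PAYLOAD_MASK  UINT64_C(0x0003ffffffffffff) /* low 50 bits */
#define CANONICAL_NAN UINT64_C(0x7ff8000000000000)

typedef union {
    uint64_t bits;   /* authoritative storage: always written through this */
    double   number; /* never written; handy as a debugger view only */
} value;

static value from_number(double d) {
    value v;
    if (isnan(d))
        v.bits = CANONICAL_NAN;          /* real NaNs never collide with the tag */
    else
        memcpy(&v.bits, &d, sizeof v.bits);
    return v;
}

static value from_payload(uint64_t payload) {
    value v;
    assert(payload <= PAYLOAD_MASK);
    v.bits = BOX_TAG | payload;          /* integer store, so bits are preserved */
    return v;
}

static int is_boxed(value v) {
    return (v.bits & BOX_TAG) == BOX_TAG;
}

static uint64_t payload_of(value v) {
    assert(is_boxed(v));
    return v.bits & PAYLOAD_MASK;        /* read from the integer, never a double */
}

static double number_of(value v) {
    double d;
    assert(!is_boxed(v));
    memcpy(&d, &v.bits, sizeof d);       /* temporary double, right before use */
    return d;
}
```

The payload never passes through a floating point type at all, so nothing the optimizer is allowed to do to NaNs can touch it.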


> Perhaps you're lucky enough to exist in a different reality?

Any code I've touched that was going to deal with boxing had enough control over the build system to avoid that.

> More generally, yes, but in the specific context of preserving a NaN payload I would not trust the optimizer to keep my NaN payloads untouched when stored as a floating point value. LLVM developers appear to agree that NaN payload preservation is not guaranteed - I guess you can quote me on my earlier "optimizers can butcher your NaN payloads", presumably even without fast math optimizations:

That looks like it only butchers the NaN if you do math on it or try to have compile-time NaNs, which shouldn't be an issue here? Are there other factors making it worse that I'm missing in a quick read?

> A union is acceptable in the sense that at least, then, you're supposedly storing an integer, and the bit pattern of that would be preserved.

Oh, you're suggesting a union as permanent storage, not just to perform the cast. That makes sense.



