Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Th64: Tiny Hash Function in C (github.com/tidwall)
53 points by ingve on July 4, 2024 | hide | past | favorite | 22 comments


Hi, MurmurHash and SMHasher author here.

With the code-golfing obfuscation unwound, Th64 is structured very similarly to MurmurHash. The source data is pre-mixed via "rol(x * r + i, 31)", the hash is mixed via "hash = rol(hash * r ^ x, 31)", the tail bytes are mixed via "hash = hash * r ^ x", and the finalization mix is three "hash = (hash ^ (hash >> 31)) * r" rounds.

I see nothing wrong with it if it passes SMHasher3, but it's also not radically different than other hashes - it's just smooshed down into four lines.


>>> Hi, MurmurHash and SMHasher author here.

Only in hacker news


What did you expect?


On Monday I've got a fantastic hashing bug to show you.


Ooooo.....


Hi HN. I’m the author of this hash. It’s just something I whipped up last night. It’s really not much different than the other hashes out there. I wanted to see if I could make a simple quality hash that easy to cut & paste, under 80 chars lines, and self contained.

Otherwise just an experiment.


This is cool, but note:

* ~~It requires two passes over the data to hash (notice the two while loops).~~ Actually it doesn't.

* The first loop casts the data pointer to a 64-bit int, so you will get different results depending on endianness. I don't remember if this is undefined behavior or not.


It's not the endianness that's undefined, it's the dereferencing. 75% chance of this causing an alignment trap on my current development target.


And even on platforms that allow unaligned access, isn't there usually a performance penalty when the access straddles two cache lines? Still best avoided if possible.


I don't often deal with CPU-coupled caches, but I could see a system where that happens.

Cortex-M4, which doesn't have integrated caches, breaks each 32-bit load into 1-3 aligned loads of between 8 and 32 bits according to the address %4. Cortex-M7 performs each 32-bit load as either 1x or 2x 32-bit aligned loads depending on address % 4.

I agree - especially in cases like this algorithm, where there is only one memory stream, it's often worth unrolling up to 3 or 7 accesses before the big aligned loop.


I could be wrong, but I think it just goes over the data once. The first loop increments i by 8, so is processing 64 bits at a time. It looks like the second loop only does the remaining few bytes, if any.


Right. To make it work you need to copy the first unaligned bytes, and then loop over the aligned parts.


You're right. I missed that in my original readthrough.


Is the obfuscated-C style really necessary? Last I checked, compacting C source did not directly result in tighter object code.


It's not just unnecessary, it's wrong since it casts a potentially-unaligned pointer, instead of just using `memcpy`.

It will happen to work in some common cases if the input is exactly the output of `malloc`, or the same after certain kinds of fixed headers, but will fail for many other buffers.


This is super cool. Any insight into how this may have been designed? High-quality, compact hash functions are really useful in C.


Slightly off topic: Does anyone know of a good/fast hash function (similar to this) for use in 32bit embedded systems (such as Cortex-M, TriCore, C2000, etc)?


why not just use fnv64


I get that expressing the function in 6 lines is cool and all, but it'd be nice if it was readable.



Great! Now do it in brainfuck!


Neat, but I'm personally missing some key info here - most importantly "why?". What problem does this solve or why is this particular function good or better than other "standard" hashing functions?

And I have to admin that this kind of obfuscation/minification is especially dangerous in languages such as C as there are quite a lot of footguns already as it is. I don't think that minifying C produces faster programs :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: