Shenanigans With Hash Tables

pcwalton · on Sept 4, 2019

Objective-C uses a hash table cache for all method calls. It's a bit different in that Objective-C deals with collisions by just not handling them: since it's only a cache, collisions will result in going through the slow name lookup path.

Mike Ash has a nice series that goes through the disassembly of objc_msgSend: https://www.mikeash.com/pyblog/friday-qa-2017-06-30-dissecti...

jonathanstrange · on Sept 4, 2019

But aren't hash tables orders of magnitude slower than dynamic dispatch with jump tables? I'm just interested in PL implementation as a hobby, so correct me if I'm wrong, but this seems horribly inefficient to me.

tcard · on Sept 4, 2019

You can't use jump tables in Objective-C because methods can be added to objects at runtime.

masklinn · on Sept 4, 2019

That doesn't preclude using jump tables. You can also add methods to objects at runtime in JS and modern runtimes use jump tables.

I expect you can't use jump tables in objc because you can't really "patch in" jump tables at runtime as it's AOT-compiled rather than JIT-ed.

tcard · on Sept 5, 2019

Right, but that's no longer "just" jump tables, you need additional JIT logic for those tables to be correct.

Plus, in a JIT you can go even further and just do direct calls, no need for the indirection of tables.

jcelerier · on Sept 4, 2019

well, does macOS feel like a fast operating system to you ?

willio58 · on Sept 4, 2019

jcelerier · on Sept 4, 2019

you certainly haven't used Linux on a mac then. On the exact same machine it is incredibly more snappy.

perishabledave · on Sept 4, 2019

Mike Ash’s posts are gold. I wish there were more in depth technical posts like those on HN or Lobsters.

autoexec · on Sept 4, 2019

lobsters?

grzm · on Sept 4, 2019

https://lobste.rs, discussion-based aggregator site similar to HN.

autoexec · on Sept 4, 2019

more similar to HN than I expected. Thanks, I'll check it out!

saagarjha · on Sept 4, 2019

> Objective-C uses a hash table cache for all method calls.

Nit: method calls to objects that are not nil, since the latter causes objc_msgSend to bail out before the hash table is consulted.

layoutIfNeeded · on Sept 4, 2019

I recommend reading up on minimal perfect hashing: http://ilan.schnell-web.net/prog/perfect-hash/algo.html

chaboud · on Sept 4, 2019

This is what came to mind when I read this. Far too many professional programmers are unaware of perfect hash functions. A disturbing number are unaware of pathological collision cases.

For a complete set of methods known at compile time, a minimal perfect hash is, well, kinda perfect...

trishume · on Sept 4, 2019

In this case I did actually consider it but forgot to mention it in my post. The slightly more complex constraints of having multiple different tables using the same hash function make searching for the hash function more annoying so I didn’t bother.

nwellnhof · on Sept 4, 2019

Seems like the same idea as in this paper:

Efficient Implementation of Java Interfaces: Invokeinterface Considered Harmless (2001) https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21....

rurban · on Sept 4, 2019

I think Self did the same. At those times this was one of the most researched topics, how to efficiently implement a PIC, polymorphic inline cache. Either with a simple ordered list (usually constructed at run-time, a harder problem), such a hash (good for compile-time) or vtables.

caf · on Sept 4, 2019

According to my compilers class professor there’s a broad literature of optimizing the “giant table with all signatures” approach with heuristics for saving space by rearranging and merging the tables of different classes into the same space or re-using offsets across classes to make tables smaller. But the general problem is NP-complete so can only be solved heuristically.

A SAT solver should actually be able to handle this fairly easily, because in practice, you don't get long complex chains of overlap between the interfaces implemented by common classes. If someone feeds your compiler an adversarial input you can just fall back to the naive implementation.

raja_mukherji · on Sept 4, 2019

I've been using a similar but flipped approach in my own interpreted language, creating a hash table for each method and then look for the object type as the key. This allows me to check the parent types if an exact object type is not found. I then nest the tables for multiple dispatch.

nutjob2 · on Sept 4, 2019

I think this is more common that the author assumes, I use hash tables in a similar way in my own project. It quickly suggests itself when you are mapping names within types to different offsets per type.

jasoneckert · on Sept 4, 2019

I'm quite experienced in shenanigans. Doing them with hash tables, however, definitely peaks my interest.

radiowave · on Sept 4, 2019

Please allow me to bring your whimsical humour back down to earth with some tedious pedantry: "piques", not "peaks".