Thanks! It was definitely a highlight working on such bleeding-edge technologies with crazy smart people. Poring over custom PowerPC CPU errata was divine.
Oh, and we only had (if I recall) 1ms per frame on one core to package all our payloads and dequeue messages from the circular buffer. That's where the 20k/s hard limit came in... we could have handled SO much more. Our entire message usually landed around 100-150 bytes, using bitpacked structures.
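For anyone curious what that looks like, here's a rough sketch in C of a bitpacked header plus a circular buffer that gets drained under a fixed per-frame budget. To be clear, this is not our actual code: the field names and widths, RING_CAPACITY, and the ring_push/ring_drain helpers are all made up for illustration, and the budget is a message count rather than a real 1ms timer. It's also single-threaded; a real producer/consumer version would need atomics or memory barriers on head and tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitpacked header: names and widths are illustrative only.
   Bit-field layout is compiler-specific, so don't send this raw over a wire. */
typedef struct {
    uint32_t type     : 6;   /* up to 64 message kinds */
    uint32_t priority : 2;
    uint32_t length   : 8;   /* payload bytes, 0-255 */
    uint32_t sequence : 16;
} MsgHeader;

#define RING_CAPACITY 256u   /* power of two so indices wrap with a mask */

typedef struct {
    MsgHeader slots[RING_CAPACITY];
    uint32_t  head;          /* next slot to read */
    uint32_t  tail;          /* next slot to write */
} Ring;

/* Returns false when the ring is full instead of overwriting old messages. */
static bool ring_push(Ring *r, MsgHeader m)
{
    assert(r != NULL);
    if (r->tail - r->head == RING_CAPACITY)
        return false;
    r->slots[r->tail & (RING_CAPACITY - 1u)] = m;
    r->tail++;
    return true;
}

/* Dequeue at most `budget` messages into `out`; anything left over
   waits for the next frame. Returns the number actually dequeued. */
static size_t ring_drain(Ring *r, MsgHeader *out, size_t budget)
{
    assert(r != NULL && out != NULL);
    size_t n = 0;
    while (n < budget && r->head != r->tail) {
        out[n++] = r->slots[r->head & (RING_CAPACITY - 1u)];
        r->head++;
    }
    return n;
}
```

The point of the hard cap is that the cost per frame stays bounded no matter how much is queued up; the backlog just carries over to the next frame.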
One thing I didn't anticipate: memory stomping would result in everyone pointing fingers at our department, because we would inevitably be the ones to crash (usually in our hardening asserts). We had to start flagging our memory blocks as unwritable whenever our thread was idle in debug mode, so that offenders would crash the moment they touched our memory.
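If you want to try the same trick on a desktop OS, here's a minimal sketch using POSIX mmap/mprotect. That's an assumption on my part as a stand-in; the console had its own platform calls, and the GuardedBlock type and guarded_* function names below are hypothetical. Protection works on whole pages, so the block has to be page-aligned and rounded up to a page multiple.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper around a page-aligned block we want to guard. */
typedef struct {
    void  *base;
    size_t size;             /* rounded up to whole pages */
} GuardedBlock;

/* Allocate whole pages so the protection covers only our own data. */
static int guarded_alloc(GuardedBlock *b, size_t bytes)
{
    assert(b != NULL && bytes > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t p = (size_t)page;
    size_t size = (bytes + p - 1) / p * p;
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    b->base = mem;
    b->size = size;
    return 0;
}

/* Call when the owning thread goes idle: any write now faults (SIGSEGV)
   at the offending instruction instead of corrupting us silently. */
static int guarded_lock(GuardedBlock *b)
{
    assert(b != NULL && b->base != NULL);
    return mprotect(b->base, b->size, PROT_READ);
}

/* Call when the owning thread wakes up and needs to write again. */
static int guarded_unlock(GuardedBlock *b)
{
    assert(b != NULL && b->base != NULL);
    return mprotect(b->base, b->size, PROT_READ | PROT_WRITE);
}

static void guarded_free(GuardedBlock *b)
{
    assert(b != NULL && b->base != NULL);
    munmap(b->base, b->size);
    b->base = NULL;
    b->size = 0;
}
```

The nice part is that the crash lands in the offender's call stack rather than in ours, which settles the finger-pointing pretty quickly. The two mprotect calls per idle cycle aren't free, which is a good reason to keep this to debug builds.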