> *[...] I'm not even sure that it's possible to write a correct lock-free queue...

0xffff2 · on June 4, 2019

After reading the full article, they solve a vastly more restricted (although perhaps still useful) problem than the title implies. Their design assumes a single producer and single consumer and is trivially broken if that's not the case. I don't have time to write a convincing set of benchmarks to prove it, but I very much doubt that this is faster than a locking implementation of the same structure.

physguy1123 · on June 5, 2019

I can attest to the basic lock-free spsc queues being far faster than a basic spinlocking version of the same queue, on xeon-class X86 at least.

LOCK'ed instructions (one might use lock xchg or lock cmpxchg) perform a full memory barrier, which has quite a few performance implications if there are any outbound loads/stores. Further, if the cache line containing the spinlock is in anything except an exclusively-held state on the writing core (it will usually be on a different core or in a shared state), simply acquiring the lock will stall for ~25-30ns+.

On the other hand, a simply spsc lock-free queue can hit <10ns per message with a latency about equal to the core-core communication latency.