
You can get better performance in the one-writer-thread / one-reader-thread case by using simple lock-free circular buffers. No special instructions are needed for this. Cache-line thrashing can be minimized by allocating the pointers across four cache lines: one writer-local, one for writer-to-reader communication, one reader-local, and one for reader-to-writer communication.

You do need barriers: issue a write barrier before giving a copy of the write pointer to the reader. Issue a read barrier before reading the data.
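In Go, the `sync/atomic` operations provide at least the release/acquire ordering those barriers require, so a sketch of the publish/consume protocol (names are mine, not from the thread) looks like:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ring is a minimal single-producer/single-consumer queue sketch.
type ring struct {
	buf        [8]int
	writeIndex atomic.Uint64 // published writer-to-reader
	readIndex  atomic.Uint64 // published reader-to-writer
}

// push writes the data first, then publishes the new write index.
// The atomic store plays the role of the write barrier: the data is
// globally visible before the reader can observe the new index.
func (r *ring) push(v int) bool {
	w := r.writeIndex.Load()
	if w-r.readIndex.Load() == uint64(len(r.buf)) {
		return false // full
	}
	r.buf[w%uint64(len(r.buf))] = v
	r.writeIndex.Store(w + 1) // publish
	return true
}

// pop loads the write index first (the read barrier), after which it
// is safe to read data up to that point.
func (r *ring) pop() (int, bool) {
	rd := r.readIndex.Load()
	if rd == r.writeIndex.Load() {
		return 0, false // empty
	}
	v := r.buf[rd%uint64(len(r.buf))]
	r.readIndex.Store(rd + 1)
	return v, true
}

func main() {
	var r ring
	r.push(42)
	v, ok := r.pop()
	fmt.Println(v, ok) // 42 true
}
```

Note this sketch allows the buffer to become completely full; the point below about leaving slack would tighten the full check.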

Avoid cache line thrashing in a mostly full case by never allowing the circular buffer to become completely full (leave at least one cache line free).
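The "never completely full" rule is just a tighter capacity check. A sketch, assuming 64-byte cache lines and 8-byte slots (so one line of slack is 8 slots; the geometry is an assumption):

```go
package main

import "fmt"

const (
	bufSlots   = 1024
	slackSlots = 8 // one 64-byte cache line of 8-byte items (assumed)
)

// canPush refuses the last cache line's worth of slots, so the
// writer's head never advances into the line the reader is consuming.
func canPush(writeIdx, readIdx uint64) bool {
	return writeIdx-readIdx < bufSlots-slackSlots
}

func main() {
	fmt.Println(canPush(0, 0))    // true: empty
	fmt.Println(canPush(1016, 0)) // false: only the slack remains
}
```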

The reader thread can quickly poll many circular buffers for work. The pointers will all reside in the reader's cache until a writer has written some data (and updated the reader's copy of the write pointer).

You can get more benefits by delaying the update of the other thread's pointers: on the writer's side, until we have no more data available to write. On the reader's side, until we have read all the data (or some larger chunk of it). This allows the cache-line prefetching hardware to work (to prefetch likely used data).
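A sketch of that batching, with writer-private and reader-private indices that are only published explicitly (the type and method names are illustrative):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// batchRing publishes its indices lazily: the shared cache lines are
// written once per batch instead of once per item.
type batchRing struct {
	buf        [64]int
	writeIndex atomic.Uint64 // shared with the reader
	localWrite uint64        // writer-private, not yet published
	readIndex  atomic.Uint64 // shared with the writer
	localRead  uint64        // reader-private, not yet published
}

// put stages an item without touching the shared write index.
func (r *batchRing) put(v int) bool {
	if r.localWrite-r.readIndex.Load() == uint64(len(r.buf)) {
		return false
	}
	r.buf[r.localWrite%uint64(len(r.buf))] = v
	r.localWrite++
	return true
}

// flush publishes everything staged so far in one shared store.
func (r *batchRing) flush() { r.writeIndex.Store(r.localWrite) }

// get consumes an item without touching the shared read index.
func (r *batchRing) get() (int, bool) {
	if r.localRead == r.writeIndex.Load() {
		return 0, false
	}
	v := r.buf[r.localRead%uint64(len(r.buf))]
	r.localRead++
	return v, true
}

// ack publishes the reader's progress after draining a chunk.
func (r *batchRing) ack() { r.readIndex.Store(r.localRead) }

func main() {
	var r batchRing
	for i := 1; i <= 3; i++ {
		r.put(i)
	}
	r.flush() // one shared write covers three items
	sum := 0
	for {
		v, ok := r.get()
		if !ok {
			break
		}
		sum += v
	}
	r.ack()
	fmt.Println(sum) // 6
}
```

Because the staged items are written sequentially before a single publish, the hardware prefetcher sees a clean streaming pattern on the data lines.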

Anyway, if you really want to use a linked list, at the very least allocate multiple items per cache line, and then link the cache lines together (so one link pointer or less per cache line).
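A sketch of that "chunked" linked list, sizing each node to roughly one 64-byte cache line on a 64-bit platform (six 8-byte slots plus a count plus a next pointer; the sizing is an assumption, not measured):

```go
package main

import "fmt"

// node packs several items per allocation, so one next-pointer is
// amortized over a cache line's worth of data.
type node struct {
	items [6]uint64
	n     int // slots used
	next  *node
}

type chunkedList struct {
	head, tail *node
}

// append fills the tail node and only allocates a new node (and
// follows a link) when the current one is full.
func (l *chunkedList) append(v uint64) {
	if l.tail == nil || l.tail.n == len(l.tail.items) {
		nd := &node{}
		if l.tail == nil {
			l.head = nd
		} else {
			l.tail.next = nd
		}
		l.tail = nd
	}
	l.tail.items[l.tail.n] = v
	l.tail.n++
}

func main() {
	var l chunkedList
	for i := uint64(0); i < 10; i++ {
		l.append(i)
	}
	// Ten items occupy two nodes instead of ten, so a traversal
	// chases one pointer per six items rather than one per item.
	nodes := 0
	for nd := l.head; nd != nil; nd = nd.next {
		nodes++
	}
	fmt.Println(nodes) // 2
}
```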



If you want to see an implementation along the lines that jhallenworld describes, Martin Thompson has some very high-quality implementations in Java here. These are just single-producer, single-consumer queues:

https://github.com/mjpt777/examples/tree/master/src/java/uk/...

I have a number of Go implementations of the same queues here:

https://github.com/fmstephe/queues



