
One major advantage of MySQL's clustered indexes that the article doesn't mention: although secondary-key reads may be a little slower, primary-key reads will be faster. The row data lives in the primary-key index itself, so there is no need to reference an additional database page (possibly causing random I/O).

This is especially relevant when doing range queries over the primary key. Imagine a table containing billions of chat messages, from which you want to retrieve a single conversation's history. With a clustered primary key on (conversation_id, message_id), MySQL would need to process just a couple of database pages. Postgres, on the other hand, would need to reference a semi-random heap page for each of the messages.
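The layout difference can be sketched with SQLite, whose WITHOUT ROWID tables store row data inside the primary-key B-tree, much like InnoDB's clustered index (SQLite here is just a runnable stand-in; the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A WITHOUT ROWID table keeps the row data in the primary-key B-tree,
# analogous to InnoDB's clustered index. (Assumption: SQLite used as a
# stand-in to illustrate the layout; MySQL and Postgres behave as
# described in the text.)
conn.execute("""
    CREATE TABLE messages (
        conversation_id INTEGER,
        message_id      INTEGER,
        body            TEXT,
        PRIMARY KEY (conversation_id, message_id)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(c, m, "hello") for c in range(3) for m in range(100)],
)

# Fetching one conversation is a contiguous range scan of the
# primary-key B-tree; no separate heap lookup per row is needed.
rows = conn.execute(
    "SELECT body FROM messages WHERE conversation_id = ?", (1,)
).fetchall()
print(len(rows))  # 100
```

With a conventional heap table, the same query would instead walk a secondary index and then fetch each matching row from wherever it happens to sit in the heap.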

Now imagine a 10k message chat conversation, a table too large to fit into RAM, and storage by means of spinning rust (yeah, yeah, I know what year it is :-)). The difference would be somewhere between 2 and 3 orders of magnitude.



Uhh ... Postgres supports index-only scans; as long as the data you're asking for is in the index, that is (and, in practice, as long as the visibility map shows the pages as all-visible).

So if you have an index on (conversation_id, message_id), and you retrieve the message IDs of a specific conversation, only the index will be touched.
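The same effect is easy to demonstrate with SQLite, which reports index-only access as a "covering index" in its query plan (again a stand-in sketch with made-up names; Postgres exposes the equivalent as an Index Only Scan in EXPLAIN output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        conversation_id INTEGER,
        message_id      INTEGER,
        body            TEXT
    )
""")
# A secondary index containing everything the query below asks for.
conn.execute(
    "CREATE INDEX idx_conv_msg ON messages (conversation_id, message_id)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(c, m, "hello") for c in range(3) for m in range(10)],
)

# Only message_id is selected, so the index alone can answer the
# query and the table itself is never read.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT message_id FROM messages WHERE conversation_id = 1"
).fetchall()
print(plan[0][3])  # the plan detail mentions "COVERING INDEX"
```

If the query also selected `body`, the plan would fall back to an ordinary index search plus table lookups, which is exactly the per-row page access the grandparent comment describes.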


Doh, you're right of course! We were having this problem in the Postgres 9.1 era, before index-only scans were a thing.

Still, it's quite inefficient to maintain an extra copy of the data that is never actually used, though no longer multiple orders of magnitude less efficient.

However, I'd guess that programmers don't often think to add these seemingly useless columns to an index, as it feels inefficient and just wrong. But at least this offers an out in pathological cases.



