Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I (author) agree that certainly at the higher end of element counts, it would be a very strange decision to not use an index. One of my favorite related papers is RadixSpline by Kipf, which shows an index based approach.

However, the inner loop of indexes usually is some other kind of search algorithm. Graefe mentions in BTree techniques that interpolation search is sometimes applicable to inner nodes in btrees. One of the SIP evaluations was using SIP as part of the inner loop in LevelDB, which doesn't displace the use of an LSM for managing the larger data set.

If you don't need a lot of range queries, eytzinger, or cache-line optimized btrees do get a lot more interesting, but the iteration logic gets a lot worse. SIP returns an element if it's present and a sentinel if absent, which is an odd choice since the index is usually what's needed. No clue, but hopefully you'll have some sympathy for revisiting your old code and wondering who wrote it. :)

I do think that optimizing search algorithms is still worthwhile in an indexed setting for two reasons: 1. The size of a node within an index like a btree depends on the efficiency with which that node can be searched, so if you are able to decrease the cost of searching a given node in the index, that might motivate larger node sizes. I know that a common heuristic is to use a multiple of a page size, but that is based on the assumption that the whole node will be used as part of the search. 2. Pushing algorithmic decisions to just in time can be more costly than the time saved from choosing a better algorithm. You lose icache efficiency, stall on the decision, etc. I think it's still good to steel man, as best you can, that your index is actually buying you improved performance for additional complexity or update overhead.



Your paper is a valid add. And I'm gonna read/digest it. My point above while true, is more in the margins. Better to keep the focus on engineering know-how in the PDF. Thank you for submitting it.


What's stopping eytzinger for being used for ranges? You can still iterate sequentially on an eytzinger tree, just like a binary search tree.


Because it's slower




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: