Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

* How many numbers are you looking for?

* How many times will you look for each number?

* How big is the original data?

* How sorted is the original data?

* What kind of access patterns do you have for lookups? (i.e. are the lookups themselves sorted)

* Are there perhaps special space constraints to consider that may have priority over time constraints?

* Where/how is the data stored/how is it backed? (perhaps you're reading from tape media or other non-random-access-optimized storage)

* Is the data a set or bag? (duplicate values allowed, do you care which one is returned, etc)



Probably a quicker way to get a developer to generate this same list of considerations is to say "explain the tradeoffs in adding an index to a column in a database table." We tend to search large tables much more frequently than large in-memory lists these days, so that's where any discussions you'll find on the topic are focused.


Dumb/honest question from someone who isn't as experienced with database mechanics as he'd like to be:

What exactly are the tradeoffs of adding an index?


Storage size of the index itself, insert/update performance. Also there may be very little gain from adding an index to a column with only few distinct values or a very skewed distribution (partitions may be a better approach in those cases).


you have to build it and store, so the footprint of this data structure has now increased.

If the array of numbers is mutable, the index also needs to be a modifiable structure (like a tree).


* Is writing much slower than reading (although I guess you sort-of covered that with your penultimate point)?


* cardinality of the array. Maybe there are a million entires, but only 3 distinct values. keeping an unsorted set of distinct values might, in this case, be a much better solution that a sorted index. No mention was made of data cardinality.


* Are the contents going to change?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: