Especially when targeting realistic machine models, there is a lot that's suboptimal about classical sorting algorithms like quicksort or mergesort. For example, a quicksort with a perfect choice of pivot will incur a branch misprediction with probability ~50% for every element it compares against the pivot. That's not something classical complexity analysis measures, but on actual CPUs, branch misses are quite expensive (something like 10–15 cycles). Shameless plug for something I once built for a lecture to demonstrate this: https://github.com/lorenzhs/quicksort-pivot-imbalance
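To make the branch-miss point concrete, here's a minimal sketch (hypothetical code, not from the repo above) of a Lomuto-style partition: with a median pivot, the `if` goes either way with ~50% probability, which is the worst case for the predictor. A branchless variant, roughly the idea behind BlockQuicksort/pdqsort, turns the comparison into data instead of control flow.

```cpp
#include <cstddef>
#include <utility>

// Branchy Lomuto-style partition: with a median pivot, the comparison below
// is true about half the time, so the branch predictor misses roughly once
// per element -- at ~10-15 cycles per miss, that dominates the comparison.
std::size_t partition_branchy(int* a, std::size_t n, int pivot) {
    std::size_t store = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < pivot) {          // hard-to-predict branch
            std::swap(a[i], a[store]);
            ++store;
        }
    }
    return store;  // number of elements smaller than the pivot
}

// Branchless sketch: the comparison result is used as data (index increment,
// conditional moves) instead of as a branch, so there is nothing to mispredict.
// Real implementations (BlockQuicksort, pdqsort) additionally buffer offsets
// in blocks to break the dependency chain on `store`.
std::size_t partition_branchless(int* a, std::size_t n, int pivot) {
    std::size_t store = 0;
    for (std::size_t i = 0; i < n; ++i) {
        bool less = a[i] < pivot;
        int lo = a[store];
        int hi = a[i];
        a[store] = less ? hi : lo;   // usually compiled to conditional moves
        a[i]     = less ? lo : hi;
        store += less;               // branch-free advance
    }
    return store;
}
```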
Of course, these algorithms are much more complex and error-prone to implement, and they use some additional memory, which may explain why they're not used in the standard library implementations of popular languages.
Thanks, I totally forgot about pdqsort! If I recall correctly, it's partially based on blockquicksort, but includes a lot of other improvements. It's a very interesting algorithm, much faster than introsort etc. However, I believe that IPS4o is still faster overall (I think I asked Sascha, one of the authors of IPS4o, about this a while ago, but I don't have any measurements at hand), even before accounting for IPS4o's parallel implementation. You should definitely check out both :)
I think I need to swear off involving myself in order-of-complexity conversations and just get back to doing useful work. "The difference between theory and practice is that in theory there is no difference." And as processors evolve, that just becomes more and more true for complexity calculations.
It's still a useful mental exercise, but what matters is whether I can do an operation cheaper and more reliably than you can. Order of complexity informs that decision but doesn't define it.
Even a moderately sized constant C is comparable to log(n) for quite a lot of the data sets most of us actually work with. Conversely, adding, subtracting, or comparing two numbers of arbitrary precision (e.g., bignums) takes O(log n) time, not O(1) time. Very few algorithms we call O(n) or better are actually implementable in less than O(n log n) as n -> ∞.
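To illustrate the bignum point with a hedged sketch (not taken from any particular library): a number of magnitude up to n needs Θ(log n) limbs, so even "simple" addition walks over Θ(log n) machine words.

```cpp
#include <cstdint>
#include <vector>

// Schoolbook addition of two arbitrary-precision, non-negative numbers stored
// little-endian as 32-bit limbs. A number of magnitude up to n needs
// Theta(log n) limbs, so this loop runs Theta(log n) times -- the operation
// is not O(1) once the operands no longer fit in one machine word.
std::vector<std::uint32_t> bignum_add(const std::vector<std::uint32_t>& a,
                                      const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> sum;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
        std::uint64_t s = carry;
        if (i < a.size()) s += a[i];
        if (i < b.size()) s += b[i];
        sum.push_back(static_cast<std::uint32_t>(s));  // low 32 bits of the digit sum
        carry = s >> 32;                               // overflow into the next limb
    }
    return sum;
}
```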
Complexity analysis is step 2. Step 1 being admitting you have a bottleneck. But there are a lot of other steps after those, with a lot of challenging, specialized work.
It all depends on your machine model. You're right in that the RAM model doesn't model the performance characteristics of real-world computers very precisely. That's why it's a model :) Other models exist that can be used to analyse various aspects of algorithms, e.g. the External Memory model (which can be applied to any level of the memory hierarchy, e.g. to quantify data transfer between cache and RAM). Or you can count the number of hard-to-predict branches, etc.
We almost always assume that a machine word is large enough to describe the input size n. This usually implies constant-time operations on log(n) bits. Not doing so would clutter up notation and complicate analysis, and the whole point of using models is avoiding that where reasonably possible.
I mean you could take this thinking to the extreme and say that as we live in three-dimensional space, the wire length to access more and more memory has to grow at least with the cube root of the size of the memory, because that memory needs to physically be stored somewhere at the end of the day. That's not helpful for analysing algorithms, though :)
But, for instance, in a world where horizontal scalability is almost a given, we have to allow that the cost of sending a message between two arbitrary servers is going to be at least log(n) time, because if you build a star topology, your entire server room would be wires.
So if you're doing a distributed hash, the base cost of fetching two values to compare them is not only nothing to sneeze at, it is probably fundamental to how you solve the problem.
Now you’re in some unspecified distributed model without even defining your variables. Again, models are there to allow us to reason about things without going into every detail. It might make more sense to treat the time to open a connection and the time to send one machine word over an established connection as variables in asymptotic analysis. Then you can reason about sequential processing time, communication volume, and latency in one term.
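As a rough sketch of what I mean (the symbols below are my own illustrative choices, not from any particular paper), you'd write the cost of one phase as something like:

```latex
% alpha  = startup/latency cost per message
% beta   = cost per machine word sent over an established connection
% h      = number of messages, w = total words communicated
% T_comp = local sequential computation time  (all symbols illustrative)
T_{\text{phase}} \;=\; T_{\text{comp}} + \alpha \cdot h + \beta \cdot w
```

and keep the three terms separate in the asymptotics, so the latency-bound, bandwidth-bound, and compute-bound regimes all stay visible.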
I'm a PhD student in algorithmics / algorithm engineering, so I work with a lot of people who do stuff like this, even though my research isn't related to sorting. Super Scalar Sample Sort (which ips4o is based on) was co-authored by my advisor, Peter Sanders, and I re-implemented it a few years ago in modern C++ just for the fun of it. Turns out that was a lot nicer to read than the original code (which isn't public) and a bit faster, too. I put that on GitHub where the blockquicksort authors found it and contacted me and we traded some emails and made some improvements. Sometime later my advisor and two colleagues came up with an idea for in-place super-scalar sample sort, out of which eventually ips4o was born.
So, uh, I guess I just work with the right people :) Sorry that I can't give you anything concrete. All of these papers were presented at ESA (European Symposium on Algorithms), though, so that's a good venue to follow. But beware, ESA has a theory track that's a lot bigger than the experimental track, and papers published there can be somewhat unapproachable ;)
I'd recently asked elsewhere: "I've always wondered, is there a good mathematical representation of an algorithm that is useful for algebraic manipulation? Like producing a quicksort from a bubble sort." And got linked this paper [0]. This is the only time I've heard the word 'algorithmics' since then.
Any interesting 'entry level' reads you could send my way?
Thanks for the link, that was fun to read (well, skim).
I don't think the term 'algorithmics' appears very often in publications; it's more of an umbrella term for many things. The stuff we work on in our group is sort of the practical side of theoretical computer science, in that our focus is on algorithms that can be implemented efficiently and don't just look good on paper. The methodology is called Algorithm Engineering, and it's described quite well in https://en.wikipedia.org/wiki/Algorithm_engineering. Apart from ESA's track B, the major conferences there are Alenex and SEA (Symposium on Experimental Algorithms). All three are open access.
It's difficult to recommend anything in particular, not only because the scope is very broad, but also because most papers aren't written for a wide audience. Good writing is not something that academics optimize for (perversely, good writing can be seen as a negative point, in that if a paper is easy to understand, it may be rejected for being too simple), nor is it taught. Maybe something like https://github.com/papers-we-love/papers-we-love could serve as a starting point?
For people who aren't Ph.D. students, it may help to contact a local CS department and see what they have going on. There should at least be a grad student colloquium meeting once a week or so. I attended that at Penn a few times, although it was too far above me to really follow (lots of PL theory). More recently I've been going to a Database Reading Group at Portland State University (http://datalab.cs.pdx.edu/dbrg/index.php), which is a great way to get a taste of current research in that niche. Databases are less rarefied, too. :-)
Example of the crazy optimizations that apply here: I once got a 20% speedup in a standard recursive search algorithm by abandoning the native function call stack and making my own search-specific stack data structure. Since these algorithms call the same function over and over, making the call stack do minimum work is actually significant.
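As an illustrative sketch (not the actual code from that project): a recursive depth-first search can be rewritten with an explicit stack that stores only the one pointer per "frame" that the search actually needs, instead of a full call frame with return address and saved registers.

```cpp
#include <vector>

struct Node {
    std::vector<Node*> children;
    bool is_goal = false;
};

// Recursive version: every level pays for a full function call frame
// (return address, saved registers, spilled locals).
bool dfs_recursive(const Node* n) {
    if (n->is_goal) return true;
    for (const Node* c : n->children)
        if (dfs_recursive(c)) return true;
    return false;
}

// Explicit-stack version: each "frame" is a single pointer, so pushing and
// popping is just a vector append/remove and the hot data stays compact.
bool dfs_explicit(const Node* root) {
    std::vector<const Node*> stack{root};
    while (!stack.empty()) {
        const Node* n = stack.back();
        stack.pop_back();
        if (n->is_goal) return true;
        for (const Node* c : n->children)
            stack.push_back(c);
    }
    return false;
}
```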
TimSort is an immensely complex sorting algorithm, full of magic numbers and difficult to analyse formally (e.g. the invariants it preserves are multi-line disjunctions). I'm not surprised there are still bugs to be found in it. I do think that issues like this make the case for using simpler, clearer algorithms even at a slight performance penalty.