
That's true, but it's also worth pointing out that the worst case can be trivially prevented by shuffling the array before you Quicksort it. Which is maybe a confirmation of your larger point.
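To make that concrete, here is a minimal Python sketch of the idea; the Lomuto-partition quicksort and the function names below are my own choices for illustration, not anything prescribed elsewhere in the thread:

    import random

    def quicksort(a, lo=0, hi=None):
        """Plain quicksort with the last element as pivot: O(n^2) on already-sorted input."""
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        pivot, i = a[hi], lo
        for j in range(lo, hi):          # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)

    def shuffled_quicksort(a):
        """Shuffle first, so a pre-sorted input no longer triggers the quadratic case."""
        random.shuffle(a)                # Fisher-Yates shuffle, a single O(n) pass
        quicksort(a)

Calling shuffled_quicksort(xs) sorts xs in place; the plain recursive form is only meant for small inputs.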


The principled way to avoid bad worst-case performance for quicksort is to select the pivot using a median of medians approach. That gives you good asymptotic performance even for adversarial inputs, but in practice it tends to be slower. Which fits well with the theme of this article.
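For reference, a rough Python sketch of the pivot-selection step (the group size of 5 is the textbook choice; the helper name is mine, and the input is assumed non-empty):

    def median_of_medians(a):
        """Return an element of a that is guaranteed not to sit among the
        smallest or largest ~30% of the values (up to small additive constants).
        """
        if len(a) <= 5:
            return sorted(a)[len(a) // 2]
        groups = [a[i:i + 5] for i in range(0, len(a), 5)]   # chop into groups of 5
        medians = [sorted(g)[len(g) // 2] for g in groups]   # median of each group
        return median_of_medians(medians)                    # recurse on the medians

Partitioning around that value bounds every split at roughly 30/70, which is enough for O(n log n) overall, but it adds another linear pass at every partition, which is where the practical slowdown comes from.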


You can trivially achieve bounded worst case performance with minimal overhead by doing median of medians only, say, one time in 5.

In the average case you do the expensive median of medians on a small fraction of the data and so only pay a few percent average penalty. And in the worst case you still get n log(n) performance.
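A sketch of how that could look in Python; the every-fifth-level rule via a depth counter, the Hoare-style partition, and the helper below are my guesses at the details, not the poster's actual scheme:

    import random

    def median_of_medians(a):
        """A pivot value guaranteed not to be an extreme order statistic of a."""
        if len(a) <= 5:
            return sorted(a)[len(a) // 2]
        groups = [a[i:i + 5] for i in range(0, len(a), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        return median_of_medians(medians)

    def hybrid_quicksort(a, lo=0, hi=None, depth=0):
        """Quicksort that pays for a median-of-medians pivot only on every fifth level.

        Four levels out of five use a cheap random pivot; on the fifth the pivot
        is guaranteed to fall near the middle, so the subarray shrinks by a
        constant factor at least once every five levels and the recursion depth
        stays O(log n), giving O(n log n) even on adversarial input.
        """
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            if depth % 5 == 4:
                pivot = median_of_medians(a[lo:hi + 1])   # expensive, guaranteed-good pivot
            else:
                pivot = a[random.randint(lo, hi)]         # cheap pivot the rest of the time
            i, j = lo, hi
            while i <= j:                                 # Hoare-style partition around a value
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            hybrid_quicksort(a, lo, j, depth + 1)         # recurse on the left part...
            lo, depth = i, depth + 1                      # ...and loop on the right part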


How is the worst case not n^2 on the partitions where you don't use median of medians?


Every fifth pivot is chosen to be an actual median, so the subarrays are guaranteed to at least halve once every five levels. That caps the recursion depth at O(log n), and with O(n) work per level the worst case stays O(n log n).


The median-of-medians approach is also somewhat tricky to implement (avoiding array copies and so on), although it admittedly looks neat on the slides of the algorithms lectures.


Strictly speaking, it is not true that shuffling first avoids the worst case. Given a deterministic shuffling function, some permutation of the input will be shuffled into sorted order, which will again trigger the worst case.

Of course, shuffling is still potentially helpful because ascending/descending order is often common in many applications.


Randomizing the pivots gives O(n log n) expected running time, with a vanishingly small probability of anything significantly worse for any n large enough that the worst case would matter.
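Concretely, the difference from shuffling up front is just where the randomness is injected. A minimal sketch, again with a Lomuto partition and names of my own choosing:

    import random

    def randomized_quicksort(a, lo=0, hi=None):
        """Quicksort that draws a fresh uniformly random pivot for every partition.

        The expected O(n log n) bound then holds for every fixed input, because
        the randomness comes from the algorithm rather than from any assumption
        about how the data happens to be ordered.
        """
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        k = random.randint(lo, hi)
        a[k], a[hi] = a[hi], a[k]        # move a random element into the pivot slot
        pivot, i = a[hi], lo
        for j in range(lo, hi):          # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        randomized_quicksort(a, lo, i - 1)
        randomized_quicksort(a, i + 1, hi)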


Even without randomization of pivots, the expected run time is O(n log n) if the order of the input data is random. In that case, I don't believe randomizing the pivots changes the distribution of running times.

What changes is that one very common ordering of the data (already-ascending input) no longer has O(n^2) performance.


There's little justification for expecting data to behave randomly unless our algorithm introduces it.


Correct. In general, real data will seldom, if ever, look like random data.


Shuffling with too small a seed leaves the shuffle susceptible to brute-force attacks, but that in no way invalidates its use as a dodge for pathological (pre-sorted) arrays submitted to Quicksort in the course of normal usage. There's a better chance of getting eaten by a dinosaur than of seeing O(n^2) QS performance on shuffled data that comes from anything other than an attacking adversary.


You're right that shuffling doesn't avoid the worst case, but your explanation isn't quite correct. See my other comment for my explanation.

I'm not sure what you mean by deterministic shuffling. Shuffling is supposed to be random, and therefore non-deterministic. In practice, you'd probably use a pseudorandom function to implement shuffling, but the goal of pseudorandom functions is to be indistinguishable from random functions, so that detail shouldn't matter.


Shuffling can never prevent the worst case of any algorithm. "Worst case" means the worst out of all possible cases. Shuffling doesn't change the set of cases, and it doesn't change anything about any individual case.

Probably what you meant was that shuffling makes the worst case very unlikely.

On a practical level, if you have to shuffle the data before sorting to make your sort algorithm work better, you've most likely already blown any performance advantage. So you might as well choose a different sort algorithm.



