
That's true, but it's also worth pointing out that the worst case can be trivially prevented by shuffling the array before you Quicksort it. Which is maybe a confirmation of your larger point.
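To make that concrete, here is a minimal Python sketch of the idea; the Lomuto-partition quicksort and the function names below are my own choices for illustration, not anything prescribed elsewhere in the thread:

    import random

    def quicksort(a, lo=0, hi=None):
        """Plain quicksort with the last element as pivot: O(n^2) on already-sorted input."""
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        pivot, i = a[hi], lo
        for j in range(lo, hi):          # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)

    def shuffled_quicksort(a):
        """Shuffle first, so a pre-sorted input no longer triggers the quadratic case."""
        random.shuffle(a)                # Fisher-Yates shuffle, a single O(n) pass
        quicksort(a)

Calling shuffled_quicksort(xs) sorts xs in place; the plain recursive form is only meant for small inputs.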


The principled way to avoid bad worst-case performance for quicksort is to select the pivot using a median of medians approach. That gives you good asymptotic performance even for adversarial inputs, but in practice it tends to be slower. Which fits well with the theme of this article.
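For reference, a rough Python sketch of the pivot-selection step (the group size of 5 is the textbook choice; the helper name is mine, and the input is assumed non-empty):

    def median_of_medians(a):
        """Return an element of a that is guaranteed not to sit among the
        smallest or largest ~30% of the values (up to small additive constants).
        """
        if len(a) <= 5:
            return sorted(a)[len(a) // 2]
        groups = [a[i:i + 5] for i in range(0, len(a), 5)]   # chop into groups of 5
        medians = [sorted(g)[len(g) // 2] for g in groups]   # median of each group
        return median_of_medians(medians)                    # recurse on the medians

Partitioning around that value bounds every split at roughly 30/70, which is enough for O(n log n) overall, but it adds another linear pass at every partition, which is where the practical slowdown comes from.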


You can trivially achieve bounded worst case performance with minimal overhead by doing median of medians only, say, one time in 5.

In the average case you do the expensive median of medians on a small fraction of the data and so only pay a few percent average penalty. And in the worst case you still get n log(n) performance.
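A sketch of how that could look in Python; the every-fifth-level rule via a depth counter, the Hoare-style partition, and the helper below are my guesses at the details, not the poster's actual scheme:

    import random

    def median_of_medians(a):
        """A pivot value guaranteed not to be an extreme order statistic of a."""
        if len(a) <= 5:
            return sorted(a)[len(a) // 2]
        groups = [a[i:i + 5] for i in range(0, len(a), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        return median_of_medians(medians)

    def hybrid_quicksort(a, lo=0, hi=None, depth=0):
        """Quicksort that pays for a median-of-medians pivot only on every fifth level.

        Four levels out of five use a cheap random pivot; on the fifth the pivot
        is guaranteed to fall near the middle, so the subarray shrinks by a
        constant factor at least once every five levels and the recursion depth
        stays O(log n), giving O(n log n) even on adversarial input.
        """
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            if depth % 5 == 4:
                pivot = median_of_medians(a[lo:hi + 1])   # expensive, guaranteed-good pivot
            else:
                pivot = a[random.randint(lo, hi)]         # cheap pivot the rest of the time
            i, j = lo, hi
            while i <= j:                                 # Hoare-style partition around a value
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            hybrid_quicksort(a, lo, j, depth + 1)         # recurse on the left part...
            lo, depth = i, depth + 1                      # ...and loop on the right part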


How is the worst case not n^2 on the partitions where you don't use median of medians?


Every fifth pivot is chosen to be an actual median, so the subarrays are guaranteed to at least halve once every five levels. That caps the recursion depth at O(log n), and with O(n) work per level the worst case stays O(n log n).


The median-of-medians approach is also somewhat tricky to implement (avoiding array copies and so on), although it admittedly looks neat on the slides of the algorithms lectures.


Strictly speaking, it is not true that shuffling first avoids the worst case. Given a deterministic shuffling function, some permutation of the input will be shuffled into sorted order, which will again trigger the worst case.

Of course, shuffling is still potentially helpful because ascending/descending order is often common in many applications.


Randomizing the pivots gives O(n log n) expected running time, with a vanishingly small probability of anything significantly worse for any n large enough that the worst case would matter.
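Concretely, the difference from shuffling up front is just where the randomness is injected. A minimal sketch, again with a Lomuto partition and names of my own choosing:

    import random

    def randomized_quicksort(a, lo=0, hi=None):
        """Quicksort that draws a fresh uniformly random pivot for every partition.

        The expected O(n log n) bound then holds for every fixed input, because
        the randomness comes from the algorithm rather than from any assumption
        about how the data happens to be ordered.
        """
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        k = random.randint(lo, hi)
        a[k], a[hi] = a[hi], a[k]        # move a random element into the pivot slot
        pivot, i = a[hi], lo
        for j in range(lo, hi):          # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        randomized_quicksort(a, lo, i - 1)
        randomized_quicksort(a, i + 1, hi)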


Even without randomization of pivots, the expected run time is O(n log n) if the order of the input data is random. In that case, I don't believe randomizing the pivots changes the distribution of running times.

What changes is that one very common ordering of the data (already-ascending input) no longer has O(n^2) performance.


There's little justification for expecting data to behave randomly unless our algorithm introduces it.


Correct. In general, real data will seldom, if ever, look like random data.


Shuffling with too small a seed leaves the shuffle susceptible to brute-force attacks, but that in no way invalidates its use as a dodge for pathological (pre-sorted) arrays submitted to Quicksort in the course of normal usage. There's a better chance of getting eaten by a dinosaur than of seeing O(n^2) QS performance on shuffled data that comes from anything other than an attacking adversary.


You're right that shuffling doesn't avoid the worst case, but your explanation isn't quite correct. See my other comment for my explanation.

I'm not sure what you mean by deterministic shuffling. Shuffling is supposed to be random, and therefore non-deterministic. In practice, you'd probably use a pseudorandom function to implement shuffling, but the goal of pseudorandom functions is to be indistinguishable from random functions, so that detail shouldn't matter.


Shuffling can never prevent the worst case of any algorithm. "Worst case" means the worst out of all possible cases. Shuffling doesn't change the set of cases, and it doesn't change anything about any individual case.

Probably what you meant was that shuffling makes the worst case very unlikely.

On a practical level, if you have to shuffle the data before sorting to make your sort algorithm work better, you've most likely already blown any performance advantage. So you might as well choose a different sort algorithm.



