Multi-key quicksort

Multi-key quicksort, also known as three-way radix quicksort, is an algorithm for sorting strings. This hybrid of quicksort and radix sort was originally suggested by P. Shackleton, as reported in one of C. A. R. Hoare's seminal papers on quicksort; its modern incarnation was developed by Jon Bentley and Robert Sedgewick in the mid-1990s. The algorithm is designed to exploit the property that in many problems, strings tend to have shared prefixes.

One of the algorithm's uses is the construction of suffix arrays, for which it was one of the fastest algorithms as of 2004.

Description

The three-way radix quicksort algorithm sorts an array of N (pointers to) strings in lexicographic order. It is assumed that all strings are of equal length K; if the strings are of varying length, they must be padded with extra elements that are less than any element in the strings. The pseudocode for the algorithm is then

algorithm sort(a : array of string, d : integer) is
    if length(a) ≤ 1 or d ≥ K then
        return
    p := pivot(a, d)
    i, j := partition(a, d, p)   (Note a simultaneous assignment of two variables.)
    sort(a[0:i), d)
    sort(a[i:j), d + 1)
    sort(a[j:length(a)), d)

Unlike most string sorting algorithms that look at many bytes in a string to decide if a string is less than, the same as, or equal to some other string; and then turning its focus to some other pair of strings, the multi-key quicksort initially looks at only one byte of every string in the array, byte d, initially the first byte of every string. The recursive call uses a new value of d and passes a subarray where every string in the subarray has exactly the same initial part -- the characters before character d.

The

pivot function must return a single character. Bentley and Sedgewick suggest either picking the median ofa[0][d], ..., a[length(a)−1][d] or some random character in that range. The partition function is a variant of the one used in ordinary three-way quicksort: it rearrangesa so that all ofa[0], ..., a[i−1] have an element at positiond that is less thanp,a[i], ..., a[j−1] have p at position d, and strings from j onward have ad'th element larger than p. (The original partitioning function suggested by Bentley and Sedgewick may be slow in the case of repeated elements; a Dutch national flag partitioning can be used to alleviate this.)

Practical implementations of multi-key quicksort can benefit from the same optimizations typically applied to quicksort: median-of-three pivoting, switching to insertion sort for small arrays, etc.

See also

  • American flag sort – another radix sort variant that is fast for string sorting
  • Ternary search tree – three-way radix quicksort is isomorphic to this data structure in the same way that quicksort is isomorphic to binary search trees

Notes

References

Uses material from the Wikipedia article Multi-key quicksort, released under the CC BY-SA 4.0 license.