The cache-oblivious distribution sort is a comparison-based sorting algorithm. It was introduced in 1999 in the context of the cache oblivious model. In the external memory model, the number of memory transfers it needs to perform a sort of
Contents
Basic Overview
Distribution sort operates on a contiguous array of
- Partition the array into
N N - Distribute the elements of the sorted subarrays into
q ≤ N B 1 , B 2 , … , B q 2 N B i B i + 1 . This distribution step is the main step of this algorithm, and is covered in more detail below. - Recursively sort each bucket.
- Output the concatenation of the buckets.
Distribution Step
As mentioned in step 2 above, the goal of the distribution step is to distribute the sorted subarrays into q buckets
Initially, the algorithm starts with one empty bucket with pivot
To accomplish this, each subarray and bucket will have a state associated with it. The state of a subarray consists of an index next of the next element to be read from the subarray, and a bucket number bnum indicating which bucket index the element should be copied to. By convention,
Consider the follow basic strategy: iterate through each subarray, attempting to copy over its element at position next. If the element is smaller than the pivot of bucket bnum, then place it in that bucket, possibly incurring a bucket split. Otherwise, increment bnum until a bucket whose pivot is large enough is found. Though this correctly distributes all elements, it does not exhibit a good cache performance.
Instead, the distribution step is performed in a recursive divide-and-conquer. The step will be performed as a call to the function distribute, which takes three parameters i, j, and m. distribute(i,j,m) will distribute elements from the i-th through (i+m-1)-th subarrays into buckets, starting from
The base case, where m=1, has a call to the subroutine copy_elems. In this base case, all elements from subarray i that belong to bucket j are added at once. If this leads to bucket j having too many elements, it splits the bucket with the procedure described beforehand.