Data structure Array | ||
![]() | ||
Worst-case performance O ( n 2 ) {displaystyle O(n^{2})} Best-case performance O ( n ) {displaystyle O(n)} Average performance O ( n ) {displaystyle O(n)} Worst-case space complexity O ( n ) {displaystyle O(n)} |
ProxmapSort, or Proxmap sort, is a sorting algorithm that works by partitioning an array of data items, or keys, into a number of "subarrays" (termed buckets, in similar sorts). The name is short for computing a "proximity map," which indicates for each key K the beginning of a subarray where K will reside in the final sorted order. Keys are placed into each subarray using insertion sort. If keys are "well distributed" among the subarrays, sorting occurs in linear time. The computational complexity estimates involve the number of subarrays and the proximity mapping function, the "map key," used. It is a form of bucket and radix sort.
Contents
- Basic strategy
- Example
- Pseudocode
- Proxmap Searching
- Performance
- Optimizations
- Comparison with other sorting algorithms
- Generic bucket sort related to ProxmapSort
- References
Once a ProxmapSort is complete, ProxmapSearch can be used to find keys in the sorted array in
Both algorithms were invented in the late 1980s by Prof. Thomas A. Standish at the University of California, Irvine.
Basic strategy
In general: Given an array A with n keys:
Simplied version: Given an array A with n keys
- Initialize: Create and initialize 2 arrays of n size: hitCount, proxMap, and 2 arrays of A.length: location, and A2.
- Partition: Using a carefully chosen mapKey function, divide the A2 into subarrays using the keys in A
- Disperse: Read over A, dropping each key into its bucket in A2; insertion sorting as needed.
- Collect: Visit the subarrays in order and put all the elements back into the original array, or simply use A2.
Note: "keys" may also contain other data, for instance an array of Student objects that contain the key plus a student ID and name. This makes ProxMapSort suitable for organizing groups of objects, not just keys themselves.
Example
Consider a full array: A[0 to n-1] with n keys. Let i be an index of A. Sort A's keys into array A2 of equal size.
The map key function is defined as mapKey(key) = floor(K).
Pseudocode
Here A is the array to be sorted and the mapKey functions determines the number of subarrays to use. For example, floor(K) will simply assign as many subarrays as there are integers from the data in A. Dividing the key by a constant reduces the number of subarrays; different functions can be used to translate the range of elements in A to subarrays, such as converting the letters A–Z to 0–25 or returning the first character (0–255) for sorting strings. Subarrays are sorted as the data comes in, not after all data has been placed into the subarray, as is typical in bucket sorting.
Proxmap Searching
ProxmapSearch uses the proxMap array generated by a previously done ProxmapSort to find keys in the sorted array A2 in constant time.
Basic strategy
Pseudocode
function mapKey(key) return floor(key) proxMap ← previously generated proxmap array of size n A2 ← previously sorted array of size nfunction proxmap-search(key) for i = proxMap[mapKey(key)] to length(array)-1 if (sortedArray[i].key == key) return sortedArray[i]Performance
Computing H, P, and L all take
Having a good MapKey function is imperative for avoiding the worst case. We must know something about the distribution of the data to come up with a good key.
Optimizations
- Save time: Save the MapKey(i) values so they don't have to be recomputed (as they are in the code above)
- Save space: The proxMaps can be stored in the hitCount array,as the hit counts are not needed once the proxmap is computed; the data can be sorted back into A, instead of using A2, if one takes care to note which A values are have been sorted so far, and which not.
Comparison with other sorting algorithms
Since ProxmapSort is not a comparison sort, the Ω(n log n) lower bound is inapplicable. Its speed can be attributed to it not being comparison-based and using arrays instead of dynamically allocated objects and pointers that must be followed, such as is done with when using a binary search tree.
ProxmapSort allows for the use of ProxmapSearch. Despite the O(n) build time, ProxMapSearch makes up for it with its
Generic bucket sort related to ProxmapSort
Like ProxmapSort, bucket sort generally operates on a list of n numeric inputs between zero and some maximum key or value M and divides the value range into n buckets each of size M/n. If each bucket is sorted using insertion sort, ProxmapSort and bucket sort can be shown to run in predicted linear time. However, the performance of this sort degrades with clustering (or too few buckets with too many keys); if many values occur close together, they will all fall into a single bucket and performance will be severely diminished. This behavior also holds for ProxmapSort: if the buckets are too large, its performance will degrade severely.