Distance oracle

Updated on Dec 20, 2024

Edit

Comment

In computing, a distance oracle (DO) is a data structure for calculating distances between vertices in a graph.

Introduction

Let G(V,E) be an undirected, weighted graph, with n = |V| nodes and m = |E| edges. We would like to answer queries of the form "what is the distance between the nodes s and t?".

One way to do this is just run the Dijkstra algorithm. This takes time O ( m + n log ⁡ n ) , and requires no extra space (besides the graph itself).

In order to answer many queries more efficiently, we can spend some time in pre-processing the graph and creating an auxiliary data structure.

A simple data structure that achieves this goal is a matrix which specifies, for each pair of nodes, the distance between them. This structure allows us to answer queries in constant time O ( 1 ) , but requires O ( n 2 ) extra space. It can be initialized in time O ( n 3 ) using an all-pairs shortest paths algorithm, such as the Floyd–Warshall algorithm.

A DO lies between these two extremes. It uses less than O ( n 2 ) space in order to answer queries in less than O ( m + n log ⁡ n ) time. Most DOs have to compromise on accuracy, i.e. they don't return the accurate distance but rather a constant-factor approximation of it.

Approximate DO

describe more than 10 different DOs. They then suggest a new DO that, for every k, requires space O ( k n 1 + 1 / k ) , such that any subsequent distance query can be approximately answered in time O ( k ) . The approximate distance returned is of stretch at most 2 k − 1 , that is, the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2 k − 1 . The initialization time is O ( k m n 1 / k ) .

Some special cases include:

For k = 1 we get the simple distance matrix.

For k = 2 we get a structure using O ( n 1.5 ) space which answers each query in constant time and approximation factor at most 3.

For k = ⌊ log ⁡ n ⌋ , we get a structure using O ( n log ⁡ n ) space, query time O ( log ⁡ n ) , and stretch O ( log ⁡ n ) .

Higher values of k do not improve the space or preprocessing time.

DO for general metric spaces

The oracle is built of a decreasing collection of k+1 sets of vertices:

A 0 = V

For every i = 1 , … , k − 1 : A i contains each element of A i − 1 , independently, with probability n − 1 / k . Note that the expected size of A i is n 1 − i / k . The elements of A i are called i-centers.

A k = ∅

For every node v, calculate its distance from each of these sets:

For every i = 0 , … , k − 1 : d ( A i , v ) = min ( d ( w , v ) | w ∈ A i and p i ( v ) = arg ⁡ min ( d ( w , v ) | w ∈ A i . I.e., p i ( v ) is the i-center nearest to v, and d ( A i , v ) is the distance between them. Note that for a fixed v, this distance is weakly increasing with i. Also note that for every v, d ( A 0 , v ) = 0 a n d p 0 ( v ) = v .

d ( A k , v ) = ∞ .

For every node v, calculate:

For every i = 0 , … , k − 1 : B i ( v ) = { w ∈ A i ∖ A i + 1 | d ( w , v ) < d ( A i + 1 , v ) } . B i ( v ) contains all vertices in A i which are strictly closer to v than all vertices in A i + 1 . The partial unions of B i s are balls in increasing diameter, that contain vertices with distances up to the first vertex of the next level.

For every v, compute its bunch:

B ( v ) = ⋃ i = 0 k − 1 B i ( v ) .

It is possible to show that the expected size of B ( v ) is at most k n 1 / k .

For every bunch B ( v ) , construct a hash table that holds, for every w ∈ B ( V ) , the distance d ( w , v ) .

The total size of the data structure is O ( k n + Σ | B ( v ) | ) = O ( k n + n k n 1 / k ) = O ( k n 1 + 1 / k )

Having this structure initialized, the following algorithm finds the distance between two nodes, u and v:

w := u , i := 0

while w ∉ B ( v ) :

i := i + 1

( u , v ) := ( v , u ) (swap the two input nodes; this does not change the distance between them since the graph is undirected).

w := p i ( u )

return d ( w , u ) + d ( w , v )

It is possible to show that, in each iteration, the distance d ( w , u ) grows by at most d ( v , u ) . Since A k − 1 ⊆ B ( v ) , there are at most k-1 iterations, so finally d ( w , u ) ≤ ( k − 1 ) d ( v , u ) . Now by the triangle inequality, d ( w , v ) ≤ d ( w , u ) + d ( u , v ) ≤ k d ( v , u ) , so the distance returned is at most ( 2 k − 1 ) d ( u , v ) .

Improvements

The above result was improved by who suggest a DO of size O ( n 4 / 3 m 1 / 3 ) which returns a factor 2 approximation.

Reduction from set intersection oracle

If there is a DO with an approximation factor of at most 2, then it is possible to build a set intersection oracle (SIO) with query time O ( 1 ) and space requirements O ( N + n ) , where n is the number of sets and N the sum of their sizes; see set intersection oracle#Reduction to approximate distance oracle.

It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires Ω ( n 2 ) space to answer queries in time O ( 1 ) , e.g. using an n-by-n table with the intersection between each two sets. If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time.

References

Distance oracle Wikipedia

(Text) CC BY-SA