A set intersection oracle (SIO) is a data structure which represents a collection of sets and can quickly answer queries about whether the set intersection of two given sets is non-empty.
Contents
- Minimum memory maximum query time
- Maximum memory minimum query time
- A compromise
- Reduction to approximate distance oracle
- References
The input to the problem is n finite sets. The sum of the sizes of all sets is N (which also means that there are at most N distinct elements). The SIO should quickly answer any query of the form:
"Does the set Si intersect the set Sk"?Minimum memory, maximum query time
Without any pre-processing, a query can be answered by inserting the elements of Si into a temporary hash table and then checking for each element of Sk whether it is in the hash table. The query time is
Maximum memory, minimum query time
Alternatively, we can pre-process the sets and create an n-by-n table where the intersection information is already entered. Then the query time is
A compromise
Define a "large set" as a set with at least
Given two sets, there are three possible cases:
- Both sets are large. Then just read the answer to the intersection query from the table, in time
O ( 1 ) . - Both sets are small. Then insert the elements of one of them into a hash table and check the elements of the other one; because the sets are small, the required time is
O ( N ) . - One set is large and one set is small. Loop over all elements in the small set and check them against the hash table of the large set. The required time is again
O ( N ) .
In general, if we define a "large set" as a set with at least
Reduction to approximate distance oracle
The SIO problem can be reduced to the approximate distance oracle (DO) problem, in the following way.
This graph has the following properties:
So, with a DO whose approximation factor of less than 2, we can solve the SIO problem.
It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires