Segment tree - Alchetron, The Free Social Encyclopedia

In computer science, a segment tree is a tree data structure for storing intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, it's a structure that cannot be modified once it's built. A similar data structure is the interval tree.

Structure description

This section describes the structure of a segment tree in a one-dimensional space.

Let S be a set of intervals, or segments. Let p₁, p₂, ..., p_m be the list of distinct interval endpoints, sorted from left to right. Consider the partitioning of the real line induced by those points. The regions of this partitioning are called elementary intervals. Thus, the elementary intervals are, from left to right:

( − ∞ , p 1 ) , [ p 1 , p 1 ] , ( p 1 , p 2 ) , [ p 2 , p 2 ] , . . . , ( p m − 1 , p m ) , [ p m , p m ] , ( p m , + ∞ )

That is, the list of elementary intervals consists of open intervals between two consecutive endpoints p_i and p_i+1, alternated with closed intervals consisting of a single endpoint. Single points are treated themselves as intervals because the answer to a query is not necessarily the same at the interior of an elementary interval and its endpoints.

Given a set I of intervals, or segments, a segment tree T for I is structured as follows:

T is a binary tree.

Its leaves correspond to the elementary intervals induced by the endpoints in I, in an ordered way: the leftmost leaf corresponds to the leftmost interval, and so on. The elementary interval corresponding to a leaf v is denoted Int(v).

The internal nodes of T correspond to intervals that are the union of elementary intervals: the interval Int(N) corresponding to node N is the union of the intervals corresponding to the leaves of the tree rooted at N. That implies that Int(N) is the union of the intervals of its two children.

Each node or leaf v in T stores the interval Int(v) and a set of intervals, in some data structure. This canonical subset of node v contains the intervals [x, x′] from I such that [x, x′] contains Int(v) and does not contain Int(parent(v)). That is, each node in T stores the segments that span through its interval, but do not span through the interval of its parent.

Storage requirements

This section analyzes the storage cost of a segment tree in a one-dimensional space.

A segment tree T on a set I of n intervals uses O(nlogn) storage.

Proof:Lemma: Any interval [x, x′] of I is stored in the canonical set for at most two nodes at the same depth.The set I has at most 4n + 1 elementary intervals. Because T is a binary balanced tree with at most 4n + 1 leaves, its height is O(logn). Since any interval is stored at most twice at a given depth of the tree, that the total amount of storage is O(nlogn).

Construction

This section describes the construction of a segment tree in a one-dimensional space.

A segment tree from the set of segments I, can be built as follows. First, the endpoints of the intervals in I are sorted. The elementary intervals are obtained from that. Then, a balanced binary tree is built on the elementary intervals, and for each node v it is determined the interval Int(v) it represents. It remains to compute the canonical subsets for the nodes. To achieve this, the intervals in I are inserted one by one into the segment tree. An interval X = [x, x′] can be inserted in a subtree rooted at T, using the following procedure:

If Int(T) is contained in X then store X at T, and finish.

Else:

If X intersects the interval of the left child of T, then insert X in that child, recursively.

If X intersects the interval of the right child of T, then insert X in that child, recursively.

The complete construction operation takes O(nlogn) time, n being the number of segments in I.

ProofSorting the endpoints takes O(nlogn). Building a balanced binary tree from the sorted endpoints, takes linear time on n.The insertion of an interval X = [x, x′] into the tree, costs O(logn).Proof: Visiting every node takes constant time (assuming that canonical subsets are stored in a simple data structure like a linked list). When we visit node v, we either store X at v, or Int(v) contains an endpoint of X. As proved above, an interval is stored at most twice at each level of the tree. There is also at most one node at every level whose corresponding interval contains x, and one node whose interval contains x′. So, at most four nodes per level are visited. Since there are O(logn) levels, the total cost of the insertion is O(logn).

Query

This section describes the query operation of a segment tree in a one-dimensional space.

A query for a segment tree, receives a point q_x(should be one of the leaves of tree), and retrieves a list of all the segments stored which contain the point q_x.

Formally stated; given a node (subtree) v and a query point q_x, the query can be done using the following algorithm:

Report all the intervals in I(v).

If v is not a leaf:

If q_x is in Int(left child of v) then

Perform a query in the left child of v.

If q_x is in Int(right child of v) then

Perform a query in the right child of v.

In a segment tree that contains n intervals, those containing a given query point can be reported in O(logn + k) time, where k is the number of reported intervals.

Proof: The query algorithm visits one node per level of the tree, so O(logn) nodes in total. In the other hand, at a node v, the segments in I are reported in O(1 + k_v) time, where k_v is the number of intervals at node v, reported. The sum of all the k_v for all nodes v visited, is k, the number of reported segments.

Generalization for higher dimensions

The segment tree can be generalized to higher dimension spaces, in the form of multi-level segment trees. In higher dimensional versions, the segment tree stores a collection of axis-parallel (hyper-)rectangles, and can retrieve the rectangles that contain a given query point. The structure uses O(nlog^dn) storage, and answers queries in O(log^dn).

The use of fractional cascading lowers the query time bound by a logarithmic factor. The use of the interval tree on the deepest level of associated structures lowers the storage bound by a logarithmic factor.

History

The segment tree was discovered by Jon Louis Bentley in 1977; in "Solutions to Klee’s rectangle problems".

References

Segment tree Wikipedia

(Text) CC BY-SA

Contents