Metric k center - Alchetron, The Free Social Encyclopedia

In graph theory, the metric k-center or metric facility location problem is a combinatorial optimization problem studied in theoretical computer science. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, or in other words a complete graph that satisfies the triangle inequality.

Formal definition

Let ( X , d ) be a metric space where X is a set and d is a metric
A set V ⊆ X , is provided together with a parameter k . The goal is to find a subset C ⊆ V with | C | = k such that the maximum distance of a point in V to the closest point in C is minimized. The problem can be formally defined as follows:
For a metric space ( X ,d),

Input: a set V ⊆ X ,and a parameter k .

Output: a set C of k points.

Goal: Minimize the cost r C ( V ) = m a x v ∈ V d(v, C )

That is, Every point in a cluster is in distance at most r C ( V ) from its respective center.

The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances d(v_i, v_j) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing:

max v ∈ V min c ∈ C d ( v , c )

Computational complexity

In a complete undirected graph G = (V, E), if we sort the edges in nondecreasing order of the distances: d(e₁) ≤ d(e₂) ≤ … ≤ d(e_m) and let G_i = (V_i, E_i), where E_i = {e₁, e₂, …, e_i}. The k-center problem is equivalent to finding the smallest index i such that G_i has a dominating set of size at most k.

Although Dominating Set is NP-complete, the k-center problem remains NP-hard. This is clear, since the optimality of a given feasible solution for the k-center problem can be determined through the Dominating Set reduction only if we know in first place the size of the optimal solution (i.e. the smallest index i such that G_i has a dominating set of size at most k) , which is precisely the difficult core of the NP-Hard problems.

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds C using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:

Pick an arbitrary point c ¯ 1 into C 1

For every point v ∈ V compute d 1 [ v ] from c ¯ 1

Pick the point c ¯ 2 with highest distance from c ¯ 1 .

Add it to the set of centers and denote this expanded set of centers as C 2 . Continue this till k centers are found

Running time

The i^th iteration of choosing the i^th center takes O ( n ) time.

There are k such iterations.

Thus, overall the algorithm takes O ( n k ) time.

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points V ⊆ X ,belonging to a metric space ( X ,d), the greedy K-center algorithm computes a set K of k centers, such that K is a 2-approximation to the optimal k-center clustering of V.

i.e. r K ( V ) ≤ 2 r o p t ( V , k )

This theorem can be proven using two cases as follows,

Case 1: Every cluster of C o p t contains exactly one point of K

Consider a point v ∈ V

Let c ¯ be the center it belongs to in C o p t

Let k ¯ be the center of K that is in Π ( C o p t , c ¯ )

d ( v , c ¯ ) = d ( v , C o p t ) ≤ r o p t ( V , k )

Similarly, d ( k ¯ , c ¯ ) = d ( k ¯ , C o p t ) ≤ r o p t

By the triangle inequality: d ( v , k ¯ ) ≤ d ( v , c ¯ ) + d ( c ¯ , k ¯ ) ≤ 2 r o p t

Case 2: There are two centers k ¯ and u ¯ of K that are both in Π ( C o p t , c ¯ ) , for some c ¯ ∈ C o p t (By pigeon hole principle, this is the only other possibility)

Assume, without loss of generality, that u ¯ was added later to the center set K by the greedy algorithm, say in i^th iteration.

But since the greedy algorithm always chooses the point furthest away from the current set of centers, we have that c ¯ ∈ C i − 1 and,

r K ( V ) ≤ r C i − 1 ( V ) = d ( u ¯ , C i − 1 ) ≤ d ( u ¯ , k ¯ ) ≤ d ( u ¯ , c ¯ ) + d ( c ¯ , k ¯ ) ≤ 2 r o p t

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-center problem is equivalent to finding the smallest index i such that G_i has a dominating set of size at most k and computes a maximal independent set of G_i, looking for the smallest index i that has a maximal independent set with a size of at least k. It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP.

References

Metric k-center Wikipedia

(Text) CC BY-SA

Contents