In computer science and graph theory, the method of color-coding efficiently finds k-vertex simple paths, k-vertex cycles, and other small subgraphs within a given graph using probabilistic algorithms, which can then be derandomized and turned into deterministic algorithms. This method shows that many subcases of the subgraph isomorphism problem (an NP-complete problem) can in fact be solved in polynomial time.
Contents
The theory and analysis of the color-coding method was proposed in 1994 by Noga Alon, Raphael Yuster, and Uri Zwick.
Results
The following results can be obtained through the method of color-coding:
The method
To solve the problem of finding a subgraph
Suppose H becomes colorful with some non-zero probability p. It immediately follows that if the random coloring is repeated 1/p times, then H is expected to become colorful once. Note that though p is small, it is shown that if
Sometimes it is also desirable to use a more restricted version of colorfulness. For example, in the context of finding cycles in planar graphs, it is possible to develop an algorithm that finds well-colored cycles. Here, a cycle is well-colored if its vertices are colored by consecutive colors.
Example
An example would be finding a simple cycle of length k in graph G = (V, E).
By applying random coloring method, each simple cycle has a probability of
The colorful cycle-finding algorithm works by first finding all pairs of vertices in V that are connected by a simple path of length k − 1, and then checking whether the two vertices in each pair are connected. Given a coloring function c : V → {1, ..., k} to color graph G, enumerate all partitions of the color set {1, ..., k} into two subsets C1, C2 of size
Derandomization
The derandomization of color-coding involves enumerating possible colorings of a graph G, such that the randomness of coloring G is no longer required. For the target subgraph H in G to be discoverable, the enumeration has to include at least one instance where the H is colorful. To achieve this, enumerating a k-perfect family F of hash functions from {1, ..., |V|} to {1, ..., k} is sufficient. By definition, F is k-perfect if for every subset S of {1, ..., |V|} where
There are several approaches to construct such a k-perfect hash family:
- The best explicit construction is by Moni Naor, Leonard J. Schulman, and Aravind Srinivasan, where a family of size
e k k O ( log k ) log | V | can be obtained. This construction does not require the target subgraph to exist in the original subgraph finding problem. - Another explicit construction by Jeanette P. Schmidt and Alan Siegel yields a family of size
2 O ( k ) log 2 | V | . - Another construction that appears in the original paper of Noga Alon et al. can be obtained by first building a k-perfect family that maps {1, ..., |V|} to {1, ..., k2}, followed by building another k-perfect family that maps {1, ..., k2} to {1, ..., k}. In the first step, it is possible to construct such a family with 2nlog k random bits that are almost 2log k-wise independent, and the sample space needed for generating those random bits can be as small as
k O ( 1 ) log | V | . In the second step, it has been shown by Jeanette P. Schmidt and Alan Siegel that the size of such k-perfect family can be2 O ( k ) 2 O ( k ) log | V | that maps from {1, ..., |V|} to {1, ..., k} can be obtained.
In the case of derandomizing well-coloring, where each vertex on the subgraph is colored consecutively, a k-perfect family of hash functions from {1, ..., |V|} to {1, ..., k!} is needed. A sufficient k-perfect family which maps from {1, ..., |V|} to {1, ..., kk} can be constructed in a way similar to the approach 3 above (the first step). In particular, it is done by using nklog k random bits that are almost klog k independent, and the size of the resulting k-perfect family will be
The derandomization of color-coding method can be easily parallelized, yielding efficient NC algorithms.
Applications
Recently, color-coding has attracted much attention in the field of bioinformatics. One example is the detection of signaling pathways in protein-protein interaction (PPI) networks. Another example is to discover and to count the number of motifs in PPI networks. Studying both signaling pathways and motifs allows a deeper understanding of the similarities and differences of many biological functions, processes, and structures among organisms.
Due to the huge amount of gene data that can be collected, searching for pathways or motifs can be highly time consuming. However, by exploiting the color-coding method, the motifs or signaling pathways with