Kalpana Kalpana (Editor)

Double Cut and Join Model

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit
Double Cut and Join Model

The double cut and join (DCJ) model is a model for genome rearrangement used to define an edit distance between genomes based on gene order and orientation, rather than nucleotide sequence. It takes the fundamental units of a genome to be synteny blocks, maximal sections of DNA conserved between genomes. Rather than seeking to explain differences between genomes as the result of point mutations, it focuses on changes caused by inversions and translocation.

A genome is described as a directed, edge labeled graph with each vertex having degree 1 or 2. Edges are labeled as synteny blocks, vertices of degree 1 represent telomeres, and vertices of degree 2 representing adjacencies between blocks. This requires that the genome consist of cycles and paths. Each component is called a chromosome. The beginning of each edge is called the tail, the end of each edge is called the head; together heads and tails are known as extremities. Vertices are described by their roles as heads and tails of blocks, for instance, in the figure, the adjacency which forms the head of marker 1 and the tail of marker 2 is labelled (h1, t2), the telomere formed by the head of 2 is (h2). A double cut and join (DCJ) operation consists of one of the following four transformations:

  • (i) breaking two adjacencies (a, b) and (c, d) to create two more adjacencies, (a, c) and (b,d)
  • (ii) taking an adjacency (a, b) and a telomere (c) to create a new adjacency and telomere, either as (a, c), (b) or (b,c), (a).
  • (iii) taking two telomeres (a) and (b) and creating a new adjacency (a, b)
  • (iv) breaking an adjacency (a, b) to create the two telomeres (a) and (b).
  • An edit distance, the double cut and join distance, is defined between genomes with the same number of edges G 1 and G 2 , d D C J ( A , B ) as the minimum number of DCJ operations needed to transform A into B .

    Mathematical Results

    The DCJ distance defines a metric space. To verify this, we first note that d D C J ( G , G ) = 0 , since no operations are needed to change G into itself, and if G 1 G 2 , d D C J ( G 1 , G 2 ) > 0 , since at least one operation is needed to transform G 1 into any other genome. (A proof that the d D C J ( G 1 , G 2 ) is always defined whenever G 1 and G 2 are genomes with the same edges will follow.) Note that each operation has an inverse: (i) and (ii) are their own inverses, and (iii) is the inverse of (iv). Thus d D C J ( G 1 , G 2 ) = d D C J ( G 1 , G 2 ) . The triangle inequality d D C J ( G 1 , G 3 ) d D C J ( G 1 , G 2 ) + d D C J ( G 2 , G 3 ) holds because a series of DCJ operations transforming G 1 to G 2 followed by a series of transformations from G 2 to G 3 will transform G 1 to G 3 , so the minimal number of operations needed to transform G 1 to G 3 must be no longer than this.

    To compute the DCJ distance between two genomes G 1 and G 2 with the same set of synteny blocks, we construct a bipartite multigraph known as the adjacency graph A = ( V ( G 1 ) , V ( G 2 ) , E ) of the genomes. The vertex set of the adjacency graph is ( V ( G 1 ) , V ( G 2 ) ) , where V ( G 1 ) is the vertex set of G 1 and V ( G 2 ) is the vertex set of G 2 . For a V ( G 1 ) and b V ( G 2 ) , we have ( a , b ) E if a and b are an extremity of the same synteny block. If a and b share two extremities, we add two edges between a and b to G .

    If A = B , we see that the adjacency graph is composed entirely of paths of length 1, connecting two telomeres, and cycles of length 2, connecting two adjacencies. We can use this fact to calculate d D C J . Let N be the number of synteny blocks in genomes G 1 and G 2 , C be the number of cycles in their adjacency graph, and I be the number of paths in their adjacency graph. Then d D C J ( G 1 , G 2 ) = N ( C + 1 / 2 ) . The proof shows that each DCJ operation can decrease N ( C + 1 / 2 ) by no more than 1, and that if G 1 G 2 , there exists an operation decreasing N ( C + I / 2 ) . . This proves that d D C J is always defined, and gives a method for its calculation. Since it is easy to count cycles and paths, d D C J can be found in linear time.

    References

    Double Cut and Join Model Wikipedia


    Similar Topics