Rahul Sharma (Editor)

ProbCons


ProbCons is an open-source program for probabilistic consistency-based multiple alignment of amino acid sequences. It is an efficient protein multiple sequence alignment program that has demonstrated a statistically significant improvement in accuracy over several leading alignment tools.


Algorithm

The following describes the basic outline of the ProbCons algorithm.

Step 1: Reliability of an alignment edge

For every pair of sequences $x$ and $y$, compute the probability that letters $x_i$ and $y_j$ are paired (written $x_i \sim y_j$) in $a^*$, an alignment generated by the model.

$$P(x_i \sim y_j \mid x, y) \overset{\text{def}}{=} \Pr[x_i \sim y_j \in a^* \mid x, y] = \sum_{\text{alignment } a \text{ with } x_i \sim y_j} \Pr[a \mid x, y] = \sum_{\text{alignment } a} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y]$$

(Here $\mathbf{1}\{x_i \sim y_j \in a\}$ equals 1 if $x_i$ and $y_j$ are aligned to each other in $a$, and 0 otherwise.)
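To illustrate the definition itself (not how ProbCons computes it), the sketch below enumerates every alignment of two very short sequences and sums $\Pr[a \mid x, y]$ over the alignments containing each pair, exactly as in the indicator form above. The scoring model (`MATCH`, `MISMATCH`, `GAP`, with $\Pr[a \mid x, y] \propto e^{\mathrm{score}(a)}$) is a hypothetical toy; ProbCons itself uses a pair hidden Markov model and obtains these probabilities by forward-backward dynamic programming rather than enumeration. Alignments are represented here as sets of aligned index pairs, so alignments differing only in the order of gaps are counted once (an assumption of this sketch).

```python
import math

# Hypothetical toy scoring model: Pr[a | x, y] is proportional to exp(score(a)).
MATCH, MISMATCH, GAP = 2.0, -1.0, -1.0


def alignments(n, m):
    """Yield every alignment of sequences of lengths n and m as a tuple of
    aligned index pairs (i, j), strictly increasing in both i and j.
    Letters that appear in no pair are aligned to gaps."""
    def extend(i0, j0):
        yield ()  # stop adding pairs: every remaining letter is gapped
        for i in range(i0, n):
            for j in range(j0, m):
                for rest in extend(i + 1, j + 1):
                    yield ((i, j),) + rest
    return extend(0, 0)


def score(a, x, y):
    """Toy log-weight: match/mismatch per aligned pair plus GAP per unaligned letter."""
    pair_score = sum(MATCH if x[i] == y[j] else MISMATCH for i, j in a)
    unaligned = len(x) + len(y) - 2 * len(a)
    return pair_score + GAP * unaligned


def pair_posteriors(x, y):
    """P[i][j] = sum over alignments a of 1{x_i ~ y_j in a} * Pr[a | x, y]."""
    weighted = [(a, math.exp(score(a, x, y))) for a in alignments(len(x), len(y))]
    total = sum(w for _, w in weighted)  # normalizing constant
    P = [[0.0] * len(y) for _ in x]
    for a, w in weighted:
        for i, j in a:  # the indicator is 1 exactly for the pairs in a
            P[i][j] += w / total
    return P


P = pair_posteriors("ACG", "AG")
for row in P:
    print([round(p, 3) for p in row])
```

The number of alignments grows exponentially with sequence length, so brute force like this is only useful for checking a real implementation on toy inputs.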

Step 2: Maximum expected accuracy

The accuracy of an alignment $a$ with respect to another alignment $a^*$, written $\mathrm{acc}(a, a^*)$, is defined as the number of aligned pairs the two alignments share, divided by the length of the shorter sequence.

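Calculate the expected accuracy of an alignment $a$ of a sequence pair $x, y$, taking the expectation over alignments $a^*$ drawn from the model: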

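$$\mathbb{E}_{\Pr[a^* \mid x, y]}\left(\mathrm{acc}(a, a^*)\right) = \sum_{a^*} \Pr[a^* \mid x, y]\, \mathrm{acc}(a, a^*) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} \sum_{a^*} \mathbf{1}\{x_i \sim y_j \in a^*\} \Pr[a^* \mid x, y] = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} P(x_i \sim y_j \mid x, y)$$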
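The last equality is what makes expected accuracy cheap to evaluate: once the posterior matrix $P$ is known, the expectation is just a sum over the pairs in $a$. The sketch below checks this numerically on a hypothetical distribution over two candidate alignments $a^*$. The function names are illustrative, and alignments are again represented as sets of index pairs.

```python
def acc(a, a_star, len_x, len_y):
    """Accuracy of a with respect to a*: shared aligned pairs / shorter length."""
    return len(set(a) & set(a_star)) / min(len_x, len_y)


def expected_accuracy_direct(a, dist, len_x, len_y):
    """Sum over a* of Pr[a* | x, y] * acc(a, a*); dist lists (a*, probability)."""
    return sum(p * acc(a, a_star, len_x, len_y) for a_star, p in dist)


def posteriors_from(dist, len_x, len_y):
    """P[i][j] = total probability of the alignments a* that contain x_i ~ y_j."""
    P = [[0.0] * len_y for _ in range(len_x)]
    for a_star, p in dist:
        for i, j in a_star:
            P[i][j] += p
    return P


def expected_accuracy(a, P, len_x, len_y):
    """Right-hand side of the derivation: sum of P over the pairs of a."""
    return sum(P[i][j] for i, j in a) / min(len_x, len_y)


# Hypothetical posterior over just two alignments of |x| = 2 and |y| = 3.
dist = [({(0, 0), (1, 1)}, 0.7), ({(0, 1), (1, 2)}, 0.3)]
a = {(0, 0), (1, 2)}
direct = expected_accuracy_direct(a, dist, 2, 3)
via_P = expected_accuracy(a, posteriors_from(dist, 2, 3), 2, 3)
print(direct, via_P)  # both 0.5, up to floating-point rounding
```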

Maximizing this expected accuracy over all alignments $a$ of $x$ and $y$ yields the maximum expected accuracy (MEA) alignment:

$$E(x, y) = \arg\max_{a} \mathbb{E}_{\Pr[a^* \mid x, y]}\left(\mathrm{acc}(a, a^*)\right)$$
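Because $1/\min(|x|, |y|)$ does not depend on $a$, the MEA alignment is simply the alignment whose aligned pairs have the largest total posterior probability. That maximization is a Needleman-Wunsch-style dynamic program with zero gap penalties, sketched below. The function name and the example posterior matrix are hypothetical.

```python
def mea_alignment(P):
    """Return (pairs, total) where pairs maximizes the summed posterior
    P[i][j] over aligned pairs (i, j); gaps score zero."""
    n = len(P)
    m = len(P[0]) if n else 0
    # M[i][j]: best total for the prefixes x[:i] and y[:j].
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i - 1][j - 1] + P[i - 1][j - 1],  # align x_i with y_j
                          M[i - 1][j],                         # x_i against a gap
                          M[i][j - 1])                         # y_j against a gap
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back one optimal path
        if P[i - 1][j - 1] > 0 and M[i][j] == M[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif M[i][j] == M[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, M[n][m]


# Hypothetical posterior matrix for |x| = 2, |y| = 3.
P_example = [[0.9, 0.05, 0.0],
             [0.05, 0.1, 0.8]]
pairs, total = mea_alignment(P_example)
mea_score = total / min(len(P_example), len(P_example[0]))
print(pairs, mea_score)  # [(0, 0), (1, 2)] and roughly 0.85
```

Dividing the optimal total by $\min(|x|, |y|)$ recovers the expected accuracy of the MEA alignment, which is the pairwise score used when building the guide tree.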

Step 3: Probabilistic Consistency Transformation

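The pairing probabilities for every pair of sequences $x, y$ in the set $S$ of all sequences are now re-estimated through every intermediate sequence $z \in S$ (including $x$ and $y$ themselves, where $P(x_i \sim x_k \mid x, x)$ is 1 if $i = k$ and 0 otherwise):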

$$P'(x_i \sim y_j \mid x, y) = \frac{1}{|S|} \sum_{z \in S} \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z)\, P(z_k \sim y_j \mid z, y)$$

This step can be iterated.
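In matrix form, with $P_{xz}$ holding $P(x_i \sim z_k \mid x, z)$, the transformation reads $P'_{xy} = \frac{1}{|S|} \sum_{z \in S} P_{xz} P_{zy}$, where $P_{xx}$ is the identity. The pure-Python sketch below applies one round. The data layout (a dictionary keyed by ordered name pairs, with the reverse direction obtained by transposition) and the example numbers are assumptions made for illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def identity(n):
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]


def consistency_transform(seqs, P):
    """One round of P'_xy = (1/|S|) * sum over z in S of P_xz @ P_zy.

    seqs: dict name -> sequence; P: dict (x, y) -> posterior matrix, where only
    one direction of each pair needs to be stored."""
    names = list(seqs)

    def get(x, y):
        if x == y:
            return identity(len(seqs[x]))  # P_xx is the identity
        if (x, y) in P:
            return P[(x, y)]
        return transpose(P[(y, x)])  # P_yx is the transpose of P_xy

    new_P = {}
    for x in names:
        for y in names:
            if x == y:
                continue
            total = [[0.0] * len(seqs[y]) for _ in seqs[x]]
            for z in names:  # z runs over all of S, including x and y
                prod = matmul(get(x, z), get(z, y))
                for i, row in enumerate(prod):
                    for j, v in enumerate(row):
                        total[i][j] += v / len(names)
            new_P[(x, y)] = total
    return new_P


# Hypothetical posteriors for three two-letter sequences.
seqs = {"x": "AB", "y": "AB", "z": "AB"}
P = {("x", "y"): [[0.3, 0.0], [0.0, 0.3]],  # weak direct evidence
     ("x", "z"): [[0.9, 0.0], [0.0, 0.9]],
     ("z", "y"): [[0.9, 0.0], [0.0, 0.9]]}
P_new = consistency_transform(seqs, P)
print(P_new[("x", "y")][0][0])  # roughly 0.47
```

Here the direct estimate for $x_1 \sim y_1$ is weak (0.3), but both $x_1 \sim z_1$ and $z_1 \sim y_1$ are strong (0.9), so the re-estimated value rises to (0.3 + 0.3 + 0.81) / 3, about 0.47. Iterating means calling `consistency_transform` again on its own output.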

Step 4: Computation of guide tree

Construct a guide tree by hierarchical clustering, using each pair's MEA score (the expected accuracy of its MEA alignment) as the similarity between the two sequences. The similarity between two clusters is defined as a weighted average of the pairwise sequence similarities.
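A minimal sketch of the clustering, assuming the MEA scores are already available for every pair of sequences. The text does not pin down the weighting, so this version assumes a WPGMA-style rule in which a merged cluster's similarity to any other cluster is the plain average of its two children's similarities; weighting by cluster size (UPGMA) would be the other common choice. The function name and the similarity values are hypothetical.

```python
def guide_tree(names, sim):
    """Agglomerative clustering; sim maps frozenset({a, b}) to a similarity.

    Returns nested 2-tuples of names. A merged cluster's similarity to any
    other cluster is the mean of its two children's similarities (WPGMA-style,
    an assumed reading of "weighted average")."""
    clusters = list(names)
    s = dict(sim)
    while len(clusters) > 1:
        # Pick the most similar pair of current clusters.
        a, b = max(((p, q) for k, p in enumerate(clusters) for q in clusters[k + 1:]),
                   key=lambda pq: s[frozenset(pq)])
        merged = (a, b)
        clusters.remove(a)
        clusters.remove(b)
        for c in clusters:
            s[frozenset((merged, c))] = (s[frozenset((a, c))] + s[frozenset((b, c))]) / 2
        clusters.append(merged)
    return clusters[0]


# Hypothetical MEA scores for four sequences.
mea = {("A", "B"): 0.9, ("C", "D"): 0.8, ("A", "C"): 0.3,
       ("A", "D"): 0.2, ("B", "C"): 0.25, ("B", "D"): 0.35}
sim = {frozenset(pair): v for pair, v in mea.items()}
tree = guide_tree(["A", "B", "C", "D"], sim)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

The nested tuple records the order in which clusters were merged, which is the order the progressive alignment in Step 5 follows.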

Step 5: Compute MSA

Finally, compute the MSA using progressive alignment along the guide tree, or iterative alignment.
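A rough sketch of progressive alignment along the guide tree. Each internal node aligns the profiles of its two children with the same zero-gap-penalty dynamic program used for the MEA alignment, scoring a pair of columns by summing the posterior pairing probabilities over every pair of residues across the two columns. `toy_posterior`, which assigns 1.0 to identical letters, is a hypothetical stand-in for the consistency-transformed probabilities of Step 3, and iterative refinement is not shown.

```python
def toy_posterior(s, t):
    """Hypothetical stand-in for Step 3's probabilities: 1.0 for identical letters."""
    return [[1.0 if a == b else 0.0 for b in t] for a in s]


def column_score(col_a, col_b, post):
    """Sum of posteriors over every cross pair of residues in two columns.
    A column maps sequence name -> residue index; absent names are gaps."""
    return sum(post[(s, t)][i][j] for s, i in col_a.items() for t, j in col_b.items())


def align_profiles(A, B, post):
    """Align two profiles (lists of columns) with a zero-gap-penalty DP."""
    n, m = len(A), len(B)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i - 1][j - 1] + column_score(A[i - 1], B[j - 1], post),
                          M[i - 1][j], M[i][j - 1])
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sc = column_score(A[i - 1], B[j - 1], post)
            if sc > 0 and M[i][j] == M[i - 1][j - 1] + sc:
                cols.append({**A[i - 1], **B[j - 1]})  # merge the two columns
                i, j = i - 1, j - 1
                continue
        if i > 0 and (j == 0 or M[i][j] == M[i - 1][j]):
            cols.append(dict(A[i - 1]))  # column from A, gaps for B's sequences
            i -= 1
        else:
            cols.append(dict(B[j - 1]))  # column from B, gaps for A's sequences
            j -= 1
    cols.reverse()
    return cols


def progressive_align(tree, seqs, post):
    """Align bottom-up along a guide tree of nested 2-tuples of names."""
    if isinstance(tree, str):
        return [{tree: k} for k in range(len(seqs[tree]))]
    left, right = tree
    return align_profiles(progressive_align(left, seqs, post),
                          progressive_align(right, seqs, post), post)


def render(cols, seqs):
    """Turn columns back into one gapped row per sequence."""
    return {name: "".join(seqs[name][c[name]] if name in c else "-" for c in cols)
            for name in seqs}


seqs = {"A": "GAT", "B": "GCAT", "C": "GAT"}
post = {(s, t): toy_posterior(seqs[s], seqs[t]) for s in seqs for t in seqs if s != t}
msa = render(progressive_align((("A", "C"), "B"), seqs, post), seqs)
for name, row in msa.items():
    print(name, row)
# A G-AT
# B GCAT
# C G-AT
```

Once two sequences are merged into a profile, their relative alignment is fixed; later merges only insert whole gap columns. That is the usual limitation of progressive alignment and the reason iterative refinement is often applied afterwards.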
