ProbCons is an open-source program for probabilistic consistency-based multiple alignment of amino acid sequences. It is an efficient protein multiple sequence alignment program that has demonstrated a statistically significant improvement in accuracy compared to several leading alignment tools.
Algorithm
The following describes the basic outline of the ProbCons algorithm.
Step 1: Reliability of an alignment edge
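For every pair of sequences $x$ and $y$, compute the probability that letters $x_i$ and $y_j$ are paired in $a^*$, an alignment generated by the model:
$$P(x_i \sim y_j \mid x, y) = \Pr[x_i \sim y_j \text{ in some } a \mid x, y] = \sum_{a} \Pr[a \mid x, y] \, \mathbf{1}\{x_i \sim y_j \in a\}$$
(Where the sum runs over all alignments $a$ of $x$ and $y$, and $\mathbf{1}\{x_i \sim y_j \in a\}$ is equal to 1 if $x_i$ and $y_j$ are aligned in $a$ and 0 otherwise.)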
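Step 1 can be illustrated with a small forward-backward computation. The sketch below is not the ProbCons pair hidden Markov model: it assumes a deliberately simplified single-state model in which every aligned pair and every gap contributes a fixed multiplicative weight. The function name `match_posteriors` and the weights `match`, `mismatch`, and `gap` are hypothetical choices made only for illustration.

```python
def match_posteriors(x, y, match=1.0, mismatch=0.2, gap=0.3):
    """Posterior probability that x[i] is aligned to y[j] under a toy model.

    Assumption: an alignment's weight is the product of a per-column weight
    (match/mismatch for aligned letters, gap for a gapped column), and
    Pr[a | x, y] is that weight divided by the total over all alignments.
    """
    n, m = len(x), len(y)

    def w(i, j):
        return match if x[i] == y[j] else mismatch

    # fwd[i][j]: total weight of all alignments of x[:i] with y[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * w(i - 1, j - 1)
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * gap
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * gap

    # bwd[i][j]: total weight of all alignments of x[i:] with y[j:]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                bwd[i][j] += w(i, j) * bwd[i + 1][j + 1]
            if i < n:
                bwd[i][j] += gap * bwd[i + 1][j]
            if j < m:
                bwd[i][j] += gap * bwd[i][j + 1]

    total = fwd[n][m]
    # Weight of all alignments that pair x[i] with y[j], normalised by the total.
    return [
        [fwd[i][j] * w(i, j) * bwd[i + 1][j + 1] / total for j in range(m)]
        for i in range(n)
    ]
```

Because each letter of one sequence is paired with at most one letter of the other in any single alignment, every row and column of the returned matrix sums to at most 1.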
Step 2: Maximum expected accuracy
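The accuracy of an alignment $a^*$ with respect to another alignment $a$ is defined as the number of common aligned pairs divided by the length of the shorter sequence.
Calculate the expected accuracy for each pair of sequences:
$$E_{\Pr[a \mid x, y]}\big(\mathrm{accuracy}(a^*, a)\big) = \sum_{a} \Pr[a \mid x, y] \, \mathrm{accuracy}(a^*, a) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j \mid x, y)$$
This yields a maximum expected accuracy (MEA) alignment:
$$E(x, y) = \arg\max_{a^*} \; E_{\Pr[a \mid x, y]}\big(\mathrm{accuracy}(a^*, a)\big)$$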
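One way to picture Step 2: once a matrix of posterior match probabilities is available, an alignment that maximises the summed posterior of its aligned pairs can be found with a Needleman-Wunsch-style dynamic program that charges nothing for gaps. The following is a minimal sketch under that assumption; the function name `mea_alignment` and the input layout (`post[i][j]` for letters `x[i]` and `y[j]`) are hypothetical.

```python
def mea_alignment(post):
    """Return (pairs, expected_accuracy) for a posterior matrix post[i][j].

    pairs is a list of (i, j) index pairs aligned in the chosen alignment;
    expected_accuracy is their summed posterior divided by the length of
    the shorter sequence.
    """
    n = len(post)
    m = len(post[0]) if n else 0

    # score[i][j]: best summed posterior for prefixes x[:i] and y[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + post[i - 1][j - 1],  # align x[i-1] with y[j-1]
                score[i - 1][j],                           # leave x[i-1] unaligned
                score[i][j - 1],                           # leave y[j-1] unaligned
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()

    shorter = min(n, m)
    expected_accuracy = score[n][m] / shorter if shorter else 0.0
    return pairs, expected_accuracy
```

The returned expected accuracy is the quantity that a later stage could reuse as a pairwise similarity score between the two sequences.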
Step 3: Probabilistic Consistency Transformation
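All pairs of sequences $x, y$ from the set of all sequences $S$ are now re-estimated using all intermediate sequences $z$:
$$P'(x_i \sim y_j \mid x, y) = \frac{1}{|S|} \sum_{z \in S} \; \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z) \, P(z_k \sim y_j \mid z, y)$$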
This step can be iterated.
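The consistency transformation of Step 3 amounts to averaging, over every intermediate sequence, the product of two posterior matrices. The sketch below uses plain Python lists and makes two assumptions that should be read as such: the intermediate sequence ranges over the whole set, including the two sequences being re-estimated, and a sequence aligned with itself has the identity matrix as its posterior. The names `consistency_transform`, `post`, and `lengths` are hypothetical.

```python
def consistency_transform(post, lengths):
    """Apply one round of the consistency transformation.

    post[(x, y)] is a lengths[x]-by-lengths[y] matrix of posteriors for every
    ordered pair of distinct sequence indices x != y.
    Returns a new dictionary with the same keys.
    """
    n_seq = len(lengths)
    new_post = {}
    for x in range(n_seq):
        for y in range(n_seq):
            if x == y:
                continue
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n_seq):
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        if z == x or z == y:
                            # Assumed identity self-posterior: the product
                            # reduces to the current P(x_i ~ y_j).
                            acc[i][j] += post[(x, y)][i][j]
                        else:
                            acc[i][j] += sum(
                                post[(x, z)][i][k] * post[(z, y)][k][j]
                                for k in range(lengths[z])
                            )
            new_post[(x, y)] = [[v / n_seq for v in row] for row in acc]
    return new_post
```

Iterating the step simply means feeding the returned dictionary back into the same function.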
Step 4: Computation of guide tree
Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score. The similarity between clusters is defined as a weighted average of the pairwise sequence similarities.
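A guide tree of this kind can be sketched as greedy agglomerative clustering: repeatedly merge the two most similar clusters and recompute similarities to the merged cluster. The text does not specify the exact weighting ProbCons uses, so the sketch below assumes a cluster-size-weighted average (as in UPGMA). The function name `build_guide_tree` and the nested-tuple tree representation are hypothetical.

```python
def build_guide_tree(sim):
    """Build a guide tree from a symmetric similarity matrix.

    sim[i][j] is the similarity of sequences i and j (for example an MEA
    score). Leaves are sequence indices; internal nodes are 2-tuples.
    Assumes at least one sequence.
    """
    n = len(sim)
    clusters = {i: (i, 1) for i in range(n)}  # id -> (subtree, size)
    s = {(i, j): sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > 1:
        # Pick the most similar pair of current clusters.
        a, b = max(s, key=s.get)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)

        # Similarity of the merged cluster to every remaining cluster:
        # size-weighted average of the two old similarities (an assumption).
        merged = {}
        for c in clusters:
            s_ac = s[(min(a, c), max(a, c))]
            s_bc = s[(min(b, c), max(b, c))]
            merged[(c, next_id)] = (size_a * s_ac + size_b * s_bc) / (size_a + size_b)

        # Drop entries that mention the two clusters that were merged.
        s = {k: v for k, v in s.items() if a not in k and b not in k}
        s.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    (tree, _size), = clusters.values()
    return tree
```

For four sequences in which 0 and 1 are similar and 2 and 3 are similar, the result is `((0, 1), (2, 3))`, which a progressive aligner would read as "align 0 with 1, align 2 with 3, then align the two groups".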
Step 5: Compute MSA
Finally, compute the multiple sequence alignment (MSA) using progressive alignment or iterative alignment.