The dead-end elimination algorithm (DEE) is a method for minimizing a function over a discrete set of independent variables. The basic idea is to identify "dead ends", i.e., combinations of variables that are not necessary to define a global minimum because there is always a way of replacing such a combination with a better or equivalent one. Such combinations can then be excluded from further search. Hence, dead-end elimination is a mirror image of dynamic programming, in which "good" combinations are identified and explored further. Although the method itself is general, it has been developed and applied mainly to the problems of predicting and designing the structures of proteins. It is closely related to the notion of dominance in optimization, also known as substitutability in a constraint satisfaction problem. The original description and proof of the dead-end elimination theorem can be found in [1].
Basic requirements
An effective DEE implementation requires four pieces of information:
- A well-defined finite set of discrete independent variables
- A precomputed numerical value (considered the "energy") associated with each element in the set of variables (and possibly with their pairs, triples, etc.)
- A criterion or criteria for determining when an element is a "dead end", that is, when it cannot possibly be a member of the solution set
- An objective function (considered the "energy function") to be minimized
Note that the criteria can easily be reversed to identify the maximum of a given function as well.
Applications to protein structure prediction
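Dead-end elimination has been used effectively to predict the structure of side chains on a given protein backbone structure by minimizing an energy function $E$. The conformational search space of each side chain is restricted to a discrete set of rotamers at each amino acid position.
In the following discussion, let $N$ be the length of the protein and let $r_k$ denote the rotamer chosen for the $k$-th side chain. Assuming that atoms interact only through two-body potentials, the energy may be written
$$E = \sum_{k=1}^{N} E_k(r_k) + \sum_{k<l} E_{kl}(r_k, r_l)$$
where $E_k(r_k)$ is the "self-energy" of rotamer $r_k$ and $E_{kl}(r_k, r_l)$ is the "pair energy" of rotamers $r_k$ and $r_l$.
Also note that $E_{kk}(r_k, r_k)$, the pair energy between a rotamer and itself, is taken to be zero, so a sum over all positions $l$ is unaffected by the term $l = k$. This simplifies the notation of the elimination criteria.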
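As a rough illustration of an energy of this kind (a sum of self-energies plus one pair energy per pair of positions), the following Python sketch evaluates a single rotamer assignment. The data layout (`self_energy` as a list of lists, `pair_energy` as a dictionary keyed by position pairs) and the toy numbers are assumptions made for this example only.

```python
from itertools import combinations

def total_energy(assignment, self_energy, pair_energy):
    """Energy of one rotamer assignment.

    assignment[k] is the index of the rotamer chosen at position k.
    Each unordered pair of positions contributes exactly one pair term.
    """
    n = len(assignment)
    energy = sum(self_energy[k][assignment[k]] for k in range(n))
    for k, l in combinations(range(n), 2):
        energy += pair_energy[(k, l)][assignment[k]][assignment[l]]
    return energy

# Hypothetical toy problem: 3 positions, 2 rotamers per position.
self_energy = [[0, 1], [5, 2], [3, 3]]
pair_energy = {
    (0, 1): [[0, 2], [1, -1]],
    (0, 2): [[5, 0], [0, 5]],
    (1, 2): [[-5, 0], [0, 1]],
}

print(total_energy((0, 0, 0), self_energy, pair_energy))  # 8
print(total_energy((1, 1, 0), self_energy, pair_energy))  # 5
```

Storing pair energies only for `k < l` is one possible choice; it relies on pair energies being symmetric.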
Singles elimination criterion
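If a particular rotamer $r_k^A$ of side chain $k$ cannot possibly give a better energy than another rotamer $r_k^B$ of the same side chain, then rotamer $A$ can be eliminated from further consideration, which reduces the search space. Mathematically, this condition is expressed by the inequality
$$E_k(r_k^A) + \sum_{l=1}^{N} \min_X E_{kl}(r_k^A, r_l^X) > E_k(r_k^B) + \sum_{l=1}^{N} \max_X E_{kl}(r_k^B, r_l^X)$$
where $E_k$ and $E_{kl}$ are the self- and pair energies, and $\min_X E_{kl}(r_k^A, r_l^X)$ is the minimum (best) pair energy possible between rotamer $A$ of side chain $k$ and any rotamer $X$ of side chain $l$. Similarly, $\max_X E_{kl}(r_k^B, r_l^X)$ is the maximum (worst) pair energy possible between rotamer $B$ of side chain $k$ and any rotamer $X$ of side chain $l$.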
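A minimal sketch of a singles test in Python may help make this concrete: rotamer `a` at position `k` is flagged when its best-case energy contribution is still worse than the worst-case contribution of an alternative rotamer `b`. The function names, the `alive` list of surviving rotamer indices, and the toy energies are assumptions for illustration and do not come from the original description.

```python
def pair(pair_energy, k, a, l, b):
    """Pair energy between rotamer a at position k and rotamer b at position l.

    Energies are stored once per position pair (k < l) and assumed symmetric;
    the pair energy of a position with itself is taken to be zero.
    """
    if k == l:
        return 0
    if k < l:
        return pair_energy[(k, l)][a][b]
    return pair_energy[(l, k)][b][a]

def singles_dead_end(k, a, b, self_energy, pair_energy, alive):
    """True if rotamer a at position k is a dead end relative to rotamer b."""
    best_a = self_energy[k][a]
    worst_b = self_energy[k][b]
    for l in range(len(self_energy)):
        if l == k:
            continue
        best_a += min(pair(pair_energy, k, a, l, x) for x in alive[l])
        worst_b += max(pair(pair_energy, k, b, l, x) for x in alive[l])
    return best_a > worst_b

# Hypothetical toy problem: 2 positions, 2 rotamers per position.
self_energy = [[10, 0], [0, 0]]
pair_energy = {(0, 1): [[1, 2], [3, 4]]}
alive = [{0, 1}, {0, 1}]

print(singles_dead_end(0, 0, 1, self_energy, pair_energy, alive))  # True
print(singles_dead_end(0, 1, 0, self_energy, pair_energy, alive))  # False
```

In the first call the best case for rotamer 0 is 10 + 1 = 11, which exceeds the worst case for rotamer 1, 0 + 4 = 4, so rotamer 0 can be discarded.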
Pairs elimination criterion
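The pairs criterion is more difficult to describe and to implement, but it adds significant eliminating power. For brevity, we define the shorthand variable $U_{kl}$, the intrinsic energy of a pair of rotamers $A$ and $B$ at positions $k$ and $l$, respectively:
$$U_{kl}(r_k^A, r_l^B) = E_k(r_k^A) + E_l(r_l^B) + E_{kl}(r_k^A, r_l^B)$$
A given pair of rotamers $r_k^A$ and $r_l^B$ at positions $k$ and $l$ cannot both be in the final solution (although one or the other may be) if there is another pair $r_k^C$ and $r_l^D$ that always gives a better energy. Expressed mathematically,
$$U_{kl}(r_k^A, r_l^B) + \sum_{i \neq k,l} \min_X \left[ E_{ki}(r_k^A, r_i^X) + E_{li}(r_l^B, r_i^X) \right] > U_{kl}(r_k^C, r_l^D) + \sum_{i \neq k,l} \max_X \left[ E_{ki}(r_k^C, r_i^X) + E_{li}(r_l^D, r_i^X) \right]$$
where $k \neq l$, the sums run over all positions $i$ other than $k$ and $l$, and the minimum and maximum are taken over the rotamers $X$ available at position $i$.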
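The pair test can be sketched along the same lines as a singles test. In the hedged Python example below, the pair (`a` at `k`, `b` at `l`) is flagged when its best-case energy is still worse than the worst-case energy of a competing pair (`c` at `k`, `d` at `l`). All names, the `alive` bookkeeping, and the toy energies are assumptions for illustration.

```python
def pair(pair_energy, k, a, l, b):
    """Symmetric pair-energy lookup; energies are stored for k < l only."""
    if k == l:
        return 0
    if k < l:
        return pair_energy[(k, l)][a][b]
    return pair_energy[(l, k)][b][a]

def pairs_dead_end(k, a, l, b, c, d, self_energy, pair_energy, alive):
    """True if rotamers (a at k, b at l) cannot both be in the optimum
    because (c at k, d at l) always gives a lower energy."""
    def intrinsic(x, y):
        # Self-energies of both rotamers plus their mutual pair energy.
        return self_energy[k][x] + self_energy[l][y] + pair(pair_energy, k, x, l, y)

    best_ab = intrinsic(a, b)
    worst_cd = intrinsic(c, d)
    for i in range(len(self_energy)):
        if i in (k, l):
            continue
        best_ab += min(pair(pair_energy, k, a, i, x) + pair(pair_energy, l, b, i, x)
                       for x in alive[i])
        worst_cd += max(pair(pair_energy, k, c, i, x) + pair(pair_energy, l, d, i, x)
                        for x in alive[i])
    return best_ab > worst_cd

# Hypothetical toy problem: 3 positions, 2 rotamers per position.
self_energy = [[0, 0], [0, 0], [0, 0]]
pair_energy = {
    (0, 1): [[10, 0], [0, 0]],
    (0, 2): [[1, 2], [1, 2]],
    (1, 2): [[0, 1], [0, 1]],
}
alive = [{0, 1}, {0, 1}, {0, 1}]

print(pairs_dead_end(0, 0, 1, 0, 1, 1, self_energy, pair_energy, alive))  # True
print(pairs_dead_end(0, 1, 1, 1, 0, 0, self_energy, pair_energy, alive))  # False
```

A flagged pair is only forbidden as a combination; either rotamer may still appear in the solution with a different partner, so an implementation would typically record dead-end pairs separately from dead-end single rotamers.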
Energy matrices
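For large $N$, the matrices of precomputed energies can become costly to store. With $N$ amino acid positions and $p$ rotamers at each position (usually, but not necessarily, constant over all positions), the self-energies require $Np$ entries in total. Each pair-energy matrix between two positions is a $p \times p$ matrix, so an unreduced set of pair matrices holds $N^2 p^2$ entries. This can be trimmed somewhat, at the cost of additional implementation complexity, because pair energies are symmetric and the pair energy between a rotamer and itself is zero.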
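A quick count shows how fast the storage grows. Assuming `n` positions with `p` rotamers each (both hypothetical parameters for this sketch), the self-energies need `n * p` entries, a full set of pair tables needs `n * n * p * p` entries, and keeping one table per unordered pair of distinct positions roughly halves that.

```python
def energy_table_sizes(n_positions, rotamers_per_position):
    """Entry counts for the self-energy table and the pair-energy tables."""
    n, p = n_positions, rotamers_per_position
    self_entries = n * p
    unreduced_pair_entries = n * n * p * p
    # One p-by-p table per unordered pair of distinct positions.
    reduced_pair_entries = n * (n - 1) // 2 * p * p
    return self_entries, unreduced_pair_entries, reduced_pair_entries

print(energy_table_sizes(100, 10))  # (1000, 1000000, 495000)
```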
Implementation and efficiency
The above two criteria are normally applied iteratively until convergence, defined as the point at which no more rotamers or pairs can be eliminated. Since this is normally a reduction in the sample space by many orders of magnitude, simple enumeration will suffice to determine the minimum within this pared-down set.
Given this model, the DEE algorithm is guaranteed to find the optimal solution; that is, it is a global optimization process. The single-rotamer search scales quadratically in time with the total number of rotamers. The pair search scales cubically and is the slowest part of the algorithm (aside from energy calculations). This is a dramatic improvement over brute-force enumeration, which scales as $O(p^N)$ for $N$ positions with $p$ rotamers each.
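The overall procedure (prune until nothing changes, then enumerate the survivors) can be sketched as follows. For brevity this Python example applies only a singles test; the function names, the data layout, and the toy energies are assumptions for illustration, not a reference implementation.

```python
from itertools import combinations, product

def pair(pe, k, a, l, b):
    """Symmetric pair-energy lookup; energies are stored for k < l only."""
    if k == l:
        return 0
    return pe[(k, l)][a][b] if k < l else pe[(l, k)][b][a]

def energy(rotamers, se, pe):
    """Total energy: self-energies plus one pair energy per pair of positions."""
    e = sum(se[k][r] for k, r in enumerate(rotamers))
    for k, l in combinations(range(len(rotamers)), 2):
        e += pe[(k, l)][rotamers[k]][rotamers[l]]
    return e

def dominated(k, a, b, se, pe, alive):
    """Singles test: the best case for a is worse than the worst case for b."""
    others = [l for l in range(len(se)) if l != k]
    best_a = se[k][a] + sum(min(pair(pe, k, a, l, x) for x in alive[l]) for l in others)
    worst_b = se[k][b] + sum(max(pair(pe, k, b, l, x) for x in alive[l]) for l in others)
    return best_a > worst_b

def eliminate_singles(se, pe):
    """Apply the singles test repeatedly until no rotamer can be removed."""
    alive = [set(range(len(row))) for row in se]
    changed = True
    while changed:
        changed = False
        for k in range(len(se)):
            for a in sorted(alive[k]):
                if any(b != a and dominated(k, a, b, se, pe, alive) for b in alive[k]):
                    alive[k].discard(a)
                    changed = True
    return alive

def solve(se, pe):
    """Prune with the singles test, then enumerate the remaining combinations."""
    alive = eliminate_singles(se, pe)
    survivors = product(*(sorted(s) for s in alive))
    return min(survivors, key=lambda r: energy(r, se, pe))

# Hypothetical toy problem: 3 positions, 2 rotamers per position.
se = [[10, 0], [0, 0], [0, 3]]
pe = {
    (0, 1): [[1, 2], [3, 4]],
    (0, 2): [[0, 1], [1, 0]],
    (1, 2): [[0, 2], [2, 0]],
}

print(eliminate_singles(se, pe))  # rotamer 0 at position 0 is pruned
print(solve(se, pe))              # (1, 0, 0)
```

On this toy input, pruning halves the number of combinations to enumerate (from 8 to 4), and the result matches a brute-force search over all 8.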
A large-scale benchmark of DEE compared with alternative methods of protein structure prediction and design finds that DEE reliably converges to the optimal solution for protein lengths for which it runs in a reasonable amount of time[2]. It significantly outperforms the alternatives under consideration, which involved techniques derived from mean field theory, genetic algorithms, and the Monte Carlo method. However, the other algorithms are appreciably faster than DEE and thus can be applied to larger and more complex problems; their relative accuracy can be extrapolated from a comparison to the DEE solution within the scope of problems accessible to DEE.
Protein design
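The preceding discussion implicitly assumed that the rotamers $r_k$ are all different orientations of the same amino acid side chain; that is, the sequence of the protein was assumed to be fixed. It is also possible to let several side-chain types compete for a position $k$ by including rotamers of each type in the rotamer set for that position, which allows a new sequence to be designed onto a given protein backbone. However, this greatly increases the number of rotamers per position and still requires a fixed protein length.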
Generalizations
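More powerful and more general criteria have been introduced that improve both the efficiency and the eliminating power of the method for both prediction and design applications. One example is a refinement of the singles elimination criterion known as the Goldstein criterion[4], which arises from fairly straightforward algebraic manipulation before applying the minimization:
$$E_k(r_k^A) - E_k(r_k^B) + \sum_{l=1}^{N} \min_X \left[ E_{kl}(r_k^A, r_l^X) - E_{kl}(r_k^B, r_l^X) \right] > 0$$
where $E_k$ and $E_{kl}$ are the self- and pair energies and $X$ ranges over the rotamers at position $l$.
Thus rotamer $r_k^A$ can be eliminated whenever switching to some alternative rotamer $r_k^B$ at the same position lowers the total energy, even when the rotamers at all other positions are chosen to favor $A$ as much as possible. This is less restrictive than the original singles criterion, which compares the best possible energy contribution of $A$ with the worst possible contribution of $B$.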
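To illustrate the difference in eliminating power, the Python sketch below compares a Goldstein-style test, which takes the minimum of the energy difference position by position, with a test that takes the minimum and the maximum separately. The function names and toy energies are assumptions for this example only.

```python
def pair(pe, k, a, l, b):
    """Symmetric pair-energy lookup; energies are stored for k < l only."""
    if k == l:
        return 0
    return pe[(k, l)][a][b] if k < l else pe[(l, k)][b][a]

def goldstein_dead_end(k, a, b, se, pe, alive):
    """True if b beats a at position k for every choice at the other positions."""
    total = se[k][a] - se[k][b]
    for l in range(len(se)):
        if l == k:
            continue
        # Smallest advantage of b over a across the rotamers at position l.
        total += min(pair(pe, k, a, l, x) - pair(pe, k, b, l, x) for x in alive[l])
    return total > 0

def original_dead_end(k, a, b, se, pe, alive):
    """Singles test with separate minimum (for a) and maximum (for b)."""
    others = [l for l in range(len(se)) if l != k]
    best_a = se[k][a] + sum(min(pair(pe, k, a, l, x) for x in alive[l]) for l in others)
    worst_b = se[k][b] + sum(max(pair(pe, k, b, l, x) for x in alive[l]) for l in others)
    return best_a > worst_b

# Hypothetical toy problem: the pair energy depends only on the rotamer at
# position 1, so rotamer 1 at position 0 is always better by exactly 1.
se = [[1, 0], [0, 0]]
pe = {(0, 1): [[0, 10], [0, 10]]}
alive = [{0, 1}, {0, 1}]

print(goldstein_dead_end(0, 0, 1, se, pe, alive))  # True
print(original_dead_end(0, 0, 1, se, pe, alive))   # False
```

Here the separate minimum and maximum differ by 10, which hides the constant advantage of 1, whereas the difference-based test detects it.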
An extended discussion of elaborate DEE criteria and a benchmark of their relative performance can be found in [5].