An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.
Contents
- History
- Motivation
- Alternating decision tree structure
- Example
- Description of the algorithm
- Empirical results
- References
An ADTree consists of an alternation of decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. An instance is classified by an ADTree by following all paths for which all decision nodes are true, and summing any prediction nodes that are traversed.
History
ADTrees were introduced by Yoav Freund and Llew Mason. However, the algorithm as presented had several typographical errors. Clarifications and optimizations were later presented by Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby. Implementations are available in Weka and JBoost.
Motivation
Original boosting algorithms typically used either decision stumps or decision trees as weak hypotheses. As an example, boosting decision stumps creates a set of T weighted decision stumps (where T is the number of boosting iterations), which then vote on the final classification according to their weights. Individual decision stumps are weighted according to their ability to classify the data.
Boosting a simple learner results in an unstructured set of T hypotheses, making it difficult to infer correlations between attributes. Alternating decision trees introduce structure to the set of hypotheses by requiring that they build off a hypothesis produced in an earlier iteration. The resulting set of hypotheses can be visualized in a tree based on the relationship between a hypothesis and its "parent."
Another important feature of boosting algorithms is that the data is given a different distribution at each iteration: instances that are misclassified are given a larger weight, while accurately classified instances are given a reduced weight.
Alternating decision tree structure
An alternating decision tree consists of decision nodes and prediction nodes. Decision nodes specify a predicate condition. Prediction nodes contain a single number. ADTrees always have prediction nodes as both root and leaves. An instance is classified by an ADTree by following all paths for which all decision nodes are true and summing any prediction nodes that are traversed. This is different from binary classification trees such as CART (Classification and regression tree) or C4.5 in which an instance follows only one path through the tree.
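To make this traversal concrete, here is a minimal sketch in Python of how such a tree could be represented and evaluated. The class names, attribute names, and toy tree are hypothetical illustrations, not the representation used by Weka or JBoost.

```python
# Hypothetical ADTree node types: prediction nodes hold a number,
# decision nodes hold a predicate and two prediction-node children.

class PredictionNode:
    def __init__(self, value, children=None):
        self.value = value               # contribution added to the score
        self.children = children or []   # decision nodes attached below

class DecisionNode:
    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate       # function: instance -> bool
        self.if_true = if_true           # prediction node when the predicate holds
        self.if_false = if_false         # prediction node when it does not

def score(prediction_node, instance):
    """Sum this prediction node and recurse into every attached decision node."""
    total = prediction_node.value
    for decision in prediction_node.children:
        branch = decision.if_true if decision.predicate(instance) else decision.if_false
        total += score(branch, instance)
    return total

# Toy tree: a root prediction node with a single decision node beneath it.
root = PredictionNode(0.5, [
    DecisionNode(lambda x: x["a"] < 4.5,
                 PredictionNode(-0.7),
                 PredictionNode(0.2)),
])
print(score(root, {"a": 3.0}))   # 0.5 + (-0.7) = -0.2 -> sign is negative
```

The predicted class is the sign of the returned score, so the toy instance above would be assigned the negative class.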
Example
The following tree was constructed using JBoost on the spambase dataset (available from the UCI Machine Learning Repository). In this example, spam is coded as 1 and regular email is coded as −1.
The following table contains part of the information for a single instance.

| Feature | Value |
| --- | --- |
| char_freq_bang | 0.08 |
| word_freq_hp | 0.4 |
| capital_run_length_longest | 4 |
| char_freq_dollar | 0 |
| word_freq_remove | 0.9 |
| word_freq_george | 0 |
| Other features | ... |

The instance is scored by summing all of the prediction nodes through which it passes. In the case of the instance above, the score is calculated as

| Iteration | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Instance values | N/A | .08 < .052 = f | .4 < .195 = f | 0 < .01 = t | 0 < .005 = t | N/A | .9 < .225 = f |
| Prediction | −0.093 | 0.74 | −1.446 | −0.38 | 0.176 | 0 | 1.66 |

The final score of 0.657 is positive, so the instance is classified as spam.
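As a quick arithmetic check, the per-node contributions from the table above can be summed directly; the values below are simply copied from that table.

```python
# Sum the prediction-node contributions traversed by the example instance.
contributions = [-0.093, 0.74, -1.446, -0.38, 0.176, 0, 1.66]
score = sum(contributions)
print(round(score, 3))   # 0.657 -> positive, so the instance is labeled spam (+1)
```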
Care must be taken when interpreting individual nodes, as the scores reflect a re-weighting of the data in each iteration.
Description of the algorithm
The inputs to the alternating decision tree algorithm are:
- A set of inputs (x1, y1), ..., (xm, ym) where xi is a vector of attributes and yi is either −1 or 1. Inputs are also called instances.
- A set of weights wi corresponding to each instance.
The fundamental element of the ADTree algorithm is the rule. A single rule consists of a precondition, a condition, and two scores. A condition is a predicate of the form "attribute <comparison> value"; a precondition is simply a logical conjunction of conditions. Evaluating a rule on an instance yields the first score if both the precondition and the condition hold, the second score if the precondition holds but the condition does not, and 0 if the precondition does not hold.
Several auxiliary functions are also required by the algorithm:
- W+(c) returns the sum of the weights of all positively labeled examples that satisfy predicate c
- W−(c) returns the sum of the weights of all negatively labeled examples that satisfy predicate c
- W(c) = W+(c) + W−(c) returns the sum of the weights of all examples that satisfy predicate c
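As an illustration only, the rule and the weight functions above might be sketched in Python as follows; the class name, field names, and the (instance, label, weight) representation of the training data are assumptions made for this sketch, not part of the original description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Instance = dict                       # attribute name -> value
Predicate = Callable[[Instance], bool]

@dataclass
class Rule:
    precondition: Predicate           # conjunction of earlier conditions
    condition: Predicate              # single "attribute <comparison> value" test
    score_true: float                 # returned when precondition and condition hold
    score_false: float                # returned when only the precondition holds

    def __call__(self, x: Instance) -> float:
        if self.precondition(x):
            return self.score_true if self.condition(x) else self.score_false
        return 0.0                    # precondition not satisfied

# Training data as (instance, label, weight) entries with labels in {-1, +1}.
Example = Tuple[Instance, int, float]

def W_plus(c: Predicate, data: List[Example]) -> float:
    """Total weight of positively labeled examples satisfying predicate c."""
    return sum(w for x, y, w in data if y == 1 and c(x))

def W_minus(c: Predicate, data: List[Example]) -> float:
    """Total weight of negatively labeled examples satisfying predicate c."""
    return sum(w for x, y, w in data if y == -1 and c(x))

def W(c: Predicate, data: List[Example]) -> float:
    """Total weight of all examples satisfying predicate c."""
    return W_plus(c, data) + W_minus(c, data)
```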
The algorithm is as follows:
function ad_tree
input Set of m training instances

    wi = 1/m for all i
    a = 1/2 ln( W+(true) / W−(true) )
    R0 = a rule with scores a and 0, precondition "true" and condition "true"
    P = {true}
    C = the set of all possible conditions
    for j = 1 ... T
        p ∈ P, c ∈ C get values that minimize z = 2 ( sqrt( W+(p ∧ c) W−(p ∧ c) ) + sqrt( W+(p ∧ ¬c) W−(p ∧ ¬c) ) ) + W(¬p)
        P += p ∧ c + p ∧ ¬c
        a1 = 1/2 ln( ( W+(p ∧ c) + 1 ) / ( W−(p ∧ c) + 1 ) )
        a2 = 1/2 ln( ( W+(p ∧ ¬c) + 1 ) / ( W−(p ∧ ¬c) + 1 ) )
        Rj = new rule with precondition p, condition c, and scores a1 and a2
        wi = wi e^( −yi Rj(xi) ) for all i
    end for
    return set of Rj

The set P grows by two preconditions in each iteration, and it is possible to derive the tree structure of a set of rules by making note of the precondition that is used in each successive rule.
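The following Python sketch shows one way the main loop could be realized on top of the hypothetical Rule and weight helpers defined above; the candidate-condition set, the mutable weight bookkeeping, and the +1 smoothing are illustrative assumptions, not a reproduction of the Weka or JBoost implementations.

```python
import math

def train_adtree(data, conditions, T):
    """Illustrative ADTree boosting loop.

    data       -- list of [instance, label, weight] entries, labels in {-1, +1}
    conditions -- list of candidate base conditions (Predicate functions)
    T          -- number of boosting iterations
    """
    true = lambda x: True
    m = len(data)
    for row in data:
        row[2] = 1.0 / m                              # wi = 1/m for all i

    a0 = 0.5 * math.log(W_plus(true, data) / W_minus(true, data))
    rules = [Rule(true, true, a0, 0.0)]               # root rule R0
    preconditions = [true]

    for _ in range(T):
        # Pick the precondition/condition pair minimizing z.
        best = None
        for p in preconditions:
            for c in conditions:
                p_and_c = lambda x, p=p, c=c: p(x) and c(x)
                p_not_c = lambda x, p=p, c=c: p(x) and not c(x)
                not_p = lambda x, p=p: not p(x)
                z = (2 * (math.sqrt(W_plus(p_and_c, data) * W_minus(p_and_c, data))
                          + math.sqrt(W_plus(p_not_c, data) * W_minus(p_not_c, data)))
                     + W(not_p, data))
                if best is None or z < best[0]:
                    best = (z, p, c, p_and_c, p_not_c)
        _, p, c, p_and_c, p_not_c = best

        preconditions += [p_and_c, p_not_c]
        a1 = 0.5 * math.log((W_plus(p_and_c, data) + 1) / (W_minus(p_and_c, data) + 1))
        a2 = 0.5 * math.log((W_plus(p_not_c, data) + 1) / (W_minus(p_not_c, data) + 1))
        rule = Rule(p, c, a1, a2)
        rules.append(rule)

        # Re-weight the data for the next iteration.
        for row in data:
            x, y, w = row
            row[2] = w * math.exp(-y * rule(x))

    return rules
```

Here the candidate conditions would be thresholds on individual attributes (for example, lambda x: x["word_freq_hp"] < 0.195, as in the spambase example above). The final classifier scores an instance by summing Rj(x) over all learned rules and taking the sign, which is exactly the path-summing traversal described earlier.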
Empirical results
Figure 6 in the original paper demonstrates that ADTrees are typically as robust as boosted decision trees and boosted decision stumps. Equivalent accuracy can typically be achieved with a much simpler tree structure than that produced by recursive partitioning algorithms.