Jackknife variance estimates for random forest


In statistics, jackknife variance estimates for random forests are a way to estimate the variance of the predictions of random forest models, in order to eliminate bootstrap effects.

Jackknife variance estimates

The sampling variance of a bagged learner $\hat{\theta}(x)$ evaluated at a point $x$ is:

$V(x) = \mathrm{Var}[\hat{\theta}(x)]$
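
This target is directly observable only in simulation, where the training data can be redrawn at will. As a minimal sketch (the sample size, replicate counts, and the use of a bagged sample mean as a stand-in for $\hat{\theta}(x)$ are all assumptions chosen for illustration), the following approximates $V(x)$ by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B, reps = 50, 200, 1000

def bagged_mean(data, rng):
    # Average of B bootstrap-sample means: a toy stand-in for the
    # bagged prediction theta_hat(x) at a fixed query point x.
    idx = rng.integers(0, len(data), size=(B, len(data)))
    return data[idx].mean(axis=1).mean()

# Redraw the training set many times to approximate Var[theta_hat(x)].
preds = [bagged_mean(rng.normal(size=n), rng) for _ in range(reps)]
print(np.var(preds))  # ground truth available only in simulation
```

In practice the data cannot be redrawn, which is why the jackknife estimators below are needed.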

Jackknife estimates can be used to eliminate bootstrap effects. The jackknife variance estimator is defined as:

$\hat{V}_J = \frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(-i)} - \bar{\theta} \right)^2$

where $\hat{\theta}_{(-i)}$ is the estimate computed with the $i$th observation deleted and $\bar{\theta}$ is the average of the leave-one-out estimates.
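
As a quick worked example (the five data points are made up, and the sample mean stands in for a generic $\hat{\theta}$), the following computes the leave-one-out estimates and plugs them into the formula above:

```python
import numpy as np

# Plain jackknife for a generic estimator, here the sample mean.
x = np.array([2.1, 3.4, 1.9, 4.0, 2.8])
n = len(x)

# Leave-one-out estimates theta_(-i), one per deleted observation.
theta_loo = np.array([np.delete(x, i).mean() for i in range(n)])

# Jackknife variance estimate from the formula above.
V_J = (n - 1) / n * np.sum((theta_loo - theta_loo.mean()) ** 2)
print(V_J)  # for the mean this equals the usual s^2 / n
```

For the sample mean the jackknife reproduces the familiar $s^2/n$ estimate of the variance of the mean exactly, which makes it a convenient sanity check.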

In classification problems where a random forest is used to fit the model, the jackknife estimate of variance is defined as:

$\hat{V}_J = \frac{n-1}{n} \sum_{i=1}^{n} \left( \bar{t}_{(-i)}(x) - \bar{t}(x) \right)^2$

Here, $t$ denotes a decision tree after training, $\bar{t}(x)$ is the ensemble's average prediction at $x$, and $\bar{t}_{(-i)}(x)$ denotes the average prediction over only those trees whose bootstrap samples do not contain the $i$th observation.
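
The estimator can be computed directly from the per-tree predictions and the bootstrap indices. In the sketch below (the sizes, and a bagged sample mean standing in for the trees $t_b(x)$ at a fixed query point, are assumptions for illustration), the key step is averaging only over the replicates whose bootstrap sample avoids observation $i$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 50, 2000
data = rng.normal(size=n)

# B bootstrap replicates; t[b] plays the role of tree b's prediction.
idx = rng.integers(0, n, size=(B, n))   # bootstrap indices per replicate
t = data[idx].mean(axis=1)              # t_b(x), b = 1..B
t_bar = t.mean()                        # ensemble prediction

# Jackknife-after-bootstrap: average only the replicates whose
# bootstrap sample does NOT contain observation i (about B/e of them,
# so B must be reasonably large).
terms = []
for i in range(n):
    out = ~(idx == i).any(axis=1)
    terms.append((t[out].mean() - t_bar) ** 2)

V_J = (n - 1) / n * np.sum(terms)
print(V_J)
```

With a real random forest the same computation applies, using each tree's prediction and its recorded bootstrap sample in place of `t` and `idx`.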

Examples

The e-mail spam problem is a common classification problem in which 57 features are used to classify spam and non-spam e-mail. The IJ-U variance formula can be applied to evaluate the accuracy of random forests with m = 5, 19, and 57 features sampled at each split. The results reported in the paper "Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife" show that the m = 57 random forest appears to be quite unstable, while predictions made by the m = 5 random forest appear to be quite stable. This agrees with the evaluation by error rate, in which the accuracy of the m = 5 model is high and that of the m = 57 model is low.

Here, accuracy is measured by error rate, which is defined as:

$\mathrm{ErrorRate} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij},$

Here, $N$ is the number of samples, $M$ is the number of classes, and $y_{ij}$ is the indicator that equals 1 when the $i$th observation is in class $j$ and 0 otherwise. No predicted probability is involved here. There is another accuracy metric, similar to the error rate:

$\mathrm{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$

Here, $N$, $M$, and $y_{ij}$ are as above, and $p_{ij}$ is the predicted probability that the $i$th observation is in class $j$. This metric is used in Kaggle competitions. The two metrics are very similar, except that the log loss also takes the predicted probabilities into account.
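
As a minimal sketch of both metrics (the arrays below are made-up examples; the error rate is implemented as the standard misclassification rate with hard 0/1 predictions):

```python
import numpy as np

# One-hot true labels: y[i, j] = 1 iff observation i is in class j.
y = np.array([[1, 0], [0, 1], [1, 0]])                 # N=3, M=2
p = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])     # predicted probs

# Hard predictions: pick the most probable class, no probabilities kept.
pred = np.eye(y.shape[1])[p.argmax(axis=1)]
error_rate = 1.0 - (y * pred).sum(axis=1).mean()

# Multi-class log loss, clipping probabilities to avoid log(0).
eps = 1e-15
logloss = -(y * np.log(np.clip(p, eps, 1 - eps))).sum(axis=1).mean()

print(error_rate, logloss)  # 0.333..., ~0.41
```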

Modification for bias

When using Monte Carlo MSEs to estimate $V_{IJ}^{\infty}$ and $V_{J}^{\infty}$, the Monte Carlo bias should be taken into account; for a fixed number of bootstrap replicates $B$, it grows with $n$:

$E[\hat{V}_{IJ}^{B}] - \hat{V}_{IJ}^{\infty} \approx \frac{n}{B^2} \sum_{b=1}^{B} (t_b - \bar{t})^2$

To eliminate this influence, bias-corrected modifications are suggested:

$\hat{V}_{IJ-U}^{B} = \hat{V}_{IJ}^{B} - \frac{n}{B^2} \sum_{b=1}^{B} (t_b - \bar{t})^2$

$\hat{V}_{J-U}^{B} = \hat{V}_{J}^{B} - (e - 1) \frac{n}{B^2} \sum_{b=1}^{B} (t_b - \bar{t})^2$
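
The correction term is cheap to compute from the per-replicate predictions. The sketch below continues the bagged-mean toy example; the empirical-covariance form of the infinitesimal jackknife follows Wager, Hastie and Efron (2014), and all sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 50, 500
data = rng.normal(size=n)

idx = rng.integers(0, n, size=(B, n))
t = data[idx].mean(axis=1)                       # replicate predictions t_b
N = np.stack([(idx == i).sum(axis=1)             # N[b, i]: times obs. i
              for i in range(n)], axis=1)        # appears in sample b

# Infinitesimal jackknife: V_IJ^B = sum_i Cov_b(N_bi, t_b)^2.
cov = ((N - N.mean(axis=0)) * (t - t.mean())[:, None]).mean(axis=0)
V_IJ_B = np.sum(cov ** 2)

# Monte Carlo bias correction: subtract n/B^2 * sum_b (t_b - t_bar)^2.
correction = n / B**2 * np.sum((t - t.mean()) ** 2)
V_IJ_U = V_IJ_B - correction
print(V_IJ_B, V_IJ_U)
```

The jackknife-after-bootstrap estimate $\hat{V}_{J}^{B}$ would be corrected the same way, with the extra factor $(e - 1)$.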

References

Wager, S.; Hastie, T.; Efron, B. (2014). "Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife". Journal of Machine Learning Research. 15: 1625–1651.