In statistics, jackknife variance estimates for random forests are a way to estimate the variance of random forest predictions while eliminating the bootstrap effects.
Jackknife variance estimates
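The sampling variance of a bagged learner $\hat{\theta}^{\infty}(x)$, the prediction at a point $x$ averaged over all bootstrap samples, is:
$$V(x) = \operatorname{Var}\left[\hat{\theta}^{\infty}(x)\right]$$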
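Jackknife estimates can be used to eliminate the bootstrap effects. The jackknife variance estimator is defined as:
$$\hat{V}_{J} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(-i)} - \bar{\theta}\right)^{2}$$
where $\hat{\theta}_{(-i)}$ is the estimate computed without the $i$-th observation and $\bar{\theta}$ is the mean of the $n$ leave-one-out estimates.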
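As an illustration, the leave-one-out jackknife variance of an arbitrary estimator can be sketched in a few lines of Python. This is a minimal sketch, not code from the cited paper; the function name `jackknife_variance` and its arguments are assumptions made for this example.

```python
from statistics import mean


def jackknife_variance(data, estimator):
    """Leave-one-out jackknife variance of estimator(data).

    `data` is a list of observations and `estimator` maps a list to a number.
    Both names are hypothetical, chosen only for this sketch.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the estimate n times, each time leaving out one observation.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    # (n - 1) / n times the sum of squared deviations from the mean estimate.
    return (n - 1) / n * sum((v - center) ** 2 for v in leave_one_out)


sample = [1.0, 2.0, 3.0, 4.0]
variance_of_mean = jackknife_variance(sample, mean)
```

For the sample mean, this quantity coincides with the usual estimate s²/n of the variance of the mean, which gives a simple sanity check.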
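In classification problems where a random forest is used to fit the model, the jackknife variance estimate is defined as:
$$\hat{V}_{J} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{t}^{*}_{(-i)}(x) - \bar{t}^{*}(x)\right)^{2}$$
Here, $\bar{t}^{*}(x)$ is the average prediction of all trained trees at $x$, and $\bar{t}^{*}_{(-i)}(x)$ is the average over only those trees whose bootstrap sample does not contain the $i$-th observation.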
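The jackknife-after-bootstrap idea for a bagged predictor can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_predict` is a hypothetical stand-in for training one tree on a bootstrap sample and predicting at `x`, and the function and parameter names are not taken from any library.

```python
import random
from statistics import mean


def jackknife_after_bootstrap(data, fit_predict, x, n_trees=200, seed=0):
    """Jackknife variance of a bagged prediction at x.

    fit_predict(sample, x) is a hypothetical base learner: it trains on
    `sample` and returns a numeric prediction at `x`.
    """
    rng = random.Random(seed)
    n = len(data)
    preds, in_bag = [], []
    for _ in range(n_trees):
        # Draw a bootstrap sample of size n with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        preds.append(fit_predict([data[j] for j in idx], x))
        in_bag.append(set(idx))
    t_bar = mean(preds)  # average prediction over all replicates
    total = 0.0
    for i in range(n):
        # Average only over replicates whose sample omits observation i.
        without_i = [p for p, bag in zip(preds, in_bag) if i not in bag]
        if not without_i:
            raise ValueError("no replicate omits observation %d; raise n_trees" % i)
        total += (mean(without_i) - t_bar) ** 2
    return (n - 1) / n * total


# Toy data of (feature, label) pairs; the "learner" just averages the labels.
toy = [(float(k), float(k % 3)) for k in range(10)]
label_mean = lambda sample, x: mean(y for _, y in sample)
v_j = jackknife_after_bootstrap(toy, label_mean, x=0.0)
```

Reusing the bootstrap replicates that happen to omit an observation avoids refitting the whole ensemble n times, which is the practical appeal of this estimator.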
Examples
The e-mail spam problem is a common classification problem in which 57 features are used to classify e-mails as spam or non-spam. The IJ-U variance formula is applied to evaluate the stability of random forests with m = 5, 19 and 57, where m is the number of features considered at each split. The results reported in the paper Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife show that the m = 57 random forest appears to be quite unstable, while predictions made by the m = 5 random forest appear to be quite stable. This agrees with the evaluation by error percentage, in which the accuracy of the model with m = 5 is high and that of the model with m = 57 is low.
Here, accuracy is measured by the error rate, defined as the fraction of the N samples that the model assigns to the wrong one of the M classes.
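The error rate can be computed as the fraction of misclassified samples, as in the short sketch below. The function name `error_rate` and the toy labels are assumptions made for this example and are not taken from the spam data set.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical labels for four e-mails; two of the four predictions are wrong.
truth = ["spam", "ham", "ham", "spam"]
guess = ["spam", "spam", "ham", "ham"]
rate = error_rate(truth, guess)
```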
Modification for bias
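When the infinitesimal jackknife and jackknife variance estimates $\hat{V}_{IJ}^{\infty}$ and $\hat{V}_{J}^{\infty}$ are approximated by Monte Carlo from a finite number $B$ of bootstrap replicates, the Monte Carlo bias should be considered, since it grows with the sample size $n$:
$$E\left[\hat{V}_{IJ}^{B}\right] - \hat{V}_{IJ}^{\infty} \approx \frac{n}{B^{2}}\sum_{b=1}^{B}\left(t_{b}^{*}(x) - \bar{t}^{*}(x)\right)^{2}$$
To eliminate this influence, bias-corrected modifications are suggested:
$$\hat{V}_{IJ-U}^{B} = \hat{V}_{IJ}^{B} - \frac{n}{B^{2}}\sum_{b=1}^{B}\left(t_{b}^{*}(x) - \bar{t}^{*}(x)\right)^{2}$$
$$\hat{V}_{J-U}^{B} = \hat{V}_{J}^{B} - (e-1)\frac{n}{B^{2}}\sum_{b=1}^{B}\left(t_{b}^{*}(x) - \bar{t}^{*}(x)\right)^{2}$$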
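The bias-corrected infinitesimal jackknife (IJ-U) estimate can be sketched as below. This is an illustration under assumptions: it takes the uncorrected estimate to be the sum over observations of the squared covariance between bootstrap counts and predictions, and the correction to be n/B² times the sum of squared deviations of the B predictions from their mean, as in the paper cited in the Examples section. The function name and input layout are hypothetical.

```python
from statistics import mean


def ij_variance(counts, preds, bias_corrected=True):
    """Infinitesimal jackknife variance from B bootstrap replicates.

    counts[b][i]: how often observation i appears in bootstrap sample b.
    preds[b]: prediction of replicate b at the point of interest.
    The input layout is an assumption made for this sketch.
    """
    B = len(preds)
    n = len(counts[0])
    t_bar = mean(preds)
    v_ij = 0.0
    for i in range(n):
        # Monte Carlo covariance between the count of observation i and the
        # prediction; bootstrap counts have expectation 1.
        cov_i = sum((counts[b][i] - 1) * (preds[b] - t_bar) for b in range(B)) / B
        v_ij += cov_i ** 2
    if not bias_corrected:
        return v_ij
    # Assumed Monte Carlo bias term: n / B^2 * sum_b (t_b - t_bar)^2.
    correction = n / B ** 2 * sum((t - t_bar) ** 2 for t in preds)
    return v_ij - correction


# Tiny hand-checkable case with B = 2 replicates and n = 2 observations.
toy_counts = [[2, 0], [0, 2]]
toy_preds = [1.0, 3.0]
raw = ij_variance(toy_counts, toy_preds, bias_corrected=False)
corrected = ij_variance(toy_counts, toy_preds)
```

With few replicates the corrected value can turn out negative, so B is usually chosen large relative to n.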