Neha Patil (Editor)

One in ten rule

In statistics, the one in ten rule is a rule of thumb for how many predictor parameters can be estimated from data in regression analysis (in particular proportional hazards models and logistic regression) while keeping the risk of overfitting low. The rule states that one predictive variable can be studied for every ten events.

For example, if a sample of 200 patients is studied and 20 patients die during the study, only two pre-specified predictors can reliably be fitted to the total data. Similarly, if 120 patients die during the study (so that 80 patients survive), eight pre-specified predictors (based on the smaller of the two counts, here 80) can be fitted reliably. If more are fitted, overfitting is likely and the results will not predict well outside the training data. It is not uncommon to see the 1:10 rule violated in fields with many variables (e.g. gene expression studies in cancer), which decreases confidence in the reported findings.
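The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration of the rule of thumb as stated here; the function name `max_predictors` and its parameters are hypothetical and do not come from any statistics library.

```python
def max_predictors(n_total: int, n_events: int, events_per_predictor: int = 10) -> int:
    """Rough upper bound on pre-specified predictors under an events-per-predictor rule.

    Hypothetical helper for illustration only; not part of any library.
    """
    if not 0 <= n_events <= n_total:
        raise ValueError("n_events must lie between 0 and n_total")
    if events_per_predictor <= 0:
        raise ValueError("events_per_predictor must be positive")
    # The limiting count is the smaller of events and non-events.
    limiting_count = min(n_events, n_total - n_events)
    # Integer division: a partial predictor does not count.
    return limiting_count // events_per_predictor


print(max_predictors(200, 20))   # 20 deaths, 180 survivors -> 2
print(max_predictors(200, 120))  # 120 deaths, 80 survivors -> 8
```

Taking the smaller of the two counts reproduces both figures in the text: 20 deaths allow two predictors, while 120 deaths leave only 80 survivors and therefore allow eight.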

The one in ten rule is a minimum. A stricter "one in 20 rule" has been suggested, indicating the need for shrinkage of regression coefficients, and a "one in 50 rule" has been suggested for stepwise selection with the default p-value threshold of 5%.
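To see how much the stricter variants reduce the number of predictors, the following sketch applies each ratio to the same limiting count. The names `RULES` and `predictors_by_rule` are hypothetical, chosen for this illustration, and the value 80 is the smaller count from the 120-deaths example.

```python
# Events required per predictor under each rule of thumb named in the text.
RULES = {"one in 10": 10, "one in 20": 20, "one in 50": 50}


def predictors_by_rule(limiting_count: int) -> dict[str, int]:
    """Map each rule to the number of predictors it allows (illustrative helper)."""
    if limiting_count < 0:
        raise ValueError("limiting_count must be non-negative")
    return {name: limiting_count // ratio for name, ratio in RULES.items()}


# 80 is the smaller of the two outcome counts in the 120-deaths example.
print(predictors_by_rule(80))
# {'one in 10': 8, 'one in 20': 4, 'one in 50': 1}
```

With 80 as the limiting count, moving from the one in ten rule to the one in 50 rule cuts the allowance from eight predictors to one.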

Recent studies, however, suggest that the rule may be too conservative and that five to nine events per predictor can be enough, depending on the research question.
