In the design and analysis of experiments, post hoc analysis (from Latin post hoc, "after this") consists of looking at the data—after the experiment has concluded—for patterns that were not specified a priori. It is sometimes called data dredging by critics to evoke the sense that the more one looks the more likely something will be found. More subtly, each time a pattern in the data is considered, a statistical test is effectively performed. This greatly inflates the total number of statistical tests and necessitates the use of multiple testing procedures to compensate. However, this is difficult to do precisely and in fact most results of post hoc analyses are reported as they are with unadjusted p-values. These p-values must be interpreted in light of the fact that they are a small and selected subset of a potentially large group of p-values. Results of post hoc analyses should be explicitly labeled as such in reports and publications to avoid misleading readers.
Contents
- Relationship with the multiple comparisons problem
- Tests
- Fisher's least significant difference (LSD)
- The Bonferroni procedure
- Holm–Bonferroni method
- Newman–Keuls method
- Duncan's new multiple range test (MRT)
- Rodger's method
- Scheffé's method
- Tukey's procedure
- Dunnett's correction
- Benjamini–Hochberg (BH) procedure
- References
In practice, post hoc analyses are usually concerned with finding patterns or relationships between subgroups of sampled populations that would remain undetected were a scientific community to rely strictly on a priori statistical methods. Post hoc tests, also known as a posteriori tests, greatly expand the range and capability of methods that can be applied in exploratory research. Post hoc examination strengthens induction by limiting the probability that significant effects will seem to have been discovered between subgroups of a population when none actually exist. As it is, many scientific papers are published without adequate post hoc control of the type I error rate.
Post hoc analysis is an important procedure without which multivariate hypothesis testing would greatly suffer, rendering the chances of discovering false positives unacceptably high. Ultimately, post hoc testing creates better informed scientists who can therefore formulate better, more efficient a priori hypotheses and research designs.
Relationship with the multiple comparisons problem
In its most literal and narrow sense, post hoc analysis simply refers to unplanned data analysis performed after the data are collected in order to reach further conclusions. In this sense, even a test that provides no type I error rate protection through multiple comparisons methods counts as post hoc analysis. A good example is performing initially unplanned multiple t-tests at level α.
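The cost of running several unadjusted tests can be illustrated with a short calculation. The sketch below assumes m independent tests, each run at level α = 0.05, with every null hypothesis true. Pairwise t-tests that share groups are correlated, so the figures are indicative only. The function name is illustrative.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


# Number of unadjusted tests versus the chance of at least one false positive.
for m in (1, 3, 10, 45):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

Under these assumptions, ten unadjusted tests at α = 0.05 already give roughly a 40% chance of at least one spurious "significant" result.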
In the wider and more useful sense, post hoc tests provide protection from the multiple comparisons problem, whether the inferences made are selective or simultaneous. The type of inference is directly related to the family of hypotheses of interest. Simultaneous inference means that all inferences in the family of all hypotheses are jointly corrected up to a specified type I error rate. In practice, post hoc analyses are usually concerned with finding patterns or relationships between subgroups of sampled populations that would remain undetected were a scientific community to rely strictly on a priori statistical methods. Therefore, simultaneous inference may be too conservative for certain large-scale problems currently being addressed by science. For such problems, a selective inference approach might be more suitable, since it assumes that sub-groups of hypotheses from the large-scale group can be viewed as a family. Selective post hoc examination strengthens induction by limiting the probability that significant differences will seem to have been discovered between sub-groups of a population when none actually exist. Accordingly, p-values of such sub-groups must be interpreted in light of the fact that they are a small and selected subset of a potentially large group of p-values.
Tests
The following are referred to as "post hoc tests". However, a researcher may sometimes have planned to use them from the outset, in which case calling them "post hoc tests" is not entirely accurate. For instance, the Newman–Keuls and Tukey methods are often referred to as post hoc, yet it is not uncommon to plan on testing all pairwise comparisons before seeing the data. In such cases, these tests are better categorized as a priori.
Fisher's least significant difference (LSD)
This technique was developed by Ronald Fisher in 1935 and is used most commonly after a null hypothesis in an analysis of variance (ANOVA) test is rejected (assuming normality and homogeneity of variances). A significant ANOVA test only reveals that not all the means compared in the test are equal. Fisher's LSD is basically a set of individual t-tests that differ from ordinary ones only in how the standard deviation is calculated. In an ordinary two-sample t-test, a pooled standard deviation is computed from only the two groups being compared, whereas Fisher's LSD test computes the pooled standard deviation from all groups, thus increasing power. Fisher's LSD does not correct for multiple comparisons.
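The pooling step described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The function names and the data are hypothetical, and the code stops at the t statistics, since turning them into p-values would require a t distribution with the pooled degrees of freedom.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_variance(groups):
    """Within-group variance pooled over ALL groups, with its degrees of freedom."""
    ss = sum((len(g) - 1) * variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return ss / df, df


def lsd_t_statistics(groups):
    """t statistic for every pair of groups, using the all-group pooled variance."""
    s2, df = pooled_variance(groups)
    stats = {}
    for (i, a), (j, b) in combinations(enumerate(groups), 2):
        se = sqrt(s2 * (1 / len(a) + 1 / len(b)))  # standard error of the difference
        stats[(i, j)] = (mean(a) - mean(b)) / se
    return stats, df


# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
stats, df = lsd_t_statistics(groups)
print(df, {pair: round(t, 3) for pair, t in stats.items()})
```

Each comparison here uses 6 degrees of freedom, where a two-sample t-test on the same pair would have only 4. That difference is the source of the extra power.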
The Bonferroni procedure
The Bonferroni procedure tests each of m hypotheses at level α/m instead of α. This method is flexible (it requires no assumptions about the dependence between the tests) and very simple to compute, but it can result in a large reduction in statistical power. That is, because the cut-off value is reduced from α to α/m, it becomes substantially more difficult for any result to be found statistically significant.
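A minimal sketch of the reduced cut-off, assuming the usual formulation in which each of m p-values is compared with α/m. The function name and the p-values are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    cutoff = alpha / m
    return [p <= cutoff for p in p_values]


# Hypothetical p-values: all four fall below 0.05 when left unadjusted.
p_values = [0.001, 0.012, 0.03, 0.04]
print(bonferroni_reject(p_values))  # cut-off is 0.05 / 4 = 0.0125
```

With these values only the first two hypotheses remain significant after the adjustment, which illustrates the loss of power.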
Holm–Bonferroni method
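A minimal sketch of the Holm–Bonferroni step-down rule as it is usually stated: sort the m p-values in ascending order, compare the k-th smallest with α/(m − k + 1), and stop at the first one that fails. The function name and the p-values are illustrative.

```python
def holm_reject(p_values, alpha=0.05):
    """Step-down Holm procedure; returns rejections in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, i in enumerate(order):       # rank = 0 for the smallest p-value
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break                          # every larger p-value is retained too
    return reject


# Hypothetical p-values, deliberately unsorted.
print(holm_reject([0.04, 0.001, 0.02, 0.014]))
```

For these values a plain Bonferroni cut-off of 0.05 / 4 = 0.0125 would reject only the hypothesis with p = 0.001, whereas the step-down rule rejects all four.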
Newman–Keuls method
The Newman–Keuls method is a stepwise multiple comparisons procedure used to identify sample means that are significantly different from each other. It is often used as a post hoc test whenever a significant difference between three or more sample means has been revealed by an analysis of variance (ANOVA).
Duncan's new multiple range test (MRT)
Duncan developed this test as a modification of the Newman–Keuls method that would have greater power. Duncan's MRT is especially protective against false negative (Type II) error at the expense of having a greater risk of making false positive (Type I) errors.
Rodger's method
Rodger's method is a procedure for examining research data post hoc following an 'omnibus' analysis, that is after carrying out an analysis of variance (ANOVA). Rodger's method utilizes a decision-based error rate, arguing that it is not the probability (
Scheffé's method
Scheffé's method applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences. Its advantage is flexibility: it can be used to test any number of post hoc simple or complex comparisons that appear interesting. The price of this flexibility is conservatism: when only a limited set of comparisons is examined, the actual type I error rate falls well below the nominal level, and power is correspondingly low.
Tukey's procedure
A correction with a similar framework is Fisher's LSD (least significant difference).
Dunnett's correction
Charles Dunnett (1955, 1966) described an alternative alpha error adjustment when k groups are compared to the same control group. Now known as Dunnett's test, this method is less conservative than the Bonferroni adjustment.
Benjamini–Hochberg (BH) procedure
The BH procedure is a step-up procedure iterating over the p-values sorted in ascending order; it is designed to control the false discovery rate.
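A minimal sketch of the Benjamini–Hochberg step-up rule in its usual formulation: sort the m p-values in ascending order, find the largest rank k whose p-value is at most (k/m)·q, and reject the k hypotheses with the smallest p-values. The function name, the target level q and the p-values are illustrative.

```python
def benjamini_hochberg_reject(p_values, q=0.05):
    """Step-up BH procedure; returns rejections in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank                   # remember the largest rank that passes
    reject = [False] * m
    for i in order[:k_max]:                # reject the k_max smallest p-values
        reject[i] = True
    return reject


# Hypothetical p-values: the two smallest miss their own thresholds
# (0.0125 and 0.025) but are still rejected because rank 4 passes.
print(benjamini_hochberg_reject([0.02, 0.03, 0.035, 0.04]))
```

The example shows what "step-up" means in practice: a hypothesis can be rejected because a larger p-value further up the ordering meets its threshold.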