McDonald–Kreitman test

The McDonald–Kreitman test is a statistical test often used by evolutionary and population biologists to detect and measure adaptive evolution within a species. It asks whether adaptive evolution has occurred and estimates the proportion of substitutions that resulted from positive selection (also known as directional selection). To do this, the test compares the amount of variation within a species (polymorphism) to the divergence between species (substitutions) at two types of sites, neutral and nonneutral. A substitution is a site at which one nucleotide is fixed within one species and a different nucleotide is fixed within a second species at the same position of homologous DNA sequences. A site is nonneutral if a mutation there is either advantageous or deleterious.

Within a protein-coding region, the two types of sites are synonymous and nonsynonymous. A site is synonymous if a point mutation at that site would not change the encoded amino acid; such a change is also known as a silent mutation. Because the amino acid is unchanged, the phenotype, or observable trait, of the organism is generally unaffected. A site is nonsynonymous if a point mutation at that site changes the encoded amino acid, which may in turn change the organism's phenotype. Typically, silent mutations in protein-coding regions are used as the "control" in the McDonald–Kreitman test.
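The distinction between synonymous and nonsynonymous changes can be illustrated with a small sketch. It assumes the standard genetic code and DNA codons written with the letters T, C, A, and G; the function name `is_synonymous` and its arguments are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; other genetic codes, e.g. mitochondrial, differ).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon, position, new_base):
    """Return True if changing `codon` at `position` (0, 1, or 2) to
    `new_base` leaves the encoded amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


# CTT and CTC both encode leucine, so this change is silent.
print(is_synonymous("CTT", 2, "C"))
# ATG encodes methionine and ATA encodes isoleucine, so this change is not.
print(is_synonymous("ATG", 2, "A"))
```

Note that whether a change is synonymous depends on the specific mutation as well as on the position in the codon, which is why the sketch takes the new base as an argument.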

In 1991, John H. McDonald and Martin Kreitman derived the McDonald–Kreitman test while studying amino acid sequence differences in the alcohol dehydrogenase gene of Drosophila (fruit flies). McDonald and Kreitman proposed this method to estimate the proportion of substitutions that are fixed by positive selection rather than by genetic drift.

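To set up the McDonald–Kreitman test, we first arrange the data for the species being investigated in a two-way contingency table that crosses site type (synonymous or nonsynonymous) with variation type (substitution or polymorphism). The four cells of the table are: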

D_s: the number of synonymous substitutions per gene

D_n: the number of non-synonymous substitutions per gene

P_s: the number of synonymous polymorphisms per gene

P_n: the number of non-synonymous polymorphisms per gene

To obtain the values of D_s, D_n, P_s, and P_n, count the differences of each type in the protein-coding region: fixed differences between species contribute to D_s or D_n, and sites that vary within a species contribute to P_s or P_n.
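As a rough sketch of the counting step, the snippet below classifies each aligned site as a fixed difference, a polymorphism, or invariant. It makes simplifying assumptions that are not from the text above: sequences are already aligned and of equal length, any site that varies within either species is counted as polymorphic, and the split into synonymous and nonsynonymous classes is left out. The function names and the toy sequences are hypothetical.

```python
def classify_site(column_a, column_b):
    """Classify one aligned site given the nucleotides sampled from
    species A and species B at that site."""
    alleles_a, alleles_b = set(column_a), set(column_b)
    if len(alleles_a) > 1 or len(alleles_b) > 1:
        return "polymorphic"  # varies within at least one species
    if alleles_a != alleles_b:
        return "fixed"        # each species fixed for a different base
    return "invariant"


def count_sites(alignment_a, alignment_b):
    """Count site classes across two lists of aligned sequences."""
    counts = {"fixed": 0, "polymorphic": 0, "invariant": 0}
    # zip(*alignment) walks the alignment column by column.
    for column_a, column_b in zip(zip(*alignment_a), zip(*alignment_b)):
        counts[classify_site(column_a, column_b)] += 1
    return counts


# Toy data: two sampled sequences per species (hypothetical).
species_a = ["ATGC", "ATGT"]
species_b = ["ACGC", "ACGC"]
print(count_sites(species_a, species_b))
```

In a full analysis each fixed or polymorphic site would additionally be labelled synonymous or nonsynonymous to fill the four cells of the table.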

The null hypothesis of the McDonald–Kreitman test is that the ratio of nonsynonymous to synonymous variation within a species equals the ratio of nonsynonymous to synonymous variation between species (i.e. D_n/D_s = P_n/P_s). When positive or negative selection (natural selection) influences nonsynonymous variation, the ratios are no longer equal. Under negative selection, deleterious mutations strongly affect polymorphism, so the ratio between species is lower than the ratio within species (i.e. D_n/D_s < P_n/P_s). Under positive selection, the ratio within species is lower than the ratio between species (i.e. D_n/D_s > P_n/P_s): mutations under positive selection spread through a population rapidly, so they contribute little to polymorphism but do contribute to divergence.
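One way to test the null hypothesis on a 2×2 table of counts is Fisher's exact test, sketched here in plain Python. The choice of Fisher's exact test is an assumption of this example (other tests of independence, such as a G-test, can also be applied to the table), and the counts used are illustrative values that do not come from the text above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Illustrative counts (hypothetical): rows are nonsynonymous and synonymous,
# columns are substitutions (D) and polymorphisms (P).
d_n, p_n = 7, 2
d_s, p_s = 17, 42
p_value = fisher_exact_two_sided(d_n, p_n, d_s, p_s)
print(f"p = {p_value:.4f}")
```

A small p-value indicates that D_n/D_s and P_n/P_s differ by more than expected by chance; the direction of the difference then shows whether the departure is toward excess divergence or excess polymorphism.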

Using an equation derived by Smith and Eyre-Walker, we can estimate the proportion of base substitutions fixed by natural selection, α, using the following formula:

$$\alpha = 1 - \frac{D_s P_n}{D_n P_s}$$

Alpha represents the proportion of substitutions driven by positive selection. Alpha can take any value from −∞ to 1. Negative values of alpha are produced by sampling error or by violations of the model, such as the segregation of slightly deleterious amino acid mutations. As above, the null hypothesis here is that α = 0, in which case we expect D_n/D_s to equal P_n/P_s.
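The estimate of α follows directly from the four counts. The sketch below is a minimal illustration; the function name and the example counts are hypothetical.

```python
def alpha_estimate(d_n, d_s, p_n, p_s):
    """Estimate alpha = 1 - (D_s * P_n) / (D_n * P_s)."""
    if d_n == 0 or p_s == 0:
        raise ValueError("alpha is undefined when D_n or P_s is zero")
    return 1 - (d_s * p_n) / (d_n * p_s)


# Equal ratios (D_n/D_s == P_n/P_s) give alpha = 0, the null expectation.
print(alpha_estimate(d_n=10, d_s=10, p_n=10, p_s=10))

# An excess of nonsynonymous polymorphism gives a negative alpha.
print(alpha_estimate(d_n=5, d_s=20, p_n=10, p_s=20))

# An excess of nonsynonymous divergence gives alpha between 0 and 1.
print(round(alpha_estimate(d_n=7, d_s=17, p_n=2, p_s=42), 3))
```

The guard against zero counts matters in practice, since short genes can easily have no nonsynonymous substitutions or no synonymous polymorphisms.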

The Neutrality Index

The neutrality index (NI) quantifies the direction and degree of departure from neutrality (where the P_n/P_s and D_n/D_s ratios are equal). Assuming that silent mutations are neutral, a neutrality index greater than 1 (i.e. NI > 1) indicates that negative selection is at work, resulting in an excess of amino acid polymorphism. This occurs because purifying selection weeds out deleterious alleles before they reach fixation, so they appear as polymorphism but not as divergence. Under the same assumption, a neutrality index lower than 1 (i.e. NI < 1) indicates an excess of nonsilent divergence, which occurs when positive selection is at work in the population. When positive selection acts on a species, natural selection favors a specific phenotype over others, and the favored phenotype moves toward fixation as the frequency of the underlying allele increases. The neutrality index is calculated with the following equation:

$$NI = \frac{P_n / P_s}{D_n / D_s}$$
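A short sketch of the neutrality index calculation follows, with hypothetical function names and illustrative counts. Because NI rearranges to (P_n D_s)/(P_s D_n), it is algebraically equal to 1 − α, which the example checks numerically.

```python
def neutrality_index(d_n, d_s, p_n, p_s):
    """Compute NI = (P_n / P_s) / (D_n / D_s)."""
    if p_s == 0 or d_s == 0 or d_n == 0:
        raise ValueError("NI is undefined when P_s, D_s, or D_n is zero")
    return (p_n / p_s) / (d_n / d_s)


def interpret(ni):
    """Map an NI value to the reading described in the text."""
    if ni > 1:
        return "excess amino acid polymorphism (negative selection)"
    if ni < 1:
        return "excess nonsilent divergence (positive selection)"
    return "no departure from neutrality"


# Illustrative counts (hypothetical).
ni = neutrality_index(d_n=7, d_s=17, p_n=2, p_s=42)
alpha = 1 - (17 * 2) / (7 * 42)
print(round(ni, 3), interpret(ni))
print(abs(ni - (1 - alpha)) < 1e-12)  # NI equals 1 - alpha
```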

Error-correcting Mechanisms of the McDonald–Kreitman test

Work continues on improving the accuracy of the McDonald–Kreitman test. The most important error to correct for is that α is severely underestimated in the presence of slightly deleterious mutations. To minimize the impact of these mutations, it has been proposed to exclude polymorphisms below a certain cutoff frequency, such as <8% or <5% (there is still much debate about what the best cutoff value should be). Slightly deleterious mutations tend to segregate at low frequencies, so excluding low-frequency polymorphisms reduces the bias they create, since fewer of them are counted. This drives the estimate of α up. The degree of adaptive evolution is then less severely underestimated, which makes the McDonald–Kreitman test more reliable.
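The effect of a frequency cutoff can be sketched as follows. The data layout (a list of site-class and allele-frequency pairs), the function names, the 5% default, and all the numbers are assumptions made for illustration only.

```python
def alpha_estimate(d_n, d_s, p_n, p_s):
    """Estimate alpha = 1 - (D_s * P_n) / (D_n * P_s)."""
    return 1 - (d_s * p_n) / (d_n * p_s)


def count_polymorphisms(polymorphisms, cutoff=0.0):
    """Count nonsynonymous and synonymous polymorphisms whose allele
    frequency is at least `cutoff`.

    `polymorphisms` is a list of (site_class, frequency) pairs, where
    site_class is "nonsyn" or "syn".
    """
    kept = [kind for kind, freq in polymorphisms if freq >= cutoff]
    return kept.count("nonsyn"), kept.count("syn")


# Hypothetical data: two of the three nonsynonymous polymorphisms are rare,
# as expected if they are slightly deleterious.
polymorphisms = [
    ("nonsyn", 0.02), ("nonsyn", 0.03), ("nonsyn", 0.40),
    ("syn", 0.02), ("syn", 0.20), ("syn", 0.30), ("syn", 0.50),
]
d_n, d_s = 10, 10

p_n, p_s = count_polymorphisms(polymorphisms)
print("no cutoff:", alpha_estimate(d_n, d_s, p_n, p_s))

p_n, p_s = count_polymorphisms(polymorphisms, cutoff=0.05)
print("5% cutoff:", round(alpha_estimate(d_n, d_s, p_n, p_s), 3))
```

In this toy data the cutoff removes proportionally more nonsynonymous than synonymous polymorphisms, so P_n/P_s falls and the estimate of α rises.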

Another necessary adjustment is to control for type I error in the McDonald–Kreitman test. One way to avoid type I errors is to avoid populations that have undergone a recent bottleneck, meaning a recent decrease in effective population size. To make the analysis as accurate as possible, it is best to use large sample sizes, but there is still debate about how large "large" is. Another method of controlling for type I error, suggested by Peter Andolfatto (2008), is to establish significance levels by coalescent simulation with recombination in genomewide scans for selection on noncoding DNA. This improves the accuracy of the statistical test and helps avoid false positives. Given these ways to avoid type I errors, scientists should choose the populations they analyze cautiously, avoiding those likely to lead to inaccurate results.

References

McDonald–Kreitman test Wikipedia

(Text) CC BY-SA
