D-statistics for Dummies: A simple test for introgression

A three-sample test to detect introgression.

Introgressive hybridization seems to be a common phenomenon. The advent of genomic data has revealed the exchange of genetic material between numerous species (see for example Mallet et al. (2016) and Taylor & Larson (2019) for recent reviews). In concert with the explosive expansion in genomic resources, scientists have developed several statistical tests to detect introgression. I have provided an overview of these methods in my Avian Research paper: “Avian Introgression in the Genomic Era”. However, new methods keep popping up and a recent addition to the introgression-toolbox is particularly interesting: in the journal Molecular Biology and Evolution, Matthew Hahn and Mark Hibbins introduce a three-sample test for introgression.

 

ABBA-BABA

To understand the rationale behind this test, which has been dubbed D3, we first have to delve into the D-statistic, also known as the ABBA-BABA-test. This approach was developed to quantify the amount of genetic exchange between Neanderthals and modern humans. The rationale behind this test is quite straightforward: it considers ancestral (‘A’) and derived (‘B’) alleles across the genomes of four taxa: two sister taxa (P1 and P2), a third taxon (P3) and an outgroup (O). The letters in a pattern list the alleles in that order, so ‘ABBA’ means that P2 and P3 share the derived allele, while ‘BABA’ means that P1 and P3 share it. Under the scenario without introgression, the two allelic patterns ‘ABBA’ and ‘BABA’ should occur at equal frequencies. An excess of either ABBA or BABA, resulting in a D-statistic that is significantly different from zero, is indicative of gene flow between two taxa. A positive D-statistic (i.e. an excess of ABBA) points to introgression between P2 and P3, whereas a negative D-statistic (i.e. an excess of BABA) points to introgression between P1 and P3.
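To make the counting concrete, here is a minimal Python sketch. It assumes four aligned sequences of equal length (one per taxon) and uses the standard definition D = (ABBA - BABA) / (ABBA + BABA). The function names and the toy sequences are made up for illustration; real analyses work on genome-wide allele frequencies rather than single sequences.

```python
def count_patterns(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites in four aligned sequences.

    The outgroup allele is treated as ancestral ('A'); any other
    allele at a biallelic site is treated as derived ('B').
    """
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if c == o:
            continue  # P3 must carry the derived allele
        if len({a, b, c, o}) != 2:
            continue  # skip sites that are not biallelic
        if a == o and b == c:
            n_abba += 1  # P1 ancestral, P2 and P3 derived
        elif b == o and a == c:
            n_baba += 1  # P2 ancestral, P1 and P3 derived
    return n_abba, n_baba


def d_statistic(n_abba, n_baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (n_abba - n_baba) / total


# Hypothetical toy alignment, not real data
abba, baba = count_patterns("AGCAA", "GATAC", "GGTAC", "AACAA")
print(abba, baba, d_statistic(abba, baba))
```

In this toy alignment, ABBA sites outnumber BABA sites, so D is positive, which would point to gene flow between P2 and P3 if the excess were significant.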

A Z-score can be calculated to assess the significance of the D-statistic. I will not explain the mathematical underpinnings of the Z-score. All you need to know is that a Z-score greater than 3 or smaller than -3 can be interpreted as a significant result. Interested readers can check Durand et al. (2011) for more information.
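For readers who do want a rough idea of how such a Z-score can be obtained, the sketch below uses a delete-one block jackknife: the genome is split into blocks, D is recomputed with each block left out in turn, and the spread of those values gives a standard error. This is a simplified version that assumes equally sized blocks; the function name and the per-block counts are hypothetical, and published analyses typically use a weighted version of this procedure.

```python
import math


def jackknife_z(block_counts):
    """Return (D, Z) from per-block (ABBA, BABA) counts.

    Simplified delete-one block jackknife, assuming blocks of
    roughly equal size (an assumption of this sketch).
    """
    n = len(block_counts)
    if n < 2:
        raise ValueError("need at least two blocks")
    total_abba = sum(a for a, _ in block_counts)
    total_baba = sum(b for _, b in block_counts)
    d_all = (total_abba - total_baba) / (total_abba + total_baba)

    # Recompute D with each block left out in turn
    leave_one_out = []
    for a, b in block_counts:
        abba = total_abba - a
        baba = total_baba - b
        leave_one_out.append((abba - baba) / (abba + baba))

    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("standard error is zero; Z is undefined")
    return d_all, d_all / se


# Hypothetical per-block counts, not real data
blocks = [(30, 10), (28, 12), (33, 9), (25, 15), (31, 11)]
d, z = jackknife_z(blocks)
print(f"D = {d:.3f}, Z = {z:.1f}")
```

With only five blocks this is purely illustrative; a real genome would be divided into many more blocks.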

The figure below illustrates the D-statistic with an example from my own work (see Ottenburghs et al. (2017) for more details). Comparing the genomes of four goose species reveals that Cackling Goose (Branta hutchinsii) and Canada Goose (B. canadensis) share more derived alleles than expected by chance. The resulting positive D-statistic suggests introgression between these species, which is not that surprising because there is a hybrid zone between these geese (Leafloor et al. 2013).

example_Dstat.jpg

The positive D-statistic indicates an excess of ABBA-patterns in the genomes of these geese, suggesting introgression between Cackling Goose (Branta hutchinsii) and Canada Goose (B. canadensis). Based on Ottenburghs et al. (2017) BMC Evolutionary Biology

 

Three-sample Test

One limitation of the D-statistic is that you need an outgroup to discriminate between ancestral and derived alleles. The method by Hahn and Hibbins circumvents this issue by focusing on branch lengths instead. Let’s see how this works. Consider a tree with three species: A, B and C. The correct arrangement of these species is shown below: A is more closely related to B than to C. In this case, there are two discordant arrangements: AC and BC. When there is no introgression, these discordance patterns should occur in equal frequencies. With introgression, however, we can expect other patterns. The authors explain that “introgression between B and C leads to both more trees with a BC topology and a shorter pairwise distance between these two lineages. As a result, dB–C [i.e. genetic distance between B and C] will be smaller than dA–C [i.e. genetic distance between A and C], leading to a negative value of D3. Conversely, gene flow between A and C leads to positive values of D3.”

figure_trees.jpg

Two discordant arrangements (BC and AC) are expected in equal frequencies when there is no gene flow. With introgression, one arrangement becomes more common, which decreases the genetic distance between the two species involved. From: Hahn & Hibbins (2019) Molecular Biology and Evolution

 

Putting it into Practice

From this line of thinking, the researchers deduced a formula (see below) that is solely based on the genetic distances between the species and does not require an outgroup. I applied this new statistic to my goose data. I searched through my PhD-archive and found a table of genetic distances between the different goose species. Putting these numbers into the formula resulted in a D3 of -0.01. This outcome suggests gene flow between B and C, which corresponds to Cackling Goose and Canada Goose. This is in line with my findings based on the D-statistic (luckily…). Unfortunately, I could not test the significance of this result.

D3.jpg
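As a minimal sketch, the calculation can be written in a few lines of Python. It assumes the formula is the difference between the two pairwise distances divided by their sum, D3 = (dB–C - dA–C) / (dB–C + dA–C), which matches the signs in the quote above (a smaller dB–C gives a negative D3). The function name and the distance values are hypothetical and are not the goose distances from my archive.

```python
def d3(d_bc, d_ac):
    """D3 = (d_BC - d_AC) / (d_BC + d_AC).

    d_bc: genetic distance between species B and C
    d_ac: genetic distance between species A and C
    A and B are assumed to be each other's closest relatives.
    """
    total = d_bc + d_ac
    if total <= 0:
        raise ValueError("distances must sum to a positive value")
    return (d_bc - d_ac) / total


def interpret(value):
    """Translate the sign of D3 into the direction it points to."""
    if value < 0:
        return "B and C are closer than expected: possible B-C gene flow"
    if value > 0:
        return "A and C are closer than expected: possible A-C gene flow"
    return "no asymmetry between the two distances"


# Hypothetical distances, chosen only to illustrate the sign
value = d3(d_bc=0.49, d_ac=0.51)
print(round(value, 3), "->", interpret(value))
```

The sign alone says nothing about significance; that still requires a resampling procedure across the genome, which a single table of distances does not allow.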

 

Some Cautionary Notes

This new statistic seems promising for studies that could not sample an appropriate outgroup. However, one should not take this method at face value. A significant D3-statistic does not automatically mean that there has been introgression. As with the classic D-statistic, other evolutionary processes can influence this statistic. For example, population structure in the ancestor can skew the frequencies of the discordant topologies. Alternatively, introgression might come from unsampled or extinct species. Therefore, it is important to complement these statistics with other analyses to quantify introgression.

 

References

Durand, E. Y., Patterson, N., Reich, D., & Slatkin, M. (2011). Testing for ancient admixture between closely related populations. Molecular Biology and Evolution, 28(8), 2239-2252.

Hahn, M. W., & Hibbins, M. S. (2019). A three-sample test for introgression. Molecular Biology and Evolution.

Leafloor, J. O., Moore, J. A., & Scribner, K. T. (2013). A hybrid zone between Canada Geese (Branta canadensis) and Cackling Geese (B. hutchinsii). The Auk, 130(3), 487-500.

Mallet, J., Besansky, N., & Hahn, M. W. (2016). How reticulated are species? BioEssays, 38(2), 140-149.

Ottenburghs, J., Megens, H. J., Kraus, R. H., Van Hooft, P., van Wieren, S. E., Crooijmans, R. P., Ydenberg, R. C., Groenen, M. A. M., & Prins, H. H. T. (2017). A history of hybrids? Genomic patterns of introgression in the True Geese. BMC Evolutionary Biology, 17(1), 201.

Ottenburghs, J., Kraus, R. H., van Hooft, P., van Wieren, S. E., Ydenberg, R. C., & Prins, H. H. (2017). Avian introgression in the genomic era. Avian Research, 8(1), 30.

Taylor, S. A., & Larson, E. L. (2019). Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nature Ecology & Evolution, 3(2), 170-177.

