A three-sample test to detect introgression.
Introgressive hybridization seems to be a common phenomenon. The advent of genomic data has revealed the exchange of genetic material between numerous species (see for example Mallet et al. (2016) and Taylor & Larson (2019) for recent reviews). In concert with the explosive expansion in genomic resources, scientists have developed several statistical tests to detect introgression. I have provided an overview of these methods in my Avian Research paper: “Avian Introgression in the Genomic Era”. However, new methods keep popping up and a recent addition to the introgression-toolbox is particularly interesting: in the journal Molecular Biology and Evolution, Matthew Hahn and Mark Hibbins introduce a three-sample test for introgression.
To understand the rationale behind this test, which has been dubbed D3, we first have to delve into the D-statistic, also known as the ABBA-BABA test. This approach was developed to quantify the amount of genetic exchange between Neanderthals and modern humans. The rationale is quite straightforward: it considers ancestral (‘A’) and derived (‘B’) alleles across the genomes of four taxa: two sister taxa (P1 and P2), a third taxon (P3) and an outgroup (O). Each site pattern lists the alleles in the order P1, P2, P3, O, so ‘ABBA’ means that P2 and P3 share the derived allele, whereas ‘BABA’ means that P1 and P3 share it. Under the scenario without introgression, these two patterns should occur equally frequently. An excess of either ABBA or BABA, resulting in a D-statistic that is significantly different from zero, is indicative of gene flow between two taxa. A positive D-statistic (i.e. an excess of ABBA) points to introgression between P2 and P3, whereas a negative D-statistic (i.e. an excess of BABA) points to introgression between P1 and P3.
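To make the counting concrete, here is a minimal Python sketch of the usual form of the statistic, D = (nABBA - nBABA) / (nABBA + nBABA). The function names and the toy sites are made up for illustration; real analyses work on genome-wide data and usually on allele frequencies rather than single sequences.

```python
from typing import Iterable, Tuple

Site = Tuple[str, str, str, str]  # alleles in the order (P1, P2, P3, outgroup)


def count_abba_baba(sites: Iterable[Site]) -> Tuple[int, int]:
    """Count ABBA and BABA sites; the outgroup allele is treated as ancestral."""
    n_abba = 0
    n_baba = 0
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p2 != out:
            n_abba += 1  # P2 and P3 share the derived allele
        elif p2 == out and p1 == p3 and p1 != out:
            n_baba += 1  # P1 and P3 share the derived allele
    return n_abba, n_baba


def d_statistic(n_abba: int, n_baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (n_abba - n_baba) / total


# Hypothetical toy data, not taken from any real genome.
toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "A", "G", "A"),  # derived allele only in P3: uninformative
    ("C", "C", "C", "C"),  # invariant: uninformative
]

n_abba, n_baba = count_abba_baba(toy_sites)
print(n_abba, n_baba, d_statistic(n_abba, n_baba))
```

With more ABBA than BABA sites the value is positive, which is the pattern expected under gene flow between P2 and P3.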
A Z-score can be calculated to assess the significance of the D-statistic. I will not explain the mathematical underpinnings of the Z-score here. All you need to know is that a Z-score greater than 3 or smaller than -3 is generally interpreted as a significant result. Interested readers can check Durand et al. (2011) for more information.
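For readers who do want a rough picture: one common way to obtain such a Z-score is a block jackknife, in which the genome is split into blocks, D is recomputed with each block left out in turn, and the spread of those estimates gives a standard error. The sketch below assumes equally weighted blocks and uses made-up counts; the function name is hypothetical, and published analyses often use a weighted version.

```python
import math
from typing import List, Tuple


def jackknife_z(blocks: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, Z) from per-block (ABBA, BABA) counts.

    Uses a delete-one block jackknife with equally weighted blocks,
    which is a simplifying assumption.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    if total_abba + total_baba == 0:
        raise ValueError("no informative sites")
    d_full = (total_abba - total_baba) / (total_abba + total_baba)

    # Recompute D with each block removed in turn.
    leave_one_out = []
    for a, b in blocks:
        rest_abba = total_abba - a
        rest_baba = total_baba - b
        if rest_abba + rest_baba == 0:
            raise ValueError("a single block holds all informative sites")
        leave_one_out.append((rest_abba - rest_baba) / (rest_abba + rest_baba))

    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    if variance == 0:
        raise ValueError("zero jackknife variance; Z is undefined")
    return d_full, d_full / math.sqrt(variance)


# Hypothetical per-block counts, for illustration only.
toy_blocks = [(12, 8), (15, 9), (11, 10), (14, 7), (13, 9)]
d, z = jackknife_z(toy_blocks)
print(f"D = {d:.3f}, Z = {z:.2f}")
```

The Z-score is simply D divided by its jackknife standard error, so swapping the ABBA and BABA counts flips its sign but not its magnitude.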
The figure below illustrates the D-statistic with an example from my own work (see Ottenburghs et al. (2017) for more details). Comparing the genomes of four goose species reveals that Cackling Goose (Branta hutchinsii) and Canada Goose (B. canadensis) share more derived alleles than expected by chance. The resulting positive D-statistic suggests introgression between these species, which is not that surprising because there is a hybrid zone between these geese (Leafloor et al. 2013).
One limitation of the D-statistic is that you need an outgroup to discriminate between ancestral and derived alleles. The method by Hahn and Hibbins circumvents this issue by focusing on branch lengths instead. Let’s see how this works. Consider a tree with three species: A, B and C. The correct arrangement of these species is shown below: A is more closely related to B than to C. In this case, there are two discordant arrangements: AC (A and C as sister species) and BC (B and C as sister species). When there is no introgression, these two discordant topologies should occur at equal frequencies. With introgression, however, we can expect other patterns. The authors explain that “introgression between B and C leads to both more trees with a BC topology and a shorter pairwise distance between these two lineages. As a result, dB–C [i.e. genetic distance between B and C] will be smaller than dA–C [i.e. genetic distance between A and C], leading to a negative value of D3. Conversely, gene flow between A and C leads to positive values of D3.”
Putting it into Practice
From this line of thinking, the researchers deduced a formula (see below) that is solely based on the genetic distances between the species and does not require an outgroup. I applied this new statistic to my goose data. I searched through my PhD-archive and found a table of genetic distances between the different goose species. Putting these numbers into the formula resulted in a D3 of -0.01. This negative value suggests gene flow between B and C, which correspond to Cackling Goose and Canada Goose. This is in line with my findings based on the D-statistic (luckily…). Unfortunately, I could not test the significance of this result.
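As a minimal sketch of the calculation, and assuming the statistic takes the form D3 = (dB–C - dA–C) / (dB–C + dA–C), which matches the sign convention in the quote above, the computation fits in a few lines of Python. The function name and the distances are placeholders chosen for illustration; they are not the goose distances from the table mentioned above.

```python
def d3_statistic(d_bc: float, d_ac: float) -> float:
    """D3 = (d_BC - d_AC) / (d_BC + d_AC), from two pairwise genetic distances.

    Assumes A and B are the sister species and C is the third species.
    """
    if d_bc < 0 or d_ac < 0:
        raise ValueError("genetic distances cannot be negative")
    total = d_bc + d_ac
    if total == 0:
        raise ValueError("both distances are zero; D3 is undefined")
    return (d_bc - d_ac) / total


# Hypothetical pairwise distances, for illustration only.
example_d_bc = 0.0098  # distance between B and C
example_d_ac = 0.0100  # distance between A and C

d3 = d3_statistic(example_d_bc, example_d_ac)
print(f"D3 = {d3:.4f}")
```

A negative value means B is closer to C than A is, the pattern expected under gene flow between B and C; a positive value points to gene flow between A and C. Judging whether the value differs significantly from zero needs a resampling step on the underlying data, which a single table of distances does not allow.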
Some Cautionary Notes
This new statistic seems promising for studies that could not sample an appropriate outgroup. However, one should not take this method at face value. A significant D3-statistic does not automatically mean that there has been introgression. As with the classic D-statistic, other evolutionary processes can influence it. For example, population structure in the ancestor can produce deviations in the number of discordant topologies. Alternatively, introgression might come from unsampled or extinct species. Therefore, it is important to complement these statistics with other analyses to quantify introgression.
References
Durand, E. Y., Patterson, N., Reich, D., & Slatkin, M. (2011). Testing for ancient admixture between closely related populations. Molecular Biology and Evolution, 28(8), 2239-2252.
Hahn, M. W., & Hibbins, M. S. (2019). A three-sample test for introgression. Molecular Biology and Evolution.
Leafloor, J. O., Moore, J. A., & Scribner, K. T. (2013). A hybrid zone between Canada Geese (Branta canadensis) and Cackling Geese (B. hutchinsii). The Auk, 130(3), 487-500.
Mallet, J., Besansky, N., & Hahn, M. W. (2016). How reticulated are species? BioEssays, 38(2), 140-149.
Ottenburghs, J., Megens, H. J., Kraus, R. H., van Hooft, P., van Wieren, S. E., Crooijmans, R. P., Ydenberg, R. C., Groenen, M. A. M., & Prins, H. H. T. (2017). A history of hybrids? Genomic patterns of introgression in the True Geese. BMC Evolutionary Biology, 17(1), 201.
Ottenburghs, J., Kraus, R. H., van Hooft, P., van Wieren, S. E., Ydenberg, R. C., & Prins, H. H. (2017). Avian introgression in the genomic era. Avian Research, 8(1), 30.
Taylor, S. A., & Larson, E. L. (2019). Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nature Ecology & Evolution, 3(2), 170-177.