Cytonuclear disequilibrium and nuclear admixture analysis
In the combined mito-nuclear analysis, we tested for association between mitochondrial and nuclear lineages. We used a z-test to compare admixture proportions among mitochondrial groups and a one-sided Student's t-test to compare q-values among populations with and without indication of recent translocations (cf. Results section).
Translocation frequency, intensity and source areas
We compared observed translocation frequencies with theoretical expectations from the Poisson law and tested for a difference using a χ²-test. Regional differences in translocation frequency were tested by comparing the proportions of purebred and admixed individuals found in the Alpine and the Central European region using a z-test. To test whether translocations often involved multiple sources, we counted the number of different nuclear clusters per population and compared results among populations with and without indication of translocation using a Wilcoxon rank-sum test.
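The comparison with Poisson expectations can be sketched as follows. This is only an illustration under stated assumptions: the unit being counted (here, events per population), the pooling of high counts into one class, the function name `poisson_chi_square` and the example data are all hypothetical and not taken from the study.

```python
import math
from collections import Counter

def poisson_chi_square(counts, max_class):
    """Chi-square statistic comparing observed count classes with Poisson expectations.

    counts    : number of events per population (assumed unit of observation)
    max_class : counts >= max_class are pooled into one final class
    """
    n = len(counts)
    lam = sum(counts) / n  # Poisson mean estimated from the data
    observed = Counter(min(c, max_class) for c in counts)

    # Poisson probabilities for classes 0 .. max_class-1; the remainder is pooled
    probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_class)]
    probs.append(1.0 - sum(probs))

    chi2 = 0.0
    for k, p in enumerate(probs):
        expected = n * p
        chi2 += (observed.get(k, 0) - expected) ** 2 / expected

    df = len(probs) - 2  # number of classes, minus 1, minus 1 estimated parameter
    return chi2, df, lam

# Hypothetical counts per population (NOT data from the study)
example_counts = [0] * 30 + [1] * 5 + [2] * 3 + [3] * 2 + [8] * 3 + [24] * 2
chi2, df, lam = poisson_chi_square(example_counts, max_class=4)
print(f"lambda = {lam:.2f}, chi2 = {chi2:.1f}, df = {df}")
```

The statistic would then be compared with the χ² distribution with `df` degrees of freedom. A large value indicates that events are not spread across populations as a Poisson process would predict.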
Systematic detection of first-generation migrants
To detect cases of recent translocations separately in each population, we applied a Bayesian approach implemented in GENECLASS (Piry et al. 2004). It identifies first-generation migrants in a population and provides likelihoods of belonging to a given reference population. With this approach, recent translocations, especially those across very divergent populations, are treated as first-generation migrants. If admixture has taken place, the model assumptions are no longer fulfilled and introduced genotypes should no longer be identified. We relied on 10,000 simulated individuals (Paetkau et al. 2004) and a type I error of 0.01. We ran GENECLASS three times with the same parameters. After each run, we excluded the detected migrant individuals.
To investigate the composition of introduced material, we calculated mean q-values of each cluster for different groups of introduced individuals corresponding to different combinations of source and release areas. These were compared to mean q-values of presumed native individuals. Individuals considered to result from translocation or associated admixture (identified using either STRUCTURE or GENECLASS, cf. Results section) were removed. The remaining individuals were mapped, and summary statistics including classical measures such as allelic richness (NA), gene diversity (HS), inbreeding coefficient (FIS) and fixation index (FST) were calculated and compared with those of the original data set using FSTAT (Goudet 2001) with 15,000 permutations for nuclear data and using PERMUT (http://www.pierroton.inra.fr/genetics/labo/Software/) for mtDNA data (Pons & Petit 1996).
A total of 22 mtDNA haplotypes were detected in 363 Larix decidua samples from 43 populations, 10 of which resulted from simple sequence variation and 12 from minisatellite variation (Fig. 5A). The eight L. sibirica samples had a single private haplotype (haplotype 23) that was, however, not very divergent from those found in L. decidua. The L. decidua haplotypes formed two main groups, each of which was composed of two frequent haplotypes and some less frequent ones. In group 1 (haplotypes 16, 17, 18 and 22), haplotypes 16 and 18 were found in 58% of the individuals. In group 2 (haplotypes 8, 9, 10, 19, 20, 21 and minisatellite variants), haplotypes 9 and 10 occurred in 28% of the individuals. Haplotypes of group 1 were mostly found in the Alpine region and those of group 2 in Central Europe (Fig. 5B). However, there were 37 individuals (10%) with group 1 haplotypes found in Central Europe and 14 individuals (4%) with group 2 haplotypes found in the Alpine region. These individuals were scattered across the range and typically interspersed in populations of the regionally common haplotypes. Two populations were composed solely of regionally uncommon haplotypes: populations 73 and 79, in Central Europe. Both were located outside the acknowledged species range. The strong geographic structure of mtDNA separating the Alpine and the Central European region, together with the sporadic occurrences of haplotypes not fitting this pattern, raises the question of whether these occurrences are caused by ancient gene flow or by recent translocations. To study this question, we analysed high-resolution nuclear data and compared them with the mtDNA data.
Figure 5 Minimum spanning network (A) and distribution of the combined mitochondrial haplotypes (atpA and UBC460) before (B) and after (C) systematic translocation removal. In (A) circles represent haplotypes labelled by their codes and scaled to their frequencies. The purple pie chart summarizes haplotypes caused by minisatellite variation. Unlabelled circles symbolize predicted haplotypes that were not observed. Branches correspond to single mutations regardless of their length. In (B) and (C) circles represent the haplotype composition of populations (~8 individuals/population). Labels are population codes.
Figure 6 (A) Neighbour joining tree and distribution of the seven SSR clusters detected by STRUCTURE (B) before and (C) after systematic translocation removal. In (A) each rectangle represents a cluster. In (B) and (C) pie charts represent cluster composition of the populations (~24 individuals/population). Individuals with q values >0.5 are coloured, the remaining ones are white.
Cluster analyses of 1026 L. decidua individuals from 45 populations using STRUCTURE and calculation of the second-order rate of change gave strongest support for the existence of three ancestral groups (K = 3, Fig. S1, Table S3, Supporting information). However, some statistical support and clear individual assignments were also found for K = 7 (Fig. 6A, Fig. S2, Supporting information). The seven clusters were mainly distributed in an east-west direction (Fig. 6B). Yet, in some populations, there were individuals that had a high assignment probability to clusters more frequent in other regions. For instance, in population 18, there were three individuals assigned to clusters 5 and 7, whereas all other individuals were assigned to cluster 2. Furthermore, three Central European populations (73, 78 and 79) were entirely composed of individuals assigned to clusters 2 and 3, which were more frequent in the Alpine region. For subsequent analyses, we combined the seven nuclear clusters into two larger groups of related clusters (group 1: clusters 1-4, group 2: clusters 5-7; Fig. S2, Supporting information).
Evaluation of assignment accuracy
To evaluate the robustness of the assignments, we simulated genotypes using HYBRIDLAB, considering two scenarios. One was based on panmixia within each of the seven identified clusters. In the other scenario, we simulated panmixia within the two cluster groups. To derive allelic frequencies for each cluster or cluster group, genotypes from the original sample were selected according to their q-values in the STRUCTURE analysis. In the first scenario, we selected individuals with q-values of ≥ 0.75 for the respective cluster and ≥ 0.875 for the respective cluster group (Alps versus Central Europe), the latter being the sum of the q-values of all individual clusters making up the respective cluster group. A threshold value of 0.875 corresponds to the optimal theoretical assignment threshold to distinguish backcross from purebred individuals, whereas the relaxed threshold of ≥ 0.75 used for each of the seven individual clusters corresponds to the optimal threshold to distinguish F1 from purebreds. In the second scenario, individuals were selected based only on q-values ≥ 0.875 for cluster groups, i.e. the selection included all individuals admixed among clusters belonging to the same cluster group. Simulations were used to quantify the frequency with which genotypes produced under each scenario were falsely assigned to the alternative purebred category (q ≥ 0.875) or to the admixed category (0.125 ≤ q < 0.875) (Table S4, Supporting information).
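The cluster-group categories described above can be sketched in a few lines. The thresholds and the grouping of clusters (1-4 and 5-7) come from the text. The function name `classify`, the zero-based indexing and the example q-vectors are illustrative assumptions. One way to read the thresholds is as midpoints between expected ancestry values (purebred 1.0, backcross 0.75, F1 0.5), which is how the constants are commented here.

```python
# Thresholds from the text; the midpoint reading is an interpretation.
PUREBRED_MIN = 0.875   # (1.0 + 0.75) / 2: purebred vs. backcross
ADMIXED_MIN = 0.125    # lower bound of the admixed category
F1_THRESHOLD = 0.75    # (1.0 + 0.5) / 2: purebred vs. F1, used per cluster

GROUP1 = (0, 1, 2, 3)  # clusters 1-4 (zero-based indices)
GROUP2 = (4, 5, 6)     # clusters 5-7

def classify(q):
    """Classify one individual from its seven STRUCTURE q-values.

    q is assumed to be a sequence of seven membership coefficients
    summing to about 1. The decision uses the summed q of group 1.
    """
    q_group1 = sum(q[i] for i in GROUP1)
    if q_group1 >= PUREBRED_MIN:
        return "purebred group 1"
    if q_group1 < ADMIXED_MIN:
        return "purebred group 2"
    return "admixed"

# Hypothetical q-vectors (NOT data from the study)
print(classify([0.5, 0.4, 0.0, 0.0, 0.05, 0.05, 0.0]))
print(classify([0.0, 0.0, 0.0, 0.0, 0.5, 0.3, 0.2]))
print(classify([0.25, 0.25, 0.0, 0.0, 0.5, 0.0, 0.0]))
```

Summing q-values within a group before applying the threshold means that an individual admixed among clusters of the same group still counts as a purebred of that group, as in the second scenario.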
We found that, under both scenarios, not a single individual was assigned to the alternative purebred category, pointing to a very low risk of assigning an individual to the wrong purebred category. In contrast, some purebred individuals were falsely assigned to the admixed category. Their proportion depended on the cluster and on the scenario but remained limited. Genotypes from group 1 were falsely assigned to the admixed category in 1.3-4.6% of the cases in the first scenario and in 13.3% in the second scenario. For group 2, values ranged between 5.4% and 11.1% in the first scenario and reached 12.3% in the second. Comparison of the q-values of the 263 original admixed genotypes between the initial STRUCTURE analysis and the subsequent run including the simulated genotypes revealed no important change (mean difference of 0.034), indicating that the addition of simulated genotypes did not modify clustering criteria, as expected given our settings in the STRUCTURE analysis (USEPOPINFO model).
We checked for association between the two nuclear cluster groups and the two mtDNA lineages using 363 individuals for which both types of information were available. All nuclear genotypes assigned to group 1 purebreds carried group 1 mtDNA haplotypes, and all but one group 2 purebred carried group 2 mtDNA haplotypes, indicating nearly total cytonuclear association (Table 5, bold numbers first two lines). We also analysed the same association within each region. In both cases, regionally uncommon nuclear purebreds were associated with regionally uncommon mtDNA lineages (Table 5, bold numbers lines 4 and 5), which we interpret as evidence for recent establishment of genotypes originating from the other region. We then focused on admixed individuals in both regions. We reasoned that if introgression was caused by translocations, there should be a higher proportion of introgressed individuals among those carrying the regionally uncommon mtDNA lineage (i.e. presumably introduced) than among those carrying the regionally dominant lineage. In the Alpine region we found 14 individuals with group 2 haplotypes. Seven of them (50%; Table 5, underlined) were introgressed (0.125 ≤ q < 0.875). This contrasts with the corresponding proportion in individuals with group 1 haplotypes (13%, z = -3.68, p < 10⁻³). In Central Europe, results were similar. A total of 37 individuals had group 1 haplotypes and 16 of them (43%, Table 5, underlined) were introgressed, whereas the proportion of introgressed individuals with group 2 haplotypes was significantly lower (24%, z = -2.26, p = 0.03). Altogether, mtDNA and nuclear data supported each other, suggesting that in both regions, uncommon nuclear and mitochondrial lineages resulted from translocations. We decided to use only the nuclear criteria for the detection of translocations and of admixture events, as nuclear data were available for all genotypes, in contrast to mtDNA.
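The comparison of introgressed proportions between mtDNA lineages can be illustrated with a two-proportion z-test. This is a minimal sketch assuming a pooled-variance test with a two-sided p-value; the text does not specify the exact variant used. The counts 7 of 14 come from the text, whereas the sample size of 100 for the regionally common lineage is hypothetical (only the 13% proportion is reported), so the output is not expected to reproduce the published z-value.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Alpine region: common lineage (hypothetical n = 100, 13% introgressed)
# versus uncommon lineage (7 of 14 introgressed, from the text).
z, p = two_proportion_z(13, 100, 7, 14)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With the common lineage entered first, a negative z indicates a lower introgressed proportion in the regionally common lineage, which matches the sign of the values reported above.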
Using the nuclear criterion for purebreds, we detected 89 (8.6%) exotic genotypes (i.e. purebreds belonging to the cluster group that is regionally uncommon), of which 23 were found interspersed in populations with common genotypes and 66 in the three populations predominantly composed of exotic genotypes (Fig. 7, black coloured proportions).