Disentangling diversity patterns due to demography or natural selection in Pinus pinaster transcription factors involved in wood formation


SNP discovery

For SNP discovery, two sets of sequences were considered. The first dataset comprised maritime pine sequences for 41 different genes involved in plant cell wall formation (candidate genes for wood quality) or drought stress resistance (Supplemental Table 1). For each fragment, an average of 50 megagametophytes (haploid tissue surrounding the embryo) from different populations were sequenced. The chromatograms were visually checked (nucleotides with phred scores below 30 were treated as missing data), and the SNPs detected were considered true polymorphisms. Indeed, the use of megagametophytes lowered the risk of confusing polymorphism at a single locus with differences between paralogous loci, as amplification of two or more members of a gene family would have been easily detected as double peaks in the chromatograms. This first set of SNPs will be referred to as in vitro SNPs. The second sequence dataset consisted of a collection of 3,995 non-singleton contigs of the maritime pine unigene set (G. Le Provost, unpublished, http://cbi.labri.fr/outils/SAM2/COMPLETE/ version pinaster_14_02_2007). We used the PolyBayes software (MARTH et al. 1999) to detect high-probability SNPs, with the parameters described for maritime pine in LE DANTEC et al. (2004). This second set of SNPs will be referred to as in silico SNPs.
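The quality rule applied to the chromatograms (bases with a phred score below 30 treated as missing data) can be illustrated with a minimal Python sketch. The function name, the use of "N" as the missing-data symbol and the example read are assumptions made for illustration; this is not the pipeline used in the study.

```python
def mask_low_quality(bases, quals, min_phred=30, missing="N"):
    """Replace every base whose phred score is below min_phred by a missing-data symbol."""
    if len(bases) != len(quals):
        raise ValueError("bases and quality scores must have the same length")
    # A score equal to min_phred is kept: only scores strictly below the cutoff are masked.
    return "".join(b if q >= min_phred else missing for b, q in zip(bases, quals))


# Hypothetical read with per-base phred scores
print(mask_low_quality("ACGTAC", [40, 12, 35, 29, 30, 50]))  # ANGNAC
```

Masked positions are then simply ignored when polymorphic sites are counted, so a low-quality base cannot create a spurious SNP.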

SNP selection for array construction

We developed a Perl script, snp2illumina, that automatically extracts SNPs from multi-FASTA sequence files and outputs them as a SequenceList file compatible with the Illumina Assay Design Tool software (ADT; http://www.illumina.com/downloads/Illumina_Assay_Design_Tool.pdf). This file contains the SNP names and surrounding sequences, with polymorphic loci indicated by IUPAC codes for degenerate bases. The Perl script snp2illumina can work in batch mode and is available upon request from the corresponding author.
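The idea of marking polymorphic loci with IUPAC codes can be sketched as follows. This is an illustrative Python example, not the snp2illumina script itself (which is written in Perl and is not reproduced here); the function name and the toy alignment are hypothetical, and the sketch assumes the input sequences are already aligned and of equal length.

```python
# IUPAC ambiguity codes for the six possible pairs of nucleotides
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_with_iupac(aligned_seqs):
    """Build a consensus in which biallelic sites carry an IUPAC ambiguity code."""
    consensus = []
    for column in zip(*aligned_seqs):
        # Ignore missing data (N) and gaps when collecting the alleles of a column
        alleles = frozenset(b for b in column if b in "ACGT")
        if len(alleles) == 1:
            consensus.append(next(iter(alleles)))
        elif alleles in IUPAC:
            consensus.append(IUPAC[alleles])
        else:
            consensus.append("N")  # no called base, or more than two alleles
    return "".join(consensus)


# Hypothetical alignment of four haploid sequences
print(consensus_with_iupac(["ACGTA", "ACATA", "ACGTA", "ANGTC"]))  # ACRTM
```

Here the third column (G/A) becomes R and the fifth (A/C) becomes M, while the missing base in the second column does not affect the consensus.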
The functionality score provided by ADT is similar to a predicted probability of genotyping success; it takes into account the sequence conformation around the SNP, the absence of repetitive elements in the surrounding sequence and, for model species, the sequence redundancy against the available sequence database (SHEN et al. 2005). For maritime pine, no sequence database was available to test for sequence redundancy. All SNPs with a functionality score below 0.4, which the manufacturer considers the lower limit for genotyping success, were discarded.
Two contrasting strategies, “depth vs. breadth of SNP coverage”, were adopted to select informative SNPs. For in vitro SNPs, our objective was to include as many polymorphisms as possible for each gene fragment, so depth of coverage was preferred. For in silico SNPs, our goal was to include a small number of markers per unigene in a large number of unigenes, thus giving more emphasis to breadth of coverage. The main technical constraint for selecting in vitro SNPs was that the selected polymorphisms should be at least 60 nucleotides away from each other. When several SNPs fell within this limit, we filtered out the lowest-frequency variants and the polymorphisms showing a high level of linkage disequilibrium with other selected SNPs of the same fragment. Rare variants (minor allele frequency < 5%) were discarded. To select in silico SNPs, we used the log file of the snp2illumina script, which records for each SNP the number of ESTs considered for the detection, the minor allele frequency and the PolyBayes score. To minimize the number of false positives, we included in the assay only SNPs with a PolyBayes score above 99% and with either a minor allele appearing at least twice among four to ten ESTs, or a minor allele frequency above 20% when more than ten ESTs were available. Indeed, it is highly unlikely that sequencing errors in two independently sequenced ESTs occur at the same base location. We also excluded SNPs that had other polymorphisms within the flanking 60 bases, to avoid technical problems due to neighboring polymorphisms. In both cases, chromatograms were visually checked to ensure the quality of the flanking sequences, and we used BLASTn analysis (ALTSCHUL et al. 1990) to ensure that in vitro and in silico SNPs belonged to different genes.
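The two selection rules described above can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the procedure actually run: the function names are hypothetical, contigs with fewer than four ESTs are assumed to be rejected (the text does not state this case explicitly), and the spacing filter uses a simple greedy rule that favors the highest minor allele frequency and ignores the linkage disequilibrium criterion.

```python
def keep_in_silico_snp(n_ests, minor_count, polybayes_score):
    """Apply the in silico rule: PolyBayes score above 0.99 plus an EST-depth criterion."""
    if polybayes_score <= 0.99:
        return False
    if 4 <= n_ests <= 10:
        return minor_count >= 2             # minor allele seen at least twice
    if n_ests > 10:
        return minor_count / n_ests > 0.20  # minor allele frequency above 20%
    return False                            # assumption: fewer than four ESTs is rejected


def thin_by_spacing(snps, min_gap=60):
    """Greedy thinning of (position, maf) pairs so that kept SNPs are >= min_gap apart."""
    kept = []
    # Visit SNPs from the most to the least frequent minor allele
    for pos, maf in sorted(snps, key=lambda s: -s[1]):
        if all(abs(pos - k) >= min_gap for k, _ in kept):
            kept.append((pos, maf))
    return sorted(kept)


print(keep_in_silico_snp(n_ests=6, minor_count=2, polybayes_score=0.995))   # True
print(keep_in_silico_snp(n_ests=15, minor_count=2, polybayes_score=0.995))  # False
print(thin_by_spacing([(100, 0.30), (130, 0.45), (200, 0.10), (260, 0.25)]))
# [(130, 0.45), (200, 0.1), (260, 0.25)]
```

In the second call the minor allele frequency is 2/15, about 13%, which is below the 20% threshold applied to contigs with more than ten ESTs. In the thinning example the SNP at position 100 is dropped because it lies only 30 nucleotides from the more frequent SNP at position 130.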

SNP genotyping array

The Illumina GoldenGate technology (Illumina Inc., San Diego, CA, USA) was used to carry out the genotyping reactions in accordance with the manufacturer’s protocol (LIN et al. 2009). To assess the reproducibility of the genotyping assay, 19 DNA samples were duplicated across the different plates. Negative controls were also added to each 96-well plate. Highly multiplexed extension reactions were conducted using 250 ng of template DNA per sample. Clustering was performed with the BeadStudio software (Illumina Inc.), and a quality score was generated for each genotype. A GenCall score cutoff of 0.25 was used to determine valid genotypes at each SNP, and retained SNPs had to reach a minimum GenTrain score of 0.25, which represents a stringent criterion used in human genetic studies (www.illumina.com; FAN et al. 2003). GenCall and GenTrain scores measure the reliability of SNP detection based on the distribution of genotypic classes (AA, AB and BB). Clusters were visually inspected to ensure high-quality data (Figure 1). When we observed cluster compression (i.e. when the normalized theta values of the homozygous clusters were not in the [0, 0.1] or [0.9, 1] ranges, as illustrated in Figure 1 B, C and D), we considered that the genotyping had failed, as this is likely due to genome redundancy (HYTEN et al. 2008). Indeed, the compression of the BB homozygous cluster towards the AA cluster could result from a paralogous gene matching the A allele, increasing the signal for the A dye for both BB and AB genotypes. We also considered as genotyping failures monomorphic SNPs for which clusters could be divided into two or more subgroups, as in Figure 1E.
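The numerical part of these quality criteria can be expressed as a minimal Python sketch. The function names are hypothetical and no BeadStudio API is used; the sketch assumes that the AA cluster is expected at normalized theta values in [0, 0.1] and the BB cluster in [0.9, 1], that cutoffs are inclusive, and that an unobserved cluster is passed as None. The visual inspection of clusters described above is not captured.

```python
def call_genotype(genotype, gencall, min_gencall=0.25):
    """Return the genotype if its GenCall score reaches the cutoff, else None (missing)."""
    return genotype if gencall >= min_gencall else None


def is_compressed(theta_aa=None, theta_bb=None):
    """Flag cluster compression from the normalized theta of each homozygous cluster.

    Assumption: AA is expected in [0, 0.1] and BB in [0.9, 1].
    A homozygous cluster that was not observed is passed as None and ignored.
    """
    aa_bad = theta_aa is not None and not (0.0 <= theta_aa <= 0.1)
    bb_bad = theta_bb is not None and not (0.9 <= theta_bb <= 1.0)
    return aa_bad or bb_bad


def snp_retained(gentrain, theta_aa=None, theta_bb=None, min_gentrain=0.25):
    """Keep a SNP only if its GenTrain score is sufficient and no cluster is compressed."""
    return gentrain >= min_gentrain and not is_compressed(theta_aa, theta_bb)


print(snp_retained(0.70, theta_aa=0.03, theta_bb=0.97))  # True
print(snp_retained(0.70, theta_aa=0.03, theta_bb=0.62))  # False
print(call_genotype("AB", gencall=0.10))                 # None
```

The second call mimics a BB cluster shifted towards the AA cluster, the pattern attributed above to a paralogous gene matching the A allele.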


Measuring the error rate using pedigree data

We used the breeding population pedigree information (relationships between the first and second generations) to detect possible Mendelian inconsistencies (MIs) between parents and offspring using the PedCheck software (O’CONNELL and WEEKS 1998). We then used the method described in SAUNDERS et al. (2007) to estimate the genotyping error rate Π from MIs. Not all genotyping errors (GEs) are detectable as MIs, but there is a linear relationship between the GE and MI counts, as shown by HAO et al.
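What counts as a Mendelian inconsistency in a parent-offspring trio can be illustrated with a small Python sketch. It is neither the PedCheck algorithm nor the estimator of SAUNDERS et al. (2007); the function names and the example trios are hypothetical, genotypes are assumed to be two-character strings for a biallelic SNP, and a missing genotype is represented by None.

```python
def is_consistent(child, parent1, parent2):
    """True if the child can receive one allele from each parent."""
    a, b = child
    return (a in parent1 and b in parent2) or (b in parent1 and a in parent2)


def count_inconsistencies(trios):
    """Count Mendelian inconsistencies among (child, parent1, parent2) trios.

    Trios with a missing genotype (None) are skipped.
    Returns (number of inconsistent trios, number of trios tested).
    """
    tested = inconsistent = 0
    for child, p1, p2 in trios:
        if None in (child, p1, p2):
            continue
        tested += 1
        if not is_consistent(child, p1, p2):
            inconsistent += 1
    return inconsistent, tested


# Hypothetical genotypes at one SNP
trios = [
    ("AB", "AA", "BB"),  # consistent
    ("BB", "AA", "AB"),  # inconsistent: the AA parent cannot transmit B
    ("AA", "AB", "AB"),  # consistent
    ("AB", None, "BB"),  # skipped: missing parental genotype
]
print(count_inconsistencies(trios))  # (1, 3)
```

The example also shows why MIs underestimate genotyping errors: an AA child of two AB parents miscalled as AB would remain consistent with its parents and go undetected.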

Table of contents:

Introduction
Disentangling diversity patterns due to demography or natural selection in Pinus pinaster transcription factors involved in wood formation
Introduction
Materials and Methods
Population sampling and DNA extraction
Candidate gene selection and sequencing
Sequence processing and polymorphic sites detection
Diversity, molecular differentiation and recombination rate estimates
Extent of linkage disequilibrium
Neutrality tests under the standard neutral model
Simulations of demographic scenarios
Results
Nucleotide diversity
Population differentiation
Recombination and LD
Neutrality testing
Assessment of alternative demographic models
Discussion
Low levels of nucleotide diversity in transcription factors
Power of neutrality tests
Impact of demographic history on diversity patterns in the Atlantic maritime pine population
Detection of selection signals in transcription factors?
Conclusion and perspectives
References
Supplementary Materials
Developing a SNP genotyping array for Pinus pinaster: comparison between in vitro and in silico detected SNPs
Introduction
Methods
Plant material
SNP discovery
SNP selection for array construction
SNP genotyping array
Measuring the error rate using pedigree data
Results
SNP detection and construction of the SNP array
Reproducibility and overall success rate of the SNP assay
SNP success rate according to a priori SNP functionality score
Comparison of allele frequency estimated by sequencing and genotyping
Measuring genotyping error rate with pedigree data
Discussion
Data summary
Conversion rates of in vitro- and in silico-SNPs for Pinus pinaster
Genotyping error rate
Conclusion and perspectives
References
Supplementary Materials
Genetic parameters of growth and wood chemical-properties in Pinus pinaster
Introduction
Material and Methods
Plant material
Data measurement
Statistical models for genetic parameter estimation
Results
Near infrared spectroscopy calibrations
Genetic parameters
Genetic correlations
Discussion
General considerations
Rapid wood-quality assessment techniques
Genetic effects and heritabilities
Perspectives for breeding applications
References
Association mapping for growth and wood chemical-properties in the Pinus pinaster Aquitaine breeding population
Introduction
Materials and Methods
Plant material
Phenotypic data
Genotypic data
Population structure
Statistical models
Multiple-testing corrections
Results
Population structure
Selection of markers for association tests
Statistical tests
Discussion
Population structure and familial relatedness
One-stage versus two-stage association mapping approaches
Power, allele frequency and sample size
Significant associations: which genes, which traits?
Conclusion and perspectives
References
General discussion and perspectives
Principal results obtained in this thesis
The candidate-gene approach in conifers
Power of association studies
Conclusion