Construction and Characterization of Duck Whole Genome Radiation Hybrid Panel

Single Molecule Real Time Sequencing (Pacific Biosciences)

Single Molecule Real Time (SMRT) sequencing was licensed by Pacific Biosciences Inc. in 2009 and is reported as true single molecule sequencing in real time. The technology relies on sequencing by synthesis, observed in real time, on a SMRT cell containing zero-mode waveguides (ZMWs) (Figure I-18). Unlike ion semiconductor sequencing and Helicos true single molecule sequencing, SMRT sequencing operates in real time and produces long reads of up to 10,000 bases (Eid et al. 2009). Its advantages are a shorter time to result, no PCR amplification of the template, and long read lengths. These advantages are achieved by two principal components: ZMWs and fluorescence-labeled phospholinked nucleotides (Korlach et al. 2010).
The ZMW nanostructures consist of dense arrays of holes approximately 100 nm in diameter, fabricated in a 100 nm metal film deposited on a transparent substrate (Foquet et al. 2008; Levene et al. 2003). Each ZMW becomes a nanophotonic reaction chamber in which a single nucleotide incorporation event can be observed, providing a reaction volume of ~100 zeptoliters (1 zeptoliter = 10⁻²¹ L). Because the diameter of the ZMW is smaller than the wavelength of the fluorescence, the intensity of fluorescence from the free nucleotides in the reagent decreases dramatically when the reaction chamber is observed from the bottom by diffraction-limited confocal microscopy. The small size of the ZMW prevents visible laser light, which enters from beneath the transparent substrate and has a wavelength of 600 nm, from passing entirely through the ZMW. Rather than passing through, the light decays exponentially as it enters the ZMW, and only the bottom 30 nm of the ZMW is illuminated. In addition, the DNA polymerase is immobilized at the surface of the ZMW by a streptavidin-biotin interaction. It is therefore possible to observe single nucleotide incorporations taking place at the bottom of the reaction chamber. The fluorescent signal from each chamber is then transmitted to and collected by the optical system beneath the ZMW.
In addition to reducing the number of labeled nucleotides present inside the observation volume, the highly confined volume results in drastically shorter diffusional visitation times. This improves the temporal separation between two kinds of events: diffusion of labeled nucleotides through the ZMW, which typically lasts a few microseconds, and incorporation events, which last several milliseconds. Diffusion events can therefore be easily distinguished from incorporations (Korlach et al. 2010).
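The effect of the confined volume on background can be illustrated with a back-of-the-envelope calculation. The sketch below estimates the average number of free labeled nucleotides inside the ~100 zeptoliter volume quoted above. The concentrations are illustrative assumptions, not values taken from the cited studies, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def mean_occupancy(concentration_molar, volume_liters):
    """Average number of molecules expected in a volume at a given molar concentration."""
    return concentration_molar * volume_liters * AVOGADRO


# Reaction volume quoted in the text: ~100 zeptoliters (1 zL = 1e-21 L).
ZMW_VOLUME_L = 100e-21

# Assumed labeled-nucleotide concentrations (illustrative only).
ASSUMED_CONCENTRATIONS_M = [0.1e-6, 1e-6, 10e-6]

for conc in ASSUMED_CONCENTRATIONS_M:
    n = mean_occupancy(conc, ZMW_VOLUME_L)
    print(f"{conc * 1e6:5.1f} uM -> {n:.3f} molecules on average")
```

Under these assumptions the average occupancy stays below one molecule across the whole range, which is consistent with the low fluorescent background described above.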
The ZMW only resolves the difficulty of observing single molecules during sequencing. The higher speed of the sequencing reaction is achieved by using dye-labeled, terminal phosphate-linked nucleotides. Several sequencing-by-synthesis schemes use nucleotides with fluorescent dyes linked to the nucleobases, but enzymatic incorporation becomes increasingly limited as a larger fraction of the dNTPs is replaced by labeled ones. Most sequencing technologies work around this by adding base-labeled nucleotides stepwise and then removing the label chemically or photochemically, which reduces sequencing speed because additional washing and cleavage steps have to be performed (Ju et al. 2006; Korlach et al. 2008; Mitra et al. 2003).
In SMRT sequencing, an alternative approach is applied that attaches the fluorescent label to the phosphate chain instead of the base. In this case, as the DNA polymerase cleaves the α-β-phosphoryl bond of the dNTP during DNA synthesis, a pyrophosphate carrying the fluorescent label is released, leaving a natural, unmodified nucleotide in the newly synthesized DNA strand. Linking a fluorescent dye directly to the phosphate of a dNTP introduces steric hindrance, a potential cause of DNA polymerase inhibition; however, extending the triphosphate moiety to four or five phosphates was reported to increase incorporation efficiency (Kumar et al. 2005). In the labeled nucleotides used in SMRT sequencing, the fluorescent dye is conjugated to an aliphatic linker, which increases the spatial separation between the nucleotide and the fluorophore, and the linker is in turn attached to the pyrophosphate moiety. Using terminal phosphate-labeled nucleotides avoids the "cleavage and washing" cycle, which makes sequencing in real time possible and shortens the time to result dramatically. An overview of SMRT sequencing is shown in Figure I-18.
Unlike NGS and the other two third generation sequencing platforms, SMRT sequencing can read up to 10,000 bases, with an average read length of 1,000 bases. High processivity is achieved by using Φ29 DNA polymerase, which is also capable of strand displacement DNA synthesis, enabling the use of double-stranded DNA as template. Φ29 DNA polymerase is also widely used in whole genome amplification approaches (Dean et al. 2002; Silander and Saarela 2008). The wild-type Φ29 DNA polymerase was modified to improve its performance in sequencing. The mutant has reduced 3'-5' exonuclease activity but maintains the same polymerization properties as the wild type (Korlach et al. 2008).
The SMRT sequencing platform provides three read types: (1) standard sequencing, in which a long-insert library is made so that the DNA polymerase can synthesize along a single strand; (2) circular consensus sequencing (CCS), in which the insert is short and the double-stranded template is ligated to a pair of hairpin-like adapters so that both the forward and the reverse strand can be read several times each (Figure I-18); (3) strobe sequencing, which requires a very long insert and in which the laser in the instrument is alternately switched on and off during sequencing, so that the on-periods generate the sequencing reads and the off-periods determine the length of the gaps between them.
Fluorescence pulses in SMRT sequencing are characterized not only by their emission spectra but also by their duration and by the interval between successive pulses. Two parameters are derived from these: pulse width (PW) and interpulse duration (IPD), which reflect the kinetics of the polymerase while sequencing is in progress. PW is a function of all kinetic steps after nucleotide binding and up to fluorophore release, whereas IPD is determined by the kinetics of nucleotide binding and polymerase translocation. Eid et al. also reported that the IPD was strongly affected by the DNA template, whereas the PW was governed by local chemical processes in the active site, so that PW showed only moderate variability with sequence context (Eid et al. 2009).
A SMRT cell contains approximately 75,000 ZMWs, of which about one third contain a single DNA polymerase under optimized loading. The DNA synthesis rate is about 2 to 4 bases per second, so a single SMRT sequencing run takes only a few hours. The current error rate of 15% is significantly higher than with other sequencing techniques, with a predominance of deletions, followed by insertions. The deletions probably stem from incorporation events or intervals that are too short to be reliably detected, while the insertions may be caused by dissociation of a cognate nucleotide from the active site before phosphodiester bond formation, resulting in the duplication of a pulse. Although the current error rate is high, errors occur at random positions during sequencing, so the error rate can be reduced by CCS, in which both strands are read several times. In the approach followed by Travers and colleagues (Travers et al. 2010), a double-stranded template with a hairpin-like adaptor ligated to each end was used to construct a library called SMRTbell, which was then sequenced with the CCS read type described above.
With an insert length of 250 bp, the expected Phred-style quality value could theoretically reach 30, which is sufficient for SNP detection. Accuracy increases with sequencing depth: with 15-fold average coverage, the median accuracy is reported to reach 99.3% (Eid et al. 2009).
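The quality values and the benefit of repeated reads can be made concrete with a short calculation. The sketch below converts between Phred-style quality values and error probabilities, then applies a deliberately simplified model in which each pass is wrong independently with the same probability and the consensus is a plain majority vote. That model is an assumption made for illustration: real consensus calling must align reads containing insertions and deletions, and the function names are hypothetical.

```python
import math


def phred_from_error(p_error):
    """Phred-style quality value for a given error probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Error probability for a given Phred-style quality value: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def majority_error(p_error, n_passes):
    """Probability that more than half of n independent passes are wrong at a position.

    Simplified model (assumption): errors are independent between passes and
    the consensus is a strict majority vote. Use an odd number of passes to avoid ties.
    """
    first_losing = n_passes // 2 + 1
    return sum(
        math.comb(n_passes, k) * p_error**k * (1.0 - p_error) ** (n_passes - k)
        for k in range(first_losing, n_passes + 1)
    )


print(f"Q30 corresponds to an error probability of {error_from_phred(30):.4f}")
print(f"99.3% accuracy corresponds to about Q{phred_from_error(1 - 0.993):.1f}")
print(f"15% raw error corresponds to about Q{phred_from_error(0.15):.1f}")

# Consensus error under the simplified model, starting from the 15% raw error rate.
for n in (1, 3, 5, 9, 15):
    print(f"{n:2d} passes -> consensus error {majority_error(0.15, n):.6f}")
```

Under this model the consensus error falls quickly as the number of passes grows, which matches the qualitative argument that randomly placed errors can be averaged out by reading the same molecule several times.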
SMRT sequencing is also useful for detecting DNA methylation (Flusberg et al. 2010) and damaged DNA bases (Clark et al. 2011). Both studies are based on the principle that the kinetics of the DNA polymerase is influenced by the DNA sequence context. Compared with bisulfite conversion combined with massively parallel sequencing, SMRT sequencing allows the direct detection of methylation on single DNA molecules without bisulfite conversion, which simplifies sample preparation and reduces the complexity of post-sequencing analysis. Furthermore, different modifications such as N6-methyladenosine, 5-methylcytosine and 5-hydroxymethylcytosine influence the kinetics of the DNA polymerase in different patterns, so the assignment and classification of the modifications can be inferred from the PW and IPD metrics. Such discrimination between cytosine, 5-methylcytosine and 5-hydroxymethylcytosine cannot be accomplished with bisulfite sequencing. Pacific Biosciences is still refining this technique to make de novo methylation profiling possible.
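The PW and IPD metrics referred to above can be sketched from a list of pulse start and end times. The example below is a minimal illustration with made-up timings. The `Pulse` type, the function names and the comparison against a control trace are assumptions introduced here, not the analysis pipeline of the cited studies.

```python
from typing import List, NamedTuple


class Pulse(NamedTuple):
    start: float  # time the fluorescence pulse begins, in seconds
    end: float    # time the fluorescence pulse ends, in seconds
    base: str     # base called from the emission spectrum


def pulse_widths(pulses: List[Pulse]) -> List[float]:
    """Pulse width (PW): duration of each fluorescence pulse."""
    return [p.end - p.start for p in pulses]


def interpulse_durations(pulses: List[Pulse]) -> List[float]:
    """Interpulse duration (IPD): gap between the end of one pulse and the start of the next."""
    return [nxt.start - cur.end for cur, nxt in zip(pulses, pulses[1:])]


def ipd_ratios(sample_ipds: List[float], control_ipds: List[float]) -> List[float]:
    """Position-wise ratio of sample IPD to control IPD (values far from 1 hint at altered kinetics)."""
    return [s / c for s, c in zip(sample_ipds, control_ipds)]


# Hypothetical traces over the same template: the sample pauses longer before the third base.
control = [Pulse(0.00, 0.10, "A"), Pulse(0.35, 0.50, "C"), Pulse(0.90, 1.00, "G")]
sample = [Pulse(0.00, 0.10, "A"), Pulse(0.35, 0.50, "C"), Pulse(1.30, 1.40, "G")]

print("PW (sample): ", pulse_widths(sample))
print("IPD (control):", interpulse_durations(control))
print("IPD (sample): ", interpulse_durations(sample))
print("IPD ratios:   ", ipd_ratios(interpulse_durations(sample), interpulse_durations(control)))
```

In this toy example the pulse widths are identical in both traces while the second IPD is doubled in the sample, the kind of kinetic difference from which a modified base might be inferred.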

De novo assembly for TGS

The TGS technologies described above aim to reduce the sequencing biases caused by the PCR amplification used to generate template clusters, to produce long sequencing reads, to shorten run times and to reduce instrument cost by avoiding an optical system for base identification. In the library preparation step, however, all the TGS technologies still use the same in vitro library preparation strategies as NGS (or second generation sequencing), so the size of the inserts is still limited to 20 kb, which still makes large eukaryotic genomes difficult to assemble into ultracontigs or superscaffolds. The final solution may still require mapping-based strategies to order the scaffolds and assign them to chromosomes.

Avian Genome Structure

It is believed that avian species could have existed at least since the late Triassic period, about 200 million years ago, following the discovery of two nearly complete fossil skeletons of Protoavis, which pre-date the Jurassic Archaeopteryx by some 50 million years. Mitochondrial analysis suggested that the common ancestor of birds and mammals diverged 310 million years ago, while the common ancestor of birds and crocodilians diverged 210-250 million years ago (Burt et al. 1999; Griffin et al. 2007; Muller and Reisz 2005). The evolutionary relationships among major avian groups are contentious although well studied (Chojnowski et al. 2008; Ericson et al. 2006). There are, however, two nodes at the base of the avian tree that are supported by both morphological and molecular phylogenetic studies (Chubb 2004; Groth and Barrowclough 1999; Hackett et al. 2008). The first divides birds into the Paleognathae (ratites and tinamous) and the Neognathae (all other birds), and the second splits the neognaths into the Galloanserae (Galliformes and Anseriformes) and the Neoaves (other neognaths). According to data from the Timetree website (http://www.timetree.org/), the mean divergence time between Galliformes and Anseriformes is about 81.2 million years. Although many bird species diverged tens of millions of years ago or even longer, avian species possess highly conserved karyotypes and synteny (Nanda et al. 2011; Shibusawa et al. 2004).
Most avian species have about 40 pairs of chromosomes, with some notable extremes such as the stone curlew and the kingfisher, which have 20 and 66 pairs of chromosomes, respectively (Burt 2002). Of the 40 pairs of chromosomes, the seven or eight largest pairs are the macrochromosomes, which are 3 µm to 6 µm in length; the remaining pairs are 0.5 µm to 2.5 µm in length and are called microchromosomes (Rodionov 1996). Interestingly, in Accipitridae, the total number of chromosomes is about 70, but they have only 3 to 5 pairs of microchromosomes (Bed'Hom et al. 2003). The organization of their karyotype differs markedly from the classical bird karyotype. In birds, the sex chromosomes are named Z and W rather than X and Y as in mammals. In contrast to mammals, females are the heterogametic sex, with a ZW karyotype, and males are the homogametic sex, with a ZZ karyotype. Moreover, comparative genomics showed that the avian ZW chromosomes are not syntenic to the mammalian XY but are mostly syntenic to human chromosomes 5 and 9 (HSA5 and HSA9) (Fridolfsson et al. 1998; Nanda et al. 1999; Stiglec et al. 2007).

Table of contents:

Chapter I. General Introduction
1. General information on ducks
1.1 Taxonomy & Domestication
1.2 Natural habitat and habits
1.3 Duck breeding
1.3.1 Duck breeding in China
1.3.2 Duck breeding in France
1.4 A scientific model for avian influenza study
1.5 The rationale for duck genomics
2. Genome mapping and sequencing
2.1 Genetic markers
2.2 Cytogenetic, BAC contig and genetic maps
2.3 Genome maps using somatic cell radiation hybrids: a history
2.3.1 Radiation hybrid map
2.3.2 History
2.4 Radiation Hybrid (RH) mapping
2.4.1 Principle
2.4.2 Published RH panels and maps
2.4.3 Radiation hybrids are unstable
2.4.4 Whole genome amplification as an alternative approach to avoid large scale culture
2.5 Genome sequencing
2.5.1 The Sanger sequencing method
2.5.2 Strategies for whole genome sequencing of large genomes
2.5.3 Next Generation Sequencing or parallel sequencing
2.5.4 Comparison and Conclusion
2.5.5 Consequences of the NGS on genome assembly strategies
2.5.6 Third generation sequencing
2.5.7 De novo assembly for TGS
3. Avian Genome Structure
3.1 Sex Chromosome
3.1.1 Evolution of sex chromosomes
3.1.2 Dosage compensation
3.2 Sequenced Avian Genomes
3.2.1 Chicken Genome
3.2.2 Zebra Finch genome
3.2.3 Turkey genome
3.3 Avian comparative Genomics
4. Current status of duck genomics
4.1 Duck genetic map
4.2 BAC library & Fosmid library
4.3 SNP Detection
4.4 EST data
4.5 Duck genome sequencing
4.6 Ultrascaffold construction strategy for NGS: duck as an example
Chapter II. Construction and Characterization of Duck Whole Genome Radiation Hybrid Panel
1. Introduction
2. Results and discussion
2.1 Comparison of two methods for duck embryonic fibroblast culture
2.2 Generation of duck radiation hybrids
2.3 Comparative results
2.4 The optimized method
2.5 Cytogenetic investigations on four hybrids
2.6 Discussion
3. Conclusion
4. Supplementary Method
Chapter III. Testing the Duck RH panel with Different Genotyping Techniques
Introduction
Article
Discussion
Chapter IV. Genotyping by Sequencing: whole genome RH maps
Introduction
Article in preparation
Complementary results and discussion
A highly repeated gene in duck genome: ATG4A
Sequencing whole genome amplified (WGA) hybrids
Chapter V. General Discussion and perspectives
Whole genome RH maps
Avian chromosome evolution
The highly repeated gene: ATG4A
Additional chromosomes in hybrids
Unraveling the smallest microchromosomes by Fluidigm Biomark qPCR
Apply RH sequencing on other species
References