SNP discovery in indigenous Afrikaner, Drakensberger and Nguni cattle breeds of South Africa

Sample collection, library construction and DNA sequencing

Blood and hair samples were collected with the approval of the Animal Ethics Committee of the University of Pretoria (EC: S4285-15), following guidelines for the proper handling of animals during sample collection. Genomic DNA was extracted from whole blood (200 μl per sample) using the Roche DNA extraction kit (Roche, Germany) according to the manufacturer's standard protocol, which included a proteinase K digestion followed by column purification to yield high-quality DNA. DNA was extracted from hair roots using an optimized phenol-chloroform protocol (Sambrook & Russell, 2006) that included a proteinase K and dithiothreitol digestion followed by phenol-chloroform extraction and centrifugal dialysis with Centricon concentrators (Slikas et al., 2000). The quality of the extracted DNA was assessed using a Nanodrop ND-1000 UV/Vis spectrophotometer and verified using a Qubit® 2.0 Fluorometer (Thermo Scientific). All DNA samples were kept at a concentration of 50 ng/μl in preparation for next-generation sequencing (NGS) at the ARC Biotechnology Platform.
One equimolar DNA pool was prepared per breed, each containing 30 animals and 170 ng of DNA per animal. Genomic libraries were prepared with the Paired-end Sequencing Sample Preparation Kit (Illumina, San Diego, CA) using 5 μg of genomic DNA according to the manufacturer's instructions. DNA was fragmented using a Covaris M220 sonicator, end-repaired and A-tailed, followed by ligation of adapters (Nextera Transposase, Illumina) and 12 cycles of polymerase chain reaction (PCR). The average fragment size for each library was 350 bp. The quantity and quality of usable material in each library were estimated by qPCR (KAPA Library Quantification Kit–Illumina Genome Analyzer-SYBR Fast Universal). The automated cBot Cluster Generation System (Illumina, San Diego, CA) was used to generate clusters on the flow cell. Each DNA pool was then sequenced (paired-end; read length 125 bp) in a single lane of a flow cell on the Illumina HiSeq 2000 to a target coverage of 30X. The resulting images were analyzed with the HiSeq Pipeline Software v2.0 (Illumina) to generate the raw fastq files (Van Tassell et al., 2008; Ramos et al., 2009; Van et al., 2013; Boutet et al., 2016).
Sequence reads were filtered for base quality using Trimmomatic (Bolger et al., 2014). Reads were trimmed if four consecutive bases had an average Phred-like quality score of less than 20. PCR duplicates were removed using Picard (Li et al., 2009), since duplicates should not be counted as evidence for or against putative variants or used for allele frequency estimation (Auwera, 2013).
Pairs of DNA sequences for which each read exceeded 35 bp were retained for analysis.
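The trimming and length rules described above can be illustrated with a short Python sketch. It is a simplified reading of the text (a four-base window, a mean quality threshold of 20, and pairs kept only when both reads exceed 35 bp), not Trimmomatic's actual implementation. The function names `sliding_window_trim` and `keep_pair` are hypothetical.

```python
from typing import List


def sliding_window_trim(quals: List[int], window: int = 4,
                        min_mean: float = 20.0) -> int:
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts the read at the start of the first
    window whose mean Phred quality is below min_mean (simplified sketch).
    """
    for start in range(0, len(quals) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean:
            return start
    # No failing window (or read shorter than the window): keep everything.
    return len(quals)


def keep_pair(len_read1: int, len_read2: int, min_len: int = 35) -> bool:
    """A pair is retained only if each trimmed read exceeds min_len bases."""
    return len_read1 > min_len and len_read2 > min_len


if __name__ == "__main__":
    # Hypothetical quality string: good bases, a low-quality stretch, good bases.
    quals = [30] * 10 + [10] * 4 + [30] * 5
    kept = sliding_window_trim(quals)
    print("bases kept:", kept)
    print("pair retained:", keep_pair(kept, 125))
```

Cutting at the start of the first failing window is the simplest interpretation of the rule; real trimmers may also drop trailing low-quality bases inside the last accepted window.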
Sequence reads were aligned to the Bos taurus reference genome (UMD3.1) using the Burrows-Wheeler Aligner (BWA), a software package for mapping low-divergent sequences against a large reference genome (Li et al., 2009). The alignments were sorted and converted to BAM format using SAMtools v1.2 (Ramirez-Gonzalez et al., 2012). The data were then prepared for variant calling with Picard tools by marking duplicate reads (Li et al., 2009), which were ignored by the Genome Analysis Toolkit (GATK) during variant calling.

Variant discovery, annotation and functional enrichment analysis

Variant discovery was performed within breed according to GATK Best Practices using the genomic variant call format (GVCF) workflow (Auwera, 2013). The workflow comprises data pre-processing steps followed by variant calling, run separately for each population with a command specific to paired-end data. The pre-processing steps were realignment target creation to generate intervals for each chromosome for indel realignment, depth of coverage estimation for each chromosome, base recalibration, analysis of covariates and printing of reads. Genotype calling was performed separately for each chromosome to generate GVCF files for variant calling. The workflow included a joint analysis step that strengthens variant discovery by leveraging population-wide information from a cohort of samples, so that variants are detected with greater sensitivity and samples are genotyped as accurately as possible (GATK Best Practices; Bareke et al., 2013). Cohorts of variants were generated in VCF files, and genotypes were called for each breed with a minimum genotype quality of 20 and a read depth of between 1 and 25 (Aslam et al., 2012). To reduce the false discovery rate, hard filtering was applied, and variants meeting any of the following criteria were removed: Phred-scaled polymorphism probability (QUAL) < 30.0, variant confidence normalized by depth (QD) < 2.0, mapping quality (MQ) < 40.0, strand bias (FS) > 60.0, HaplotypeScore > 13.0, MQRankSum < −12.5, and ReadPosRankSum < −8.0 (GATK Best Practices; Choi et al., 2015). All SNPs that passed these filters were subsequently categorized as fixed (homozygous non-reference genotypes called in all individuals within the breed) or segregating (variable/heterozygous genotypes identified in the breed) (Aslam et al., 2012).
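As an illustration of the hard-filtering thresholds and the fixed/segregating categories listed above, the following Python sketch applies them to per-variant annotations held in a dictionary. It is not the GATK implementation. The helper names (`failed_filters`, `classify_snp`), the genotype encoding (pairs of allele indices, 0 = reference) and the decision to skip missing annotations are all assumptions made for this example.

```python
import operator
from typing import Dict, List, Optional, Tuple

# (annotation, comparison, threshold): a variant is removed if ANY test is true.
HARD_FILTERS = [
    ("QUAL", operator.lt, 30.0),
    ("QD", operator.lt, 2.0),
    ("MQ", operator.lt, 40.0),
    ("FS", operator.gt, 60.0),
    ("HaplotypeScore", operator.gt, 13.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]

Genotype = Optional[Tuple[int, int]]  # None = genotype not called


def failed_filters(annotations: Dict[str, float]) -> List[str]:
    """Return the names of the hard filters this variant fails.

    Assumption: an annotation that is absent is skipped, not failed.
    """
    failed = []
    for name, compare, threshold in HARD_FILTERS:
        value = annotations.get(name)
        if value is not None and compare(value, threshold):
            failed.append(name)
    return failed


def classify_snp(genotypes: List[Genotype]) -> Optional[str]:
    """Label a SNP 'fixed' or 'segregating' from its called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Fixed: every called genotype is homozygous for a non-reference allele.
    if all(a == b and a != 0 for a, b in called):
        return "fixed"
    # Segregating: at least one non-reference allele, but not fixed.
    if any(a != 0 or b != 0 for a, b in called):
        return "segregating"
    return None  # all homozygous reference: not a variant in this breed


if __name__ == "__main__":
    variant = {"QUAL": 25.0, "QD": 8.1, "MQ": 50.0, "FS": 75.0}
    print(failed_filters(variant))
    print(classify_snp([(1, 1), (1, 1), None]))
```

A variant is retained only when `failed_filters` returns an empty list; returning the filter names rather than a boolean makes it easy to tabulate which criterion removes the most calls.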
Minor allele frequencies were estimated for each SNP by directly counting the number of reads representing each allele using PLINK (Purcell et al., 2007; Ramos et al., 2009). The ratio of fixed to segregating SNPs was estimated within each population using PLINK. The transition-to-transversion (Ti/Tv) ratio of the SNP calls was calculated for each population using VCFtools (Danecek et al., 2011) as an indicator of potential sequencing errors (Choi et al., 2015). This is the ratio of the number of transitions (interchanges between purines, A<->G, or between pyrimidines, C<->T) to the number of transversions (interchanges between a purine and a pyrimidine) for a pair of DNA sequences (Mitchell, 2015).
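The two summary statistics defined above can be made concrete with a minimal Python sketch. It assumes biallelic SNPs given as (reference, alternate) base pairs and allele read counts supplied directly. The function names are hypothetical and do not reproduce the PLINK or VCFtools implementations.

```python
from typing import Iterable, Optional, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Ratio of transitions to transversions over biallelic SNPs.

    Returns None when there are no transversions (ratio undefined).
    """
    ti = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None


def minor_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """MAF from direct read counts of the two alleles at one SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return min(ref_reads, alt_reads) / total


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
    print("Ti/Tv:", ti_tv_ratio(calls))
    print("MAF:", minor_allele_frequency(ref_reads=18, alt_reads=6))
```

In the example, three of the four calls are transitions and one is a transversion, and the less frequent allele is supported by 6 of 24 reads.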

CHAPTER ONE
Introduction
1.2. Aim of the study
1.3. Objectives
CHAPTER TWO
Literature Review
2.1. Introduction
2.2. Indigenous cattle in Africa
2.3. Genetic variation in cattle
2.4. Use of DNA technology in genomic selection
2.5. Discovery of SNP markers
2.6. DNA sequencing methods
2.7. Variant detection and the use of SNP assays
2.8. Conclusion
CHAPTER THREE
Genome-wide identification of breed-informative single-nucleotide polymorphisms in three South African indigenous cattle breeds
-Abstract
-Introduction
-Materials and Methods
-Results
-Discussion
-Conclusion
-Acknowledgements
-Authors’ Contributions
-Conflict of Interest Declaration
CHAPTER FOUR
SNP discovery in indigenous Afrikaner, Drakensberger and Nguni cattle breeds of South Africa
-Abstract
-Introduction
-Materials and Methods
-Results and Discussion
-Conclusion
CHAPTER FIVE
Identification of selective sweeps and breed-specific SNPs in Afrikaner, Drakensberger and Nguni cattle using genome-wide sequence data
-Abstract
-Introduction
-Materials and Methods
-Results
-Discussion & Conclusion
CHAPTER SIX
Critical Discussion
-Conclusion
Recommendations
