Functional impact of deletions
We annotated all deletions using the Variant Effect Predictor (Ensembl release 87). About 71% of the variants (6,019 SVs) were intergenic, and the remaining 29% (2,461 SVs) overlapped genic elements such as exons, introns, and untranslated regions (UTRs). On average, gene-disrupting deletions segregated at somewhat lower frequencies than intergenic deletions (VAF_intergenic > VAF_genic, p-value = 0.04; one-sided Wilcoxon test). Nevertheless, we observed many common genic deletions. The affected genes are relatively less conserved, and the majority have multiple paralogs (discussed later). In contrast, deletions in known essential genes were observed only in the heterozygous state, at relatively low VAF (<3%), and were generally private to a specific breed. For example, deletions in FANCI, which cause brachyspina (Charlier et al., 2012), were observed only in Holstein, and deletions in RNASEH2B, which cause embryonic lethality (Kadri et al., 2014), only in Nordic Red Cattle.
Selective constraints on genes overlapping deletions. The relative abundance of high-frequency genic and intergenic variants indicates that the majority of the intersected genes are non-essential, so that their disruption did not affect the viability or fecundity of the carriers. To test this hypothesis, we compared the selective constraints on deleted genes (overlap with any genic element) with those on known mouse lethal genes (Dickinson et al., 2016), measured as the dN/dS ratio of cow-mouse 1-to-1 orthologs (Figure 3.7). Here, a high dN/dS value indicates low selective constraint on a gene, and a low value indicates high constraint. Genes in deletions had significantly higher dN/dS ratios than lethal genes (p-value = 2.3×10⁻⁶; one-sided Wilcoxon test) and are thus evolutionarily less conserved. This is consistent with the rates of evolution reported for essential and non-essential genes: mutations in essential genes are under strong purifying selection, so these genes evolve slowly (low dN/dS ratio), whereas non-essential genes are under relaxed selection and hence evolve faster (high dN/dS ratio) (Hurst and Smith, 1999). The strength of this constraint is also evident in the evolution of human essential genes, ~77% of which can be traced back to pre-metazoans (Blomen et al., 2015).
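The one-sided comparison described above can be sketched in plain Python. This is a minimal illustration that uses a normal approximation with tie and continuity corrections; it is not necessarily the implementation used in the study. The function name `rank_sum_greater` and the toy dN/dS values are hypothetical and are not data from this work.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Alternative hypothesis: values in x tend to be larger than values in y.
    Returns (U statistic for x, approximate one-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign midranks so that tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value is identical: no evidence either way
        return u_x, 1.0

    z = (u_x - mean_u - 0.5) / sqrt(var_u)  # 0.5 is the continuity correction
    return u_x, 1 - NormalDist().cdf(z)


# Toy dN/dS values (illustrative only, not the study's data).
deleted_genes = [0.30, 0.25, 0.40, 0.22, 0.35, 0.28, 0.31, 0.27]
lethal_genes = [0.10, 0.08, 0.15, 0.12, 0.09, 0.11, 0.14, 0.07]

u_stat, p_value = rank_sum_greater(deleted_genes, lethal_genes)
print(f"U = {u_stat}, one-sided p = {p_value:.4g}")
```

Because the alternative is directional, swapping the two arguments tests the opposite hypothesis and returns a p-value close to 1 for the same toy data.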
Nonessential genes in cattle. In total, we found 5,000 deletions for which at least one individual was homozygous. Within this set, we analyzed homozygous deletions in genes to identify natural gene knockouts. We found 167 deleted genes (transcript ablation or complete deletion), corresponding to 115 independent deletions, that are apparently nonessential given the occurrence of live homozygous individuals. This is ~45% more than in the previous report (Boussaha et al., 2015). By comparison, 240 nonessential genes were reported in humans (Sudmant et al., 2015), ~44% more than we found in cattle; this could be due to differences in sample size (175 vs 2,504 individuals) and in the number of study populations (3 vs 26). Among the 167 genes, ~83% (139 genes) are protein-coding, 12% are pseudogenes, and the rest are different types of small RNAs (Table S6). Most of these genes belong to multigene families and are not highly conserved (median cow-mouse dN/dS of 0.17 vs 0.11 for OMIA genes; Figure S6), as expected for homozygous deletions (Sudmant et al., 2015). Moreover, this gene set is functionally enriched for immunoglobulin domains, olfactory receptors, and MHC classes (FDR = 2.06×10⁻²², 2.06×10⁻²², and 7.01×10⁻⁶, respectively), along with other related domains (Tables S7-9). Similar functional enrichment of nonessential genes was also seen in humans (Sudmant et al., 2015). Olfactory receptor genes are well known for extensive gains and losses during mammalian evolution (Niimura and Nei, 2007), and population-specific copy-number variation of olfactory receptor genes has been reported in humans (deletions) (Van Ziffle et al., 2011) and cattle (gains) (Lee et al., 2013). Nevertheless, this is, to our knowledge, the first report of homozygous deletions of olfactory receptor genes in cattle.
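The selection of natural knockouts described above amounts to two filters: the deletion must have at least one homozygous carrier, and it must remove a gene completely. A minimal sketch follows, assuming half-open coordinates and genotypes coded as the number of deleted alleles (0, 1, or 2). All names and the toy records are hypothetical, and the sketch covers only complete deletion, not the VEP transcript-ablation annotation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def fully_contains(deletion, gene):
    """True if the deletion removes the whole gene."""
    return (
        deletion.chrom == gene.chrom
        and deletion.start <= gene.start
        and gene.end <= deletion.end
    )


def natural_knockouts(deletions, genes, genotypes):
    """Map each completely deleted gene to the deletions that remove it.

    genotypes: deletion name -> list of deleted-allele counts per animal.
    Only deletions with at least one homozygous carrier (count 2) are used.
    """
    knocked_out = {}
    for deletion in deletions:
        if 2 not in genotypes.get(deletion.name, []):
            continue  # no live homozygous carrier observed
        for gene in genes:
            if fully_contains(deletion, gene):
                knocked_out.setdefault(gene.name, []).append(deletion.name)
    return knocked_out


# Toy records (hypothetical identifiers and coordinates).
deletions = [
    Interval("DEL1", "chr1", 1000, 5000),
    Interval("DEL2", "chr1", 8000, 9000),
    Interval("DEL3", "chr2", 100, 900),
]
genes = [
    Interval("GENE_A", "chr1", 1500, 3000),  # inside DEL1
    Interval("GENE_B", "chr1", 8100, 8500),  # inside DEL2, no homozygote
    Interval("GENE_C", "chr2", 50, 500),     # only partly inside DEL3
]
genotypes = {
    "DEL1": [0, 1, 2, 0],
    "DEL2": [0, 1, 1, 0],
    "DEL3": [2, 2, 0, 1],
}

knockouts = natural_knockouts(deletions, genes, genotypes)
print(knockouts)
```

In this toy example only GENE_A qualifies: GENE_B lies in a deletion seen only in heterozygotes, and GENE_C is only partially deleted.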
Figure 3.7. Difference between dN/dS ratios of mouse-lethal and deletion-overlapped genes in cattle. Only cow genes with available one-to-one mouse orthologs were included in the one-sided Wilcoxon rank-sum test. Mouse lethal genes are from Dickinson et al. (2016).
QTL Enrichment. We next explored the enrichment (or depletion) of quantitative trait loci (QTL) in deleted regions (at least 1 bp overlap with a deletion). We retrieved ~24K autosomal QTL from QTLdb that were reported to be associated with any of six trait classes, namely “Health”, “Reproduction”, “Milk”, “Exterior”, “Production”, and “Meat and Carcass”. The association of deletions with disease, fitness, and fertility related traits is well established (Weischenfeldt et al., 2013). Hence, we expected our deletions to be enriched for fitness and fertility related QTL. As expected, health (2-fold) and reproduction (1.5-fold) related QTL were significantly enriched, while the other trait classes were highly depleted (Table 3.2). The higher enrichment of health-related QTL could be driven by immune-system genes, which were also highly enriched in our dataset (discussed earlier).
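The text does not state how the fold values were computed. The sketch below is therefore only one plausible definition: the share of QTL in a trait class that overlap a deletion by at least 1 bp, divided by the same share over all QTL. The function names and toy coordinates are hypothetical, and no significance test is included.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least 1 bp."""
    return a_start < b_end and b_start < a_end


def fold_enrichment(qtls, deletions):
    """Fold enrichment of deletion-overlapping QTL per trait class.

    qtls: list of (chrom, start, end, trait_class)
    deletions: list of (chrom, start, end)
    Assumed definition: (hits in class / QTL in class) / (all hits / all QTL).
    """
    total = Counter()
    hits = Counter()
    for chrom, start, end, trait_class in qtls:
        total[trait_class] += 1
        if any(
            chrom == d_chrom and overlaps(start, end, d_start, d_end)
            for d_chrom, d_start, d_end in deletions
        ):
            hits[trait_class] += 1

    all_hits = sum(hits.values())
    if all_hits == 0:
        raise ValueError("no QTL overlaps any deletion")
    overall_rate = all_hits / sum(total.values())
    return {c: (hits[c] / total[c]) / overall_rate for c in total}


# Toy data (hypothetical coordinates, not QTLdb records).
toy_deletions = [("chr1", 100, 200), ("chr2", 500, 800)]
toy_qtls = [
    ("chr1", 150, 151, "Health"),
    ("chr1", 190, 260, "Health"),
    ("chr1", 300, 301, "Health"),
    ("chr3", 10, 20, "Health"),
    ("chr2", 799, 900, "Milk"),
    ("chr2", 800, 900, "Milk"),  # adjacent to the deletion, no overlap
    ("chr2", 100, 499, "Milk"),
    ("chr1", 200, 250, "Milk"),
    ("chr1", 0, 100, "Milk"),
    ("chr4", 1, 2, "Milk"),
    ("chr5", 1, 2, "Milk"),
    ("chr6", 1, 2, "Milk"),
]

folds = fold_enrichment(toy_qtls, toy_deletions)
print(folds)
```

In the toy data, 2 of 4 "Health" QTL and 1 of 8 "Milk" QTL overlap a deletion, against an overall rate of 3 in 12, which gives a 2-fold enrichment and a 0.5-fold depletion, respectively. A permutation or hypergeometric test would be needed to attach significance to such ratios.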
Deletion formation mechanisms
Finally, we explored the probable mechanisms of deletion formation. Two key classes of mechanisms give rise to structural variants (for details, see the reviews by Hastings et al., 2009 and Carvalho and Lupski, 2016). Recurrent SVs often result from non-allelic homologous recombination (NAHR) between large low-copy repeats (LCRs), such as segmental duplications, and therefore carry extensive sequence homology at their flanking regions (Carvalho and Lupski, 2016). In contrast, non-recurrent SVs often form by microhomology-mediated end joining (MMEJ) or non-homologous end joining (NHEJ), which require limited or no sequence homology, and are therefore characterized by microhomologies or simple blunt ends at the breakpoint junction (Hastings et al., 2009).
Breakpoint information is crucial for understanding the mechanism; we therefore analyzed 29 breakpoint-resolved deletions from our validation set. We found that 24 of the 29 deletions contained microhomology of 2-31 bp at the breakpoint, two of which also contained insertions (Table S2). In addition, 4 deletions exhibited non-reference insertions at the breakpoint junction, and one deletion showed no apparent homology. Although these deletions were selected randomly (for validation), they represent less than 0.5% of our deletion call-set and are therefore not a robust sample of it. Even so, they indicate that the majority of deletions contain microhomology at the breakpoint, followed by a few with insertions and, rarely, deletions with no homology. Our results largely agree with the trend reported for large deletions in humans, where 70.8% of deletions exhibited microhomology/homology and 16.1% exhibited insertions at the breakpoint (Mills et al., 2011).
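For a breakpoint-resolved deletion, microhomology can be measured as the number of positions by which the deleted interval can be shifted without changing the resulting sequence. The sketch below illustrates this idea on toy sequences. The function name and the sequences are hypothetical, and this is not the procedure used for the validation set, which is not described in this section.

```python
def microhomology_length(ref, start, end):
    """Length of microhomology at a deletion junction.

    The deletion removes ref[start:end] (0-based, half-open). The result is
    the number of equivalent shifted placements of the deletion, which equals
    the length of the sequence shared between the two breakpoints.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("deletion must lie within the reference sequence")

    # Shift right: the first deleted bases match the bases after the deletion.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1

    # Shift left: the bases before the deletion match the last deleted bases.
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1

    return left + right


# Toy sequences: left flank + deleted segment + right flank.
with_homology = "GGCAT" + "ACGTTTCC" + "ACGAAT"  # deleted segment starts with ACG
blunt = "GGCAT" + "CCGTTTGG" + "ACGAAT"          # no shared bases at the junction

print(microhomology_length(with_homology, 5, 13))
print(microhomology_length(blunt, 5, 13))
```

A value of 0 corresponds to a blunt junction. Junctions with inserted non-reference bases need the assembled breakpoint sequence as well and are not handled by this sketch.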
This study focused only on identifying deletions in cattle because of their potential relevance to loss of function and embryonic lethality. However, we had limited success in identifying small deletions (<200 bp) owing to the reduced sensitivity of the SV caller. Nor is this a comprehensive list of deletions for these samples, since we may have missed many true deletions because of limited sensitivity, low coverage, or stringent filtering (among other reasons). Furthermore, the short read length (~100 bp) of our WGS dataset made it difficult to resolve breakpoints in regions of long repeats.
Loss-of-function variants are responsible for substantial yearly economic losses in the dairy industry, where a limited number of elite sires are used extensively for rapid genetic gain. Mapping such variants is essential for effective breeding planning and genomic selection. Here we presented an NGS-based analytical framework suitable for population-scale mapping of large deletions in cattle, leveraging the available whole-genome sequences, and described the population-genetic, functional, and evolutionary properties of the discovered deletions. We identified and confirmed a ~525 kb deletion on chromosome 23 that causes stillbirth in Nordic Red Cattle. We demonstrated that Nordic Red Cattle had higher population diversity than Holstein and Jersey, and that deletion genotypes could recapitulate the genetic structure of these breeds. Natural gene knockouts were enriched for immune-related and olfactory receptor genes. We also showed that deletions are significantly enriched for health and fertility related QTL, while depleted for production related QTL. Our population-genetic and functional analyses show promise for the inclusion of SVs in genomic studies in dairy cattle. This deletion catalog will facilitate discovery, genotyping, and imputation of deletions in large cohorts of animals, as well as subsequent studies on gene mapping and genomic prediction of breeding values.
Table of contents:
Chapter 1. General Introduction
1.1 Large-scale sequencing and genotyping
1.2 Genetic Markers
1.3 Homozygous Haplotype Deficiency (HHD)
1.4 Genotype Imputation
1.5 Genome-wide association study (GWAS)
1.6 Genomic Prediction
1.7 Aim and objectives of this PhD study
Chapter 2. A missense mutation (p.Tyr452Cys) in the CAD gene compromises reproductive success in French Normande cattle
2.4 Results and Discussion
Chapter 3. Genome-wide mapping of large deletions and their population-genetic properties in dairy cattle
3.3 Materials and methods
3.4 Results and discussion
Chapter 4. Joint imputation of whole-genome sequence variants and large chromosomal deletions in cattle
4.4 Results and discussion
Chapter 5. Genome-wide association study with imputed whole-genome sequence variants including large deletions for female fertility in three Nordic dairy breeds
5.4 Results and Discussion
Chapter 6. Genomic prediction for female fertility using imputed whole-genome sequence variants including large chromosomal deletions
6.4 Results and Discussion
Chapter 7. General Discussion
7.1 Recessive Lethals
7.2 Use of Whole-Genome Sequence Variants
7.3 Genomic Prediction
7.4 Evolutionary Conservation as a Tool for Identifying Lethal Genes