Haplotype construction for genomic evaluation purposes

Pedigree-based selection methods

Pedigree-based selection methods assume that genetic relationships between animals are known and that phenotype data is available for a significant part of the population. The traits of interest are most often quantitative traits with a continuous (normal) distribution. These traits are assumed to be influenced by a very large (in theory by an infinite) number of loci, each having an (infinitesimally) small effect on the phenotype under study.
An individual’s phenotypic performance ($P$) is influenced by multiple factors, including an additive genetic effect ($A$), a dominance effect ($D$), epistatic effects ($I$) and environmental effects ($E$): $P = \mu + A + D + I + E$ (1)
where $\mu$ is the population mean. Other effects, such as genotype-environment interactions or maternal effects, can be included as well, but are usually assumed to be negligible. $D$ and $I$ are also ignored, because they are not directly transmitted to the next generation.
Additive genetic effects $A$ (also called breeding values) are estimated using linear regression models. Best linear predictions (BLP) of the breeding values are obtained by constructing optimal linear combinations of the performances of each animal and its close relatives (progeny, parents, sibs), expressed as deviations from a general mean. However, such procedures assume that breeding values do not differ systematically between the levels of any environmental effect, an assumption which usually does not hold in practical animal breeding. Therefore, these estimates are usually biased.
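As a minimal sketch of the decomposition in equation (1) under the assumptions stated above (dominance and epistatic effects ignored, many loci each with a small additive effect), the following Python simulation builds breeding values as sums of small locus effects and adds an environmental deviation. The function name, the heritability `h2`, the allele frequency range and all default values are illustrative assumptions, not taken from the study.

```python
import random


def simulate_phenotypes(n_animals=200, n_loci=500, mu=100.0, h2=0.3, seed=1):
    """Simulate P = mu + A + E with A built from many small-effect loci.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    # Allele frequency and (small) additive effect of each biallelic locus.
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    effects = [rng.gauss(0.0, 1.0) / n_loci ** 0.5 for _ in range(n_loci)]

    breeding_values = []
    for _ in range(n_animals):
        a = 0.0
        for p, alpha in zip(freqs, effects):
            # Genotype = number of copies (0, 1 or 2) of the reference allele.
            genotype = (rng.random() < p) + (rng.random() < p)
            # Center by the expected genotype so that A is a deviation.
            a += (genotype - 2.0 * p) * alpha
        breeding_values.append(a)

    # Scale the environmental variance to match the assumed heritability.
    mean_a = sum(breeding_values) / n_animals
    var_a = sum((a - mean_a) ** 2 for a in breeding_values) / n_animals
    var_e = var_a * (1.0 - h2) / h2
    environment = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n_animals)]

    phenotypes = [mu + a + e for a, e in zip(breeding_values, environment)]
    return phenotypes, breeding_values, environment


phenotypes, breeding_values, environment = simulate_phenotypes()
```

Because each breeding value is a sum of many small independent contributions, its distribution across animals is approximately normal, which is the behaviour the infinitesimal assumption relies on.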

Best linear unbiased prediction

Best linear unbiased prediction (BLUP) can be used to estimate the environmental effects and genetic effects simultaneously using mixed models. These models include the identifiable environmental effects as fixed effects and the breeding values as random effects. Since all effects are estimated at the same time and under the same assumptions, BLUP results in unbiased estimates for both types of effects. Using matrix notation, a statistical model including both types of explanatory variables can be written as: $y = Xb + Za + e$ (2)
where y is a vector of phenotypic observations (dimension: n × 1, where n is the number of phenotypes), b is a vector of fixed effects (dimension: p × 1, where p is the total number of levels of fixed effects), a is a vector of random additive genetic effects of all animals (dimension: q × 1, where q is the number of such “animal” effects), X is an incidence matrix of dimension n × p relating the levels of fixed effects to the observations, Z is an incidence matrix of dimension n × q relating the animal effects to the observations and e is a vector of random errors (dimension: n × 1).
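One common way to obtain BLUP solutions for model (2) is to solve Henderson's mixed model equations. The text above does not state which equations or solver were used, so the following is only an illustrative sketch. It assumes a homogeneous residual variance, var(a) = A·σ²a with A the pedigree relationship matrix, and a known variance ratio λ = σ²e/σ²a. The helper names, the three-animal pedigree (sire, dam, offspring) and the records are hypothetical.

```python
def solve_linear_system(lhs, rhs):
    """Solve lhs * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(lhs, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def cross_product(m1, m2):
    """Return m1' * m2 for two matrices stored as lists of rows."""
    return [[sum(r1[i] * r2[j] for r1, r2 in zip(m1, m2))
             for j in range(len(m2[0]))]
            for i in range(len(m1[0]))]


def blup(y, X, Z, A_inv, lam):
    """Build and solve the mixed model equations for y = Xb + Za + e.

    A_inv is the inverse relationship matrix, lam = sigma2_e / sigma2_a.
    """
    p, q = len(X[0]), len(Z[0])
    y_col = [[value] for value in y]
    xtx, xtz = cross_product(X, X), cross_product(X, Z)
    ztx, ztz = cross_product(Z, X), cross_product(Z, Z)
    # Left-hand side: [[X'X, X'Z], [Z'X, Z'Z + lam * A_inv]].
    lhs = [xtx[i] + xtz[i] for i in range(p)]
    lhs += [ztx[i] + [ztz[i][j] + lam * A_inv[i][j] for j in range(q)]
            for i in range(q)]
    # Right-hand side: [X'y, Z'y].
    rhs = [row[0] for row in cross_product(X, y_col)]
    rhs += [row[0] for row in cross_product(Z, y_col)]
    solution = solve_linear_system(lhs, rhs)
    return solution[:p], solution[p:]


# Hypothetical data: sire, dam and their offspring, one record each.
y = [10.0, 12.0, 14.0]
X = [[1.0], [1.0], [1.0]]              # a single fixed effect (overall mean)
Z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A_inv = [[1.5, 0.5, -1.0],             # inverse of the relationship matrix of
         [0.5, 1.5, -1.0],             # two unrelated parents and their
         [-1.0, -1.0, 2.0]]            # offspring
b_hat, a_hat = blup(y, X, Z, A_inv, lam=2.0)
```

In this toy example the estimate of the mean is 83/7 (about 11.86) and the predicted breeding values are -1/3, 1/3 and 3/7 for the sire, the dam and the offspring respectively. In practice, dedicated software such as BLUPF90 handles far larger systems with sparse-matrix techniques.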

Implementation in our study

The BLUP analyses were carried out using the BLUPF90 software (Misztal, 1999, after Misztal, 2016) and the results constituted a baseline for comparisons. On several occasions the performance of different genomic evaluation methods will be compared with that obtained with a pedigree-based BLUP model. The models used for breeding value estimation were the ones currently implemented for all dairy cattle breeds in France, including the regional breeds, for the traits we were interested in (discussed later).
Traits were analyzed in a single-trait context. Multiple-trait models also exist and can result in higher accuracies when the genetic correlations between the analyzed traits are not zero. These methods require knowledge of the genetic correlations and are computationally more demanding than single-trait analyses (Lynch and Walsh, 1998). Because these genetic correlations were not always available, and because the French routine genomic evaluation is conducted in a single-breed context, multiple-trait models were not used and will not be discussed further.

Genetic background of quantitative traits and genetic markers

Genomic selection procedures differ from pedigree-based selection methods in their use of genetic markers during the breeding value estimation process. This section first gives a brief introduction to quantitative traits, then presents and characterizes the most frequently used markers, and finally describes the genomic evaluation procedures in detail.

Quantitative trait loci

Quantitative trait loci (QTL) are the loci (e.g. genes, non-coding RNA, etc.) affecting the expression of a quantitative trait. The ultimate aim of animal breeders is to identify all QTL through genomic evaluation and to accurately estimate the size of their effects. If such information were available together with the genotypes of animals at all QTL, selection could be based purely on observed genotype data and phenotype recording would be dispensable. However, the identification of all QTL is currently not possible, and therefore in nearly all cases breeders have to rely on genetic markers “linked” to the QTL.

Genetic markers

Genetic markers are DNA variations generated by mutations that occurred during the evolution of the species and of the breeds. We will see in section 2.4 that such DNA sequence information can be exploited for selection purposes in animal breeding: in genomic selection, genetic markers are used to trace the inheritance of chromosome segments carrying quantitative trait loci. Unless the QTL are known, the marker effects are used as proxies for the QTL effects. Since the exact locations of the QTL are unknown, denser marker maps increase the probability that at least one marker will be “linked” to each QTL. Several types of genetic markers are used for genomic evaluation purposes.

Microsatellite

Historically, the first markers used were microsatellites, which are defined as “simple sequence repeats with a repeat length of up to 13 bases” (Gibson and Muse, 2009). These markers have a high mutation rate and are therefore highly polymorphic, with an average of at least 10 alleles per locus in humans (Gibson and Muse, 2009). However, due to their sparse distribution along the genome, the observed gain in accuracy of genomic evaluation was very limited (Boichard et al., 2012b; Guillaume et al., 2008a; Guillaume et al., 2008b) and the genotyping costs of microsatellites were substantial.

Single nucleotide polymorphism

The key biotechnological breakthrough that led to significant improvements in selection accuracy (as compared to the pedigree-based selection methods) was the development of the first commercial SNP arrays (in cattle: Matukumalli et al., 2009). Single nucleotide polymorphisms (SNP) are mutations affecting a single nucleotide position in the genome. Due to the nature of these mutations, multi-allelic SNP are extraordinarily rare and the vast majority of SNP are bi-allelic. Furthermore, SNP are the most frequent type of marker in the genome and per-marker genotyping costs are constantly decreasing (e.g. Holland et al., 1991; Shen et al., 2005; Tobler et al., 2005).
In cattle, three main types of SNP-chips were developed: first the Bovine SNP50 BeadChip with approximately 54,000 SNP (50K; Illumina Inc., San Diego, CA, USA; Matukumalli et al., 2009), followed by the BovineHD BeadChip® with ~777,000 SNP (Illumina Inc., San Diego, CA, USA; Matukumalli et al., 2011 after Rincon et al., 2011), and finally the Illumina Infinium BovineLD Genotyping BeadChip hosting 3-18 thousand SNP, depending on the version of the SNP-chip (LD; Illumina Inc., San Diego, CA, USA). The bovine 50K chip was developed as an initial tool that allowed both researchers and industry members to genotype a large number of animals and to evaluate the performance of the previously proposed genomic evaluation procedures (e.g. Meuwissen et al., 2001) on real data. The HD SNP-chip was developed to give scientists very fine mapping resolution, because it was expected that this would further improve the performance of QTL detection, genomic evaluation and other studies. Finally, the LD chip was specifically designed to include a relatively small number of SNP (~3-18 thousand) so that it could be used to genotype a large number of animals at a low cost.
The first LD SNP-chip contained only ~3,000 SNP and was developed by Illumina at the request of the United States Department of Agriculture for use in the US Holstein population (the SNP on the chip were selected accordingly). This chip was, however, quickly replaced by a larger one (~7,000 SNP), developed at the request of the Bovine LD consortium (Boichard et al., 2012a). The chip then evolved further: the number of SNP increased to ~18,000, and several SNP were replaced by others of greater importance. The larger versions of the LD SNP-chip were also better suited for use in breeds other than the Holstein.
The development of these SNP arrays allowed breeding organizations in various countries, in collaboration with research centers, to genotype thousands of individuals for large numbers of SNP in a cost-effective way.
Genetic markers are said to be linked when their alleles co-occur more frequently than expected from their allele frequencies under the assumption that the markers segregate independently of each other. In other words, linkage is the non-random association between markers (Gibson and Muse, 2009). The stronger the linkage between a marker and a QTL, the better the QTL effect can be “captured” by the marker alleles, and therefore the more suitable the marker is for tracing the transmission of the QTL alleles from one generation to the next. Consequently, it is of interest to have genetic markers located close to the QTL in order to estimate the marker effects accurately. The strength of the linkage can be characterized by the level of linkage disequilibrium (LD). There are two commonly used measures of LD: $D'$ (the normalized form of the linkage disequilibrium measure $D$) and $r^2$ (the squared correlation coefficient between the alleles at the two loci). Consider two biallelic markers SNP-A (with alleles A1 and A2).
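The two LD measures named above can be illustrated with a short Python sketch using their standard textbook definitions. It assumes a second biallelic marker, hypothetically called SNP-B with alleles B1 and B2 (this name is an assumption, not taken from the text), and phased data in which each haplotype is a pair of allele codes, with 1 standing for A1 (or B1) and 0 for A2 (or B2). The function name and the example frequencies are likewise illustrative, and the absolute value of $D'$ is returned.

```python
def ld_measures(haplotypes):
    """Return (D, |D'|, r2) for two biallelic markers.

    haplotypes: list of (a, b) pairs, where a == 1 codes allele A1 and
    b == 1 codes allele B1 (hypothetical coding chosen for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n
    p_b = sum(1 for _, b in haplotypes if b == 1) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D: deviation of the A1-B1 haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' normalizes D by its maximum possible value given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2: squared correlation between the allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical sample of 10 phased haplotypes: B1 only occurs together with A1.
sample = [(1, 1)] * 3 + [(1, 0)] * 3 + [(0, 0)] * 4
d, d_prime, r2 = ld_measures(sample)
```

For this sample, D = 0.12, $D'$ = 1 and $r^2$ = 2/7 (about 0.29): $D'$ reaches its maximum as soon as one of the four possible haplotypes is absent, whereas $r^2$ equals 1 only when the alleles of one marker fully predict those of the other.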

Haplotype

A notable disadvantage of SNP compared to microsatellites is that SNP are bi-allelic and therefore a single SNP carries less information than a single microsatellite. A possible solution to circumvent this issue is the use of combinations of SNP instead of individual SNP markers. Haplotypes can be defined in at least two different ways:
– Haplotypes are the sets of alleles of markers or genes of an organism that were inherited together by the individual on one of the ancestral chromosomes (e.g. The International HapMap Consortium, 2005; Gibson and Muse, 2009; Stephens et al., 2001).
– More simply, haplotypes are combinations of N SNP markers (e.g.: Hayes et al., 2007; Villumsen et al., 2009; Garrick et al., 2014).
In this study, the term “haplotype” refers to the second definition, while the term “phase” will be used to cover the first definition. The term “alleles” or “haplotype alleles” will be used to refer to the alternative forms of the haplotypes (similarly to the case of SNP). Given this definition of a haplotype, it can be shown that a haplotype can carry a maximum of $2^N$ different alleles, where N is the number of bi-allelic SNP forming the haplotype. Due to the multi-allelic nature of haplotypes, there is an increased chance, as compared to individual SNP, that at least one of these alleles will be in LD with the (ungenotyped) causative mutation at a QTL, if one is present. In addition, LD between haplotype and QTL alleles is more stable over time, because if a whole haplotype allele is passed to the next generation, it is very unlikely that two recombinations took place within the chromosome segment it represents.
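The second definition and the $2^N$ bound can be illustrated with a minimal Python sketch. It assumes that phases are already available as sequences of 0/1 allele codes and that a haplotype is built from N consecutive SNP. The function name, the window-based construction and the toy data are hypothetical and do not represent the haplotype construction methods evaluated in this study.

```python
from collections import Counter


def haplotype_alleles(phases, start, n_snp):
    """Count the haplotype alleles formed by n_snp consecutive SNP.

    phases: list of phased chromosomes, each a sequence of 0/1 allele codes.
    start:  index of the first SNP of the haplotype.
    """
    return Counter(tuple(phase[start:start + n_snp]) for phase in phases)


# Hypothetical phased chromosomes (6 phases, 4 bi-allelic SNP each).
phases = [
    (0, 1, 1, 0),
    (0, 1, 0, 0),
    (1, 0, 1, 1),
    (0, 1, 1, 0),
    (1, 0, 1, 0),
    (0, 1, 1, 1),
]

two_snp = haplotype_alleles(phases, start=0, n_snp=2)
three_snp = haplotype_alleles(phases, start=1, n_snp=3)

# The number of observed alleles can never exceed 2 ** n_snp.
max_two, max_three = 2 ** 2, 2 ** 3
```

In this toy data set the two-SNP haplotype shows only 2 of its 4 possible alleles, while the three-SNP haplotype shows 5 of 8: the number of alleles actually observed depends on the LD between the SNP and is often well below the theoretical maximum.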

Table of contents:

Chapter 1 Introduction
Chapter 2 Background
2.1 Characteristics of dairy cattle breeding
2.2 Pedigree-based selection methods
2.2.1 Best linear unbiased prediction
2.2.2 Implementation in our study
2.3 Genetic background of quantitative traits and genetic markers
2.3.1 Quantitative trait loci
2.3.2 Genetic markers
2.3.2.1. Microsatellite
2.3.2.2. Single nucleotide polymorphism
2.3.3 Haplotype
2.3.4 Imputation and phase reconstruction
2.4 Genomic evaluation
2.4.1 Marker-assisted BLUP
2.4.2 Genomic-BLUP
2.4.3 Bayesian methods
2.4.4 Genomic evaluation methods with haplotype markers
2.5 French routine genomic evaluation of dairy cattle
2.6 Consequences of genomic selection
2.6.1 Advantages of genomic selection
2.6.2 Drawbacks of genomic evaluation
2.7 Assessment of genomic evaluation studies
2.7.1 Principles of validation in genomic evaluation studies
2.7.2 Measured parameters
2.8 Analyzed breeds and traits
2.9 Single-breed and multi-breed genomic evaluation
2.9.1 Review of the recent multi-breed genomic evaluation studies
2.10 Problem statement and motivation
Chapter 3 Haplotype construction for genomic evaluation purposes
3.1 The Montbéliarde dataset
3.2 Haplotypic BayesC-π results
3.3 Influence of allele frequency on genomic evaluation
3.3.1 Introduction
3.3.2 Alternative haplotype construction methods for genomic evaluation
3.3.3 Discussion
3.4 Genomic evaluation with HD data
3.5 Inclusion of linkage disequilibrium information
3.5.1 Introduction
3.5.2 Combining LD and allele frequency information to improve selection accuracy
3.5.3 Discussion
Chapter 4 Genomic evaluation in regional breeds
4.1 Datasets
4.1.1 Genotyping and imputation
4.2 LD-pattern in the regional breeds
4.3 Genomic evaluation with 50K data
4.3.1 Introduction
4.3.2 Single-breed and multi-breed genomic evaluation with 50K data
4.3.3 BayesC results
4.3.4 Discussion
4.4 Genomic evaluation with high-density data
4.4.1 Introduction
4.4.2 Materials and methods
4.4.3 Results
4.4.4 Conclusions
4.5 Genomic evaluation with causative mutations
4.5.1 Introduction
4.5.2 Materials and Methods
4.5.3 Results and discussion
4.5.4 Conclusions
Chapter 5 General discussion
5.1 Introduction
5.2 Biodiversity
5.3 Effects of the slower genetic progress
5.4 Perspectives for the regional breeds
5.5 Genomic evaluation in the regional breeds
5.6 Financial considerations
5.7 Genomic evaluation with haplotypes
5.8 Future perspectives
Chapter 6 Concluding remarks
References