MATERIALS AND METHODS
For this study, all datasets were provided by the Danish Pig Research Centre. Three populations were analyzed simultaneously: Danish Landrace (LL), Danish Yorkshire (YY) and two-way crossbred Danish Landrace-Yorkshire. Crossbred animals with a Landrace sire and a Yorkshire dam were termed ‘Landrace_Yorkshire (LY)’, while ‘Yorkshire_Landrace (YL)’ denoted crossbreds with a Yorkshire sire and a Landrace dam. The TNB data in this study comprised first-parity records in all three populations. In total, TNB was recorded for 293,339 LL, 180,112 YY and 10,974 crossbred animals. This dataset is termed the “full population” throughout the paper.
Among the crossbreds, 7,407 were LY and 3,567 were YL. All purebred animals had first farrowing dates between 2003 and 2013, while the crossbred animals first farrowed between 2010 and 2013. Pedigrees were available for both purebred and crossbred animals, and all crossbreds were traced back to their purebred ancestors as far as 1994 with the DMU Trace program (Madsen, 2012). Consequently, 332,929 LL, 210,554 YY and 10,974 crossbreds were in the pedigree. Among those animals, 7,723 LL and 7,785 YY were genotyped with the Illumina PorcineSNP60 Genotyping BeadChip (Ramos et al., 2009). Two thirds of the genotyped purebred animals were boars. For the crossbreds, 5,203 animals (4,077 LY and 1,126 YL) were genotyped with an 8.5K GGP-Porcine Low Density Illumina Bead SNP Chip (GeneSeek, 2012). SNP quality control was applied to the same dataset in a previous study (Xiang et al., 2015), where more details can be found. After quality control, 41,009 SNPs and 7,916 SNPs on autosomes were available in purebreds and crossbreds, respectively. Crossbred genotypes were imputed from 7,916 to 41,009 SNPs with the software Beagle (Browning, 2008), which outputs phased SNPs for both the reference and the imputed populations, using a joint reference panel of the two pure breeds (Xiang et al., 2015). As a result, 41,009 phased SNPs were available for the genotyped purebred and crossbred animals in the current study.
Single-step BLUP model for purebred and crossbred performances
The new single-step BLUP method of evaluating both purebred and crossbred performance was developed by Christensen et al. (2014). The model reformulates the “full” Wei and van der Werf (1994) A1 model and incorporates genomic information by using two breed-specific combined relationship matrices, which extend the marker-based relationship matrices to the non-genotyped animals.
The model is

$$y_L = X_L b_L + Z_L a_L + e_L,$$
$$y_Y = X_Y b_Y + Z_Y a_Y + e_Y,$$
$$y_{LY} = X_{LY} b_{LY} + Z_{LY} c_{LY} + e_{LY},$$

where $y_L$, $y_Y$ and $y_{LY}$ contain phenotypes for purebred LL, purebred YY and F1 crossbred animals, respectively; $b_L$, $b_Y$ and $b_{LY}$ represent fixed effects; $e_L$, $e_Y$ and $e_{LY}$ are overall random residual effects, assumed to be independently normally distributed with mean 0 and variances $\sigma^2_{e_L}$, $\sigma^2_{e_Y}$ and $\sigma^2_{e_{LY}}$, respectively; $a_L$ and $a_Y$ contain breeding values of breed LL and breed YY for purebred performance (mating within their own breed); $c_{LY}$ contains the additive genetic effects of F1 crossbred animals; and $X_L$, $X_Y$, $X_{LY}$, $Z_L$, $Z_Y$ and $Z_{LY}$ are the respective incidence matrices. Note that the additive genetic effects $c_{LY}$ of crossbred animals are formed as the sum of two additive gametic effects, one from LL and another from YY. In other words, a crossbred diploid genome decomposes into two purebred haploid genomes.
The Christensen et al. (2014) method first assumes that the effects of markers across the different origins (Yorkshire and Landrace, in this case) are unrelated. Under this assumption, the additive effect of the genome of an F1 crossbred animal can be split into the sum of two additive gametic effects, one gamete from each breed, and the two gametic effects are uncorrelated by assumption. Therefore, separate pedigree-based or genomic relationship matrices can be set up within each breed and then combined according to the single-step theory for purebreds (Legarra et al., 2009; Christensen and Lund, 2010). The analysis proceeds by estimating solutions for two different breed-specific random effects. The key to disentangling the breeds of origin of the genetic effects of the F1 individuals is the ability to construct pedigree-based partial relationship matrices (García-Cortés and Toro, 2006) or separate (by origin) genomic matrices, which in turn requires ascertaining the breed of origin of the marker genotypes. More specifically, there are three steps:
Step 1). Reformulate the Wei and van der Werf model by splitting the additive genetic effects of crossbred animals (LY) into breed-of-origin-specific genetic effects, i.e., split the additive genetic value of the i-th F1 crossbred into two additive genetic values, one from each origin (LL or YY): $c_i = c_i^{(L)} + c_i^{(Y)}$. Neither of these is a breeding value stricto sensu; instead, they are additive effects in the statistical sense of “regression of value on gene dosage”, as explained by Falconer et al. (1985), who clarify the various definitions of the average effect of genes in the absence of random mating. Note that the new single-step model (Christensen et al., 2014) is not the animal model used by Lo et al. (1997) and Lutaaya et al. (2001). The new single-step model is a reformulation of the full model of Wei and van der Werf (1994, equation A1), whereas Lo et al. (1997) and Lutaaya et al. (2001) refer to the reduced animal model of Wei and van der Werf (1994, equation A2). With pedigree information only, the full and the reduced animal models are equivalent, but with crossbred genomic information this is no longer the case. In Lo et al. (1997) and Lutaaya et al. (2001), the additive genetic value of the i-th F1 crossbred is $c_i = (u_i^{(L)} + \phi_i^{(L)}) + (u_i^{(Y)} + \phi_i^{(Y)})$. Here $u_i^{(L)}$ and $u_i^{(Y)}$ are half the additive genetic values of the purebred LL and YY parents of animal i, which are common to all the offspring of the same sire or dam, and $\phi_i^{(L)}$ and $\phi_i^{(Y)}$ are the respective Mendelian sampling terms, which differ for each offspring. In the reduced animal model, both Mendelian sampling terms are included in the residual effect of the crossbred animals, and only $u_i^{(L)}$ and $u_i^{(Y)}$ are estimated. This is for two reasons: first, with pedigree information only, the Mendelian sampling terms cannot be estimated; second, setting up matrices of additive relationships (and their inverses) for crossbred animals in the animal model is not straightforward (Lo et al., 1993; García-Cortés and Toro, 2006). Therefore, in the works of Lo et al. (1997) and Lutaaya et al. (2001), the additive genetic value of the i-th F1 crossbred is replaced by $u_i^{(L)} + u_i^{(Y)}$. With genomic relationships and in the model of Christensen et al. (2014), these Mendelian sampling terms are embedded in a genomic relationship matrix (relationships across animals for purebreds and across gametes for crossbreds) and they are no longer uncorrelated. Thus, absorbing these terms into the residual is not suitable. In the current study, $c_i^{(L)} = u_i^{(L)} + \phi_i^{(L)}$ and $c_i^{(Y)} = u_i^{(Y)} + \phi_i^{(Y)}$. The additive genetic value of the i-th F1 crossbred is therefore not identical to $u_i^{(L)} + u_i^{(Y)}$ as in Lo et al. (1997) and Lutaaya et al. (2001). Thus, our model (which is a gametic model at the level of crossbreds) is not a single-step equivalent of the models of Lo et al. (1997) and Lutaaya et al. (2001), which, at the level of crossbreds, are reduced animal models.
Step 2). Construct breed-specific partial relationship matrices for each breed-of-origin genetic effect. Considering pedigree relationships, the variances and covariance of the additive genetic purebred ($a^{(L)}$) and crossbred ($c^{(L)}$) effects of breed LL are described as

$$\mathrm{Var}\begin{bmatrix} a^{(L)} \\ c^{(L)} \end{bmatrix} = \begin{bmatrix} \sigma^2_{a_L} & \sigma_{a_L,c_L} \\ \sigma_{a_L,c_L} & \sigma^2_{c_L} \end{bmatrix} \otimes A^{(L)}.$$

This is a two-trait representation. For better understanding, the genetic effects can be split into animal effects belonging to purebred animals ($a_L$, $c_L$) and gametic effects belonging to crossbred animals ($a_{LY}^{(L)}$, $c_{LY}^{(L)}$):
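$$\mathrm{Var}\begin{bmatrix} a_L \\ a_{LY}^{(L)} \\ c_L \\ c_{LY}^{(L)} \end{bmatrix} = \begin{bmatrix} \sigma^2_{a_L} & \sigma_{a_L,c_L} \\ \sigma_{a_L,c_L} & \sigma^2_{c_L} \end{bmatrix} \otimes A^{(L)} = \begin{bmatrix} \sigma^2_{a_L} & \sigma_{a_L,c_L} \\ \sigma_{a_L,c_L} & \sigma^2_{c_L} \end{bmatrix} \otimes \begin{bmatrix} A_{L,L} & A_{L,LY}^{(L)} \\ A_{LY,L}^{(L)} & A_{LY,LY}^{(L)} \end{bmatrix},$$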
where $A^{(L)}$ is a matrix of partial relationships containing four blocks: one within purebred animals ($A_{L,L}$), two between purebred and crossbred animals ($A_{L,LY}^{(L)}$ and its transpose $A_{LY,L}^{(L)}$), and one within crossbred animals ($A_{LY,LY}^{(L)}$). If there are $n_L$ pure Landrace animals and $n_{LY}$ crossbred animals, the size of $A^{(L)}$ is $(n_L + n_{LY}) \times (n_L + n_{LY})$. The purebred animals have additive effects, which are breeding values, $a_L$ (when mated within breed) and $c_L$ (when mated to the other breed). The purebred gametes of crossbred animals have additive effects $c_{LY}^{(L)}$ (within the cross itself). The covariance structure includes, for ease of representation, $a_{LY}^{(L)}$, which are effects of crossbred gametes on purebred performance; these effects are merely conceptual, but they simplify the representation and computation. The covariance structure for breed YY is similar:
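$$\mathrm{Var}\begin{bmatrix} a_Y \\ a_{LY}^{(Y)} \\ c_Y \\ c_{LY}^{(Y)} \end{bmatrix} = \begin{bmatrix} \sigma^2_{a_Y} & \sigma_{a_Y,c_Y} \\ \sigma_{a_Y,c_Y} & \sigma^2_{c_Y} \end{bmatrix} \otimes A^{(Y)} = \begin{bmatrix} \sigma^2_{a_Y} & \sigma_{a_Y,c_Y} \\ \sigma_{a_Y,c_Y} & \sigma^2_{c_Y} \end{bmatrix} \otimes \begin{bmatrix} A_{Y,Y} & A_{Y,LY}^{(Y)} \\ A_{LY,Y}^{(Y)} & A_{LY,LY}^{(Y)} \end{bmatrix},$$

with the size of $A^{(Y)}$ equal to $(n_Y + n_{LY}) \times (n_Y + n_{LY})$, where $n_Y$ is the number of pure Yorkshire animals. Both structures are assumed independent, i.e., there is no covariance between LL effects and YY effects. As in Wei and van der Werf (1994), there are six genetic (co)variance components, three for each breed.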
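The Kronecker structure above can be illustrated with a minimal sketch for one purebred LL animal and the LL gamete of one of its crossbred offspring. All numbers are made up for illustration, and `kron` is a hypothetical helper written here, not a routine from DMU.

```python
def kron(g0, a):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [g0[i][j] * a[k][l] for j in range(len(g0[0])) for l in range(len(a[0]))]
        for i in range(len(g0))
        for k in range(len(a))
    ]

# Hypothetical genetic (co)variances for breed LL (assumed values):
# purebred variance, purebred-crossbred covariance, crossbred variance.
G0_L = [[1.0, 0.5],
        [0.5, 2.0]]

# Hypothetical partial relationship matrix: one non-inbred purebred LL animal
# (row 0) and the LL gamete of one of its crossbred offspring (row 1).
A_L = [[1.0, 0.5],
       [0.5, 0.5]]

# Rows and columns are ordered as (a_L, a_LY(L), c_L, c_LY(L)).
V = kron(G0_L, A_L)

for row in V:
    print(row)
```

The trait-major ordering of the result matches the stacked vector of purebred and crossbred effects, so the lower-right block is the crossbred variance times the partial relationship matrix.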
The partial relationship matrix for breed LL can be constructed from the available information (pedigree, markers) as follows. The pedigree-based and marker-based breed LL partial relationship matrices are

$$A^{(L)} = \begin{bmatrix} A_{L,L} & A_{L,LY}^{(L)} \\ A_{LY,L}^{(L)} & A_{LY,LY}^{(L)} \end{bmatrix} \quad \text{and} \quad G^{(L)} = \begin{bmatrix} G_{L,L} & G_{L,LY}^{(L)} \\ G_{LY,L}^{(L)} & G_{LY,LY}^{(L)} \end{bmatrix},$$

respectively, where the partition separates purebred animals from purebred gametes in crossbred animals. Because of the split into breed-specific gametes, the pedigree-based partial relationship matrices $A^{(L)}$ and $A^{(Y)}$ must be computed as in García-Cortés and Toro (2006).
Construction of the breed-specific marker-based relationship matrices assumes that the breed of origin of phased alleles in crossbred animals is known. In other words, it is known which phased allele in a crossbred animal LY is from breed LL and which one is from breed YY. Then, the marker-based partial relationship matrix contains cross-products of centered genotypes:
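$$G_{L,L} = (m_L - 2\,\mathbf{1} p_L')(m_L - 2\,\mathbf{1} p_L')'$$
$$G_{L,LY}^{(L)} = (m_L - 2\,\mathbf{1} p_L')(q_{LY} - \mathbf{1} p_L')'$$
$$G_{LY,LY}^{(L)} = (q_{LY} - \mathbf{1} p_L')(q_{LY} - \mathbf{1} p_L')'$$

where $m_L$ and $q_{LY}$ contain breed-specific allele contents of the second allele for purebred LL animals (coded as 0, 1, 2) and crossbred animals (coded as 0, 1), respectively; $\mathbf{1}$ is a vector of ones; and vector $p_L$ contains the breed LL specific allele frequencies, based on the marker genotypes of purebred and crossbred animals.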
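As a minimal sketch of these cross-products, the following example computes the three marker-based blocks for two purebred LL animals and one LL-origin gamete of a crossbred animal. The genotypes and allele frequencies are invented, and the helper names `centered` and `cross_product` are hypothetical.

```python
def centered(rows, freqs, ploidy):
    """Subtract ploidy * allele frequency from every allele count."""
    return [[x - ploidy * p for x, p in zip(row, freqs)] for row in rows]

def cross_product(x, y):
    """Return x y' for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in y] for rx in x]

# Invented data for three SNPs.
m_L = [[0, 1, 2],        # purebred LL animals, allele counts coded 0/1/2
       [2, 1, 0]]
q_LY = [[0, 0, 1]]       # LL-origin gamete of one crossbred, coded 0/1
p_L = [0.5, 0.5, 0.5]    # assumed breed LL allele frequencies

Z_L = centered(m_L, p_L, ploidy=2)    # m_L - 2 * 1 p_L'
Z_LY = centered(q_LY, p_L, ploidy=1)  # q_LY - 1 p_L'

G_LL = cross_product(Z_L, Z_L)        # purebred by purebred block
G_L_LY = cross_product(Z_L, Z_LY)     # purebred by crossbred-gamete block
G_LY_LY = cross_product(Z_LY, Z_LY)   # crossbred-gamete by crossbred-gamete block

print(G_LL, G_L_LY, G_LY_LY)
```

Diploid genotypes are centered by twice the allele frequency and haploid gametes by the allele frequency itself, which is the only difference between the purebred and crossbred rows.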
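Next, matrix $G^{(L)}$ is adjusted to be compatible with $A^{(L)}$: $G_a^{(L)} = G^{(L)}\beta + K\alpha$, where $K = \begin{bmatrix} J_{L,L} & J_{L,LY}/2 \\ J_{LY,L}/2 & J_{LY,LY}/4 \end{bmatrix}$ and $J$ denotes a matrix of ones partitioned as $G^{(L)}$. Scalars α and β are estimated by solving the following two equations:

$$\overline{A_{22}^{(L)}} = \overline{G^{(L)}}\,\beta + \overline{K}\,\alpha,$$
$$\overline{\operatorname{diag}(A_{22}^{(L)})} = \overline{\operatorname{diag}(G^{(L)})}\,\beta + \overline{\operatorname{diag}(K)}\,\alpha,$$

i.e., by equating the averages of the full matrices and the averages of the diagonals of the pedigree and genomic relationships for genotyped individuals (Christensen et al., 2012). Matrix $A_{22}^{(L)}$ contains pedigree relationships for genotyped LL individuals. The procedure is identical for breed YY.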
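The two equations form a 2 × 2 linear system in α and β. A minimal sketch of solving it by Cramer's rule is given below; the matrices are small invented examples (one purebred animal and one crossbred gamete), and `solve_alpha_beta` is a hypothetical helper, not the routine used in the study.

```python
def mean_all(m):
    """Average of all elements of a square matrix."""
    return sum(sum(row) for row in m) / (len(m) * len(m[0]))

def mean_diag(m):
    """Average of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m))) / len(m)

def solve_alpha_beta(A22, G, K):
    """Solve avg(A22) = avg(G)*beta + avg(K)*alpha for full matrices and diagonals."""
    g, k, a = mean_all(G), mean_all(K), mean_all(A22)
    dg, dk, da = mean_diag(G), mean_diag(K), mean_diag(A22)
    det = g * dk - k * dg
    if det == 0:
        raise ValueError("the two averaging equations are linearly dependent")
    beta = (a * dk - k * da) / det
    alpha = (g * da - a * dg) / det
    return alpha, beta

# Invented example: row 0 is a purebred animal, row 1 a crossbred gamete.
A22 = [[1.0, 0.5], [0.5, 0.5]]
G = [[2.0, 1.0], [1.0, 0.75]]
K = [[1.0, 0.5], [0.5, 0.25]]   # J, J/2, J/2, J/4 for a 1 + 1 partition

alpha, beta = solve_alpha_beta(A22, G, K)

# Adjusted matrix G_a = G * beta + K * alpha, element by element.
G_a = [[G[i][j] * beta + K[i][j] * alpha for j in range(2)] for i in range(2)]
```

By construction, the averages of `G_a` and of its diagonal equal those of `A22`, which is the compatibility condition stated in the text.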
Step 3). Combine the pedigree-based and adjusted marker-based partial relationship matrices into a combined partial relationship matrix $H^{(L)}$, which is similar to the matrix used in the single-step method for purebred animals (Legarra et al., 2009; Christensen and Lund, 2010). The inverse of $H^{(L)}$ is

$$(H^{(L)})^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & (G_\omega^{(L)})^{-1} - (A_{22}^{(L)})^{-1} \end{bmatrix} + (A^{(L)})^{-1},$$

where the nonzero block corresponds to the genotyped individuals and $G_\omega^{(L)} = (1 - \omega)\,G_a^{(L)} + \omega\,A_{22}^{(L)}$. Parameter ω is the relative weight on the residual polygenic effect. Many studies have investigated the weighting of the pedigree-based and marker-based relationship matrices (Christensen and Lund, 2010; Christensen et al., 2012; Gao et al., 2012; Su et al., 2012; Guo et al., 2015), and they commonly conclude that the weighting factor should be determined by the specific trait and dataset analyzed. We investigated weighting factors from 0.1 to 0.5. Preliminary analysis (results not shown) showed that ω = 0.4 was appropriate in terms of the balance between predictive ability and bias for crossbred animals. The procedure is identical for breed YY. The sparse inverse partial relationship matrices $(H^{(L)})^{-1}$ and $(H^{(Y)})^{-1}$ are used as input to solve the mixed model equations of the model.
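The weighting of the adjusted genomic matrix and the pedigree matrix by ω can be sketched as follows. The two input matrices are invented, only the value ω = 0.4 comes from the text, and `blend` is a hypothetical helper name.

```python
def blend(G_adj, A22, omega):
    """Weighted sum (1 - omega) * G_adj + omega * A22, element by element."""
    return [
        [(1 - omega) * g + omega * a for g, a in zip(row_g, row_a)]
        for row_g, row_a in zip(G_adj, A22)
    ]

# Invented 2 x 2 matrices for two genotyped individuals.
G_adj = [[1.2, 0.4],
         [0.4, 0.6]]
A22 = [[1.0, 0.5],
       [0.5, 0.5]]

OMEGA = 0.4  # weight on the residual polygenic effect reported in the text

G_omega = blend(G_adj, A22, OMEGA)
print(G_omega)
```

With ω = 0 the result is the adjusted genomic matrix alone, and with ω = 1 it reduces to the pedigree relationships, so ω controls how much of the genetic variance is left to the residual polygenic effect.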
This is a model with three observed traits (performance in LL, YY and F1) but with two genetic effects (LL and YY), each with two genetic traits: purebred and crossbred performance. Genetic parameters were estimated by REML, and BLUP predictions were obtained, using the DMU software (Madsen and Jensen, 2013).
Crossbred allele tracing
The Beagle software, which was used to impute and phase genotypes in crossbred animals, does not output the breed origin of alleles. Thus, to infer the allele origins in crossbred animals, we proceeded as follows. Allele tracing was performed separately for each chromosome of each individual.
Among the 5,203 genotyped crossbred animals, the sires of 4,520 crossbreds were genotyped, while neither parent of the other 683 crossbreds was genotyped. When the sire was genotyped, the two sets of phased imputed alleles of a crossbred animal were compared with the two sets of phased alleles of its purebred sire. Comparisons between crossbred and purebred phased alleles were made at each SNP along the chromosome. For a specific comparison, if a crossbred allele differed from the corresponding purebred allele, that SNP was counted as one difference. If, along the chromosome, the sum of differences between one set of crossbred phased alleles and one set of sire phased alleles was the lowest among the four comparisons, then that set of crossbred phased alleles was considered to originate from the breed of the sire. The other set of crossbred phased alleles was then assigned to the other breed.
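A minimal sketch of this four-way comparison for a single chromosome is shown below. Haplotypes are represented as lists of 0/1 alleles, the example data are invented, and the function names are hypothetical; ties are broken in favour of the first crossbred haplotype, which is an assumption not stated in the text.

```python
def count_differences(hap_a, hap_b):
    """Number of SNPs at which two phased haplotypes carry different alleles."""
    return sum(1 for x, y in zip(hap_a, hap_b) if x != y)

def trace_by_sire(cross_haps, sire_haps, sire_breed):
    """Assign a breed to each of the two crossbred haplotypes of one chromosome.

    The crossbred haplotype involved in the comparison with the fewest
    differences (out of the four crossbred-by-sire comparisons) is assigned
    to the sire breed, and the other haplotype to the other breed.
    """
    other_breed = "YY" if sire_breed == "LL" else "LL"
    _, from_sire = min(
        (count_differences(c, s), ci)
        for ci, c in enumerate(cross_haps)
        for s in sire_haps
    )
    return [sire_breed if ci == from_sire else other_breed for ci in range(2)]

# Invented five-SNP example with a Landrace sire.
cross_haps = [[0, 1, 1, 0, 1],
              [1, 1, 0, 0, 0]]
sire_haps = [[1, 1, 0, 0, 1],
             [0, 0, 0, 1, 1]]

origins = trace_by_sire(cross_haps, sire_haps, sire_breed="LL")
print(origins)
```

In this example the second crossbred haplotype differs from the first sire haplotype at a single SNP, the lowest of the four counts, so it is assigned to the sire breed.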
When neither parent was genotyped, one of the two sets of phased imputed crossbred alleles was studied segment by segment. Each crossbred phased chromosome was split into small segments of 50 consecutive SNP markers. These were compared with the corresponding segments of the phased chromosomes of the two purebred reference populations, LL and YY, which were used for imputing the crossbred genotypes. Each small segment in the crossbred animals should exactly match at least one segment in the reference panel, since each crossbred segment was imputed from the purebred reference population. Copies of a specific segment detected in the LL and YY reference populations were counted separately and divided by the total number of segments at the same position in the LL and YY reference panels, respectively, to obtain the proportions of matched segments. If the proportion was higher in one breed, the crossbred segment was considered to originate from that breed. If the vast majority of segments within a crossbred phased chromosome were considered to originate from one specific breed, then the crossbred phased chromosome was assigned to that breed. Consequently, the phased alleles of all 5,203 genotyped crossbred animals were traced to either breed LL or YY.
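The segment-based procedure can be sketched as follows. The segment length of 50 SNPs is taken from the text; the `majority` threshold is an assumption, because the text does not quantify "vast majority", and the function names and the tiny example (which uses segments of 2 SNPs for brevity) are hypothetical.

```python
from collections import Counter

SEGMENT_SIZE = 50  # consecutive SNPs per segment, as stated in the text

def match_proportion(segment, start, reference_haps):
    """Proportion of reference haplotypes that match a segment exactly."""
    end = start + len(segment)
    matches = sum(1 for ref in reference_haps if ref[start:end] == segment)
    return matches / len(reference_haps)

def trace_by_segments(hap, ref_LL, ref_YY, size=SEGMENT_SIZE, majority=0.5):
    """Assign one phased crossbred chromosome to LL or YY by segment voting.

    Returns None when no breed exceeds the (assumed) majority threshold.
    """
    votes = Counter()
    for start in range(0, len(hap), size):
        segment = hap[start:start + size]
        prop_ll = match_proportion(segment, start, ref_LL)
        prop_yy = match_proportion(segment, start, ref_YY)
        if prop_ll > prop_yy:
            votes["LL"] += 1
        elif prop_yy > prop_ll:
            votes["YY"] += 1
        # Segments with equal proportions cast no vote.
    total = sum(votes.values())
    if total == 0:
        return None
    breed, count = votes.most_common(1)[0]
    return breed if count / total > majority else None

# Invented example with six SNPs and segments of two SNPs.
hap = [0, 1, 1, 0, 1, 1]
ref_LL = [[0, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 1, 1]]
ref_YY = [[1, 1, 0, 1, 1, 1],
          [1, 0, 0, 1, 0, 0]]

breed = trace_by_segments(hap, ref_LL, ref_YY, size=2)
print(breed)
```

Here the first two segments match the LL panel more often than the YY panel and the third is tied, so the chromosome is assigned to LL; a stricter `majority` value would leave more chromosomes unassigned.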