Deciphering hybrid larch reaction norms using random regression


Genetics of plasticity

Bradshaw (1965) suggested that plasticity is under its own genetic control. This idea was later defended by Scheiner (Scheiner and Lyman 1989; Scheiner 1993), among others, but Via (1993) proposed an alternative theory. This led to two distinct visions of the genetics of phenotypic plasticity, each with its own evolutionary implications and each lending itself more naturally to particular mechanisms and analytical methods, as summarized in Table 1.
Via’s (1993) theory is that plasticity is a side effect: the alleles coding for the focal trait may also have effects that vary across environments. As a particular case, some alleles can be expressed only in some environments and be shut down in others. Thus, the expression of one trait in two environments can be considered as two different traits, and the loci that affect these two traits are seen as pleiotropic. An across-environment additive correlation $r_A$ can be computed between them; this is the character-state approach. Across-environment additive correlations have exactly the same properties as across-trait additive correlations and, in the same way, they can be used to compute a correlated response to selection, that is, the response to selection for the trait in environment 2 when the parents have been selected for this trait in environment 1 (Falconer and Mackay 1996). The character-state approach can be extended to the case where the environment is a continuous gradient. In this case, the across-environment genetic correlation becomes a continuous covariance function (Kirkpatrick and Heckman 1989).
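The correlated response mentioned above can be sketched numerically. The following is a minimal illustration, assuming the standard textbook expressions for the direct response (R = i h² σP) and the correlated response (CR = i h1 h2 rA σP2), where h² is the narrow-sense heritability. The function names and all numerical values are hypothetical and are not taken from the larch data.

```python
import math

def correlated_response(i, h2_env1, h2_env2, r_a, sd_p_env2):
    """Expected response in environment 2 when selection is applied in environment 1.

    i          : selection intensity (standardized selection differential)
    h2_env1/2  : narrow-sense heritability of the trait in each environment
    r_a        : across-environment additive correlation
    sd_p_env2  : phenotypic standard deviation in environment 2
    """
    return i * math.sqrt(h2_env1) * math.sqrt(h2_env2) * r_a * sd_p_env2

def direct_response(i, h2, sd_p):
    """Expected response when selection and evaluation share the same environment."""
    return i * h2 * sd_p

# Illustrative values only (assumptions, not estimates from the study)
cr = correlated_response(i=1.755, h2_env1=0.4, h2_env2=0.3, r_a=0.6, sd_p_env2=5.0)
dr = direct_response(i=1.755, h2=0.3, sd_p=5.0)
print(f"correlated response in E2: {cr:.2f}; direct response in E2: {dr:.2f}")
```

With these values, indirect selection in environment 1 recovers only part of the gain that direct selection in environment 2 would give, because the ratio of the two responses is rA multiplied by h1/h2.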
On the other hand, Bradshaw (1965) and later Scheiner (Scheiner and Lyman 1989; Scheiner 1993) defended the view that some loci directly control the shape of the function linking the trait to the environment. Such a function is called a reaction norm, a concept that was first introduced by Baldwin (1902). Reaction norms can be modeled as polynomials, so that different parameters, each one under its own genetic control, set the mean of the trait, the slope along the environmental gradient, the curvature, and even more complex shape parameters. Schlichting and Pigliucci (1993) emphasize the importance of environmentally driven regulatory genes, i.e. genes that control the expression of other subsets of genes in an epistatic way. Unlike the pleiotropy theory, the epistatic theory allows natural selection to act directly on plasticity, for instance by controlling the rapidity of responses, or their competence. The latter is best illustrated by anticipatory plastic responses.
Both theories are now recognized as valid (Via et al. 1995) and are mutually compatible. Intuitively, we are inclined to associate pleiotropy with phenotypic modulation, and epistasis with developmental conversion. However, this shortcut may be misleading to some extent. Indeed, some pleiotropic loci may affect a latent trait, leading to a threshold response in another set of traits. Conversely, phenotypic modulation may be offset, or supported, by regulatory genes. Finally, an important point is that the character-state and the reaction norm approaches are mathematically equivalent (Meyer 1998; Windig et al. 2004). Therefore, either approach can be used to analyze a case of plastic response, without the need for prior assumptions about its mechanisms.
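The equivalence between the two approaches can be illustrated with a small numerical sketch. It assumes a linear reaction norm (intercept and slope) and a hypothetical additive (co)variance matrix K for these two coefficients. The character-state covariance between any two environments is then obtained by evaluating the polynomial basis at those environments. The matrix values and the environment codes (-1 and 1) are illustrative assumptions.

```python
import numpy as np

# Hypothetical additive (co)variance matrix of the reaction-norm
# coefficients [intercept, slope]; values are illustrative only.
K = np.array([[1.0, 0.3],
              [0.3, 0.5]])

def basis(e):
    """Basis of a linear reaction norm evaluated at environmental value e."""
    return np.array([1.0, e])

def character_state_cov(envs, K):
    """Additive covariance of the trait between every pair of environments."""
    Phi = np.array([basis(e) for e in envs])  # one row per environment
    return Phi @ K @ Phi.T

envs = [-1.0, 1.0]
Phi = np.array([basis(e) for e in envs])
G = character_state_cov(envs, K)

# Across-environment additive correlation (character-state view)
r_A = G[0, 1] / np.sqrt(G[0, 0] * G[1, 1])

# Going back: with as many environments as coefficients, Phi is invertible
# and the reaction-norm (co)variances are recovered from the character states.
Phi_inv = np.linalg.inv(Phi)
K_back = Phi_inv @ G @ Phi_inv.T

print(G)
print(round(float(r_A), 3))
```

In this sketch the same information is carried either by K (reaction norm view) or by G (character-state view), and each can be computed from the other.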

Genetic variance and covariance

In quantitative genetics, the value taken by a quantitative trait (or ‘phenotype’) $P$ is considered to result from both a genetic and an environmental influence. This can be formalized as follows (Falconer and Mackay 1996):
$P = G + E$ (eq. 1)
where $G$ is the genotypic value and $E$ is the environmental deviation. At the population level, the average phenotype is equal to the average genotype, and the mean environmental deviation is assumed to be null. From this arises the decomposition of the variance:
$\sigma^2_P = \sigma^2_G + \sigma^2_E$ (eq. 2)
where $\sigma^2$ denotes a variance, here and in the rest of the text. Thus, the phenotypic variance is the sum of the genetic and the environmental variances.
Let’s consider an experimental design where a given number of genotypes are randomly field tested. The basic model is:
$y_{ij} = \mu + c_i + r_{ij}$ (mod. 1)
where $y_{ij}$ is the phenotypic value of the trait for the individual of genotype $i$ and repetition $j$, $\mu$ is the population mean, $c_i$ is the genotypic effect, and $r_{ij}$ is the residual. The latter two terms are treated as random samples from normal distributions (Henderson 1975), with variances $\sigma^2_G$ and $\sigma^2_E$ respectively. From this we can compute the broad-sense heritability $H^2 = \sigma^2_G / \sigma^2_P$.
Let’s go further in the decomposition of the phenotypic variance (eq. 2). Indeed, the genotypic value decomposes as follows (Falconer and Mackay 1996):
$G = A + D + I$ (eq. 3)
where $A$ is the additive value, $D$ is the dominance value, and $I$ is the epistasis. These values can be interpreted mechanistically as gene actions. The additive value $A$ arises from additive allele effects, $D$ arises from the interaction between alleles at the same locus, and $I$ arises from all the other allelic interactions, notably those between alleles at different loci. These molecular definitions imply a statistical meaning. The additive value $A$, also called ‘breeding value’, is additively transmitted, meaning that offspring inherit, on average, the mean of their parents’ breeding values. At the individual level, each sib is characterized by a Mendelian sampling term that comes from the segregation of different maternal and paternal alleles, resulting in a breeding value that deviates from that of other sibs, and from the mean parental value. The dominance value $D$ expresses in particular the statistical interaction between the parents, i.e. the full-sib family effects. All the genetic effects that are neither $A$ nor $D$ are considered to constitute the epistasis value $I$. Just as eq. 1 leads to eq. 2, eq. 3 leads to the variance decomposition: $\sigma^2_P = \sigma^2_A + \sigma^2_D + \sigma^2_I + \sigma^2_E$.
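As an illustration of mod. 1 and of the broad-sense heritability, the sketch below simulates a balanced trial and recovers the variance components with a simple one-way ANOVA (method of moments). This estimator is chosen here for brevity and is an assumption of the example, not the inference method used in this work. The function name, sample sizes and true variances are hypothetical.

```python
import numpy as np

def simulate_and_estimate(n_genotypes=200, n_reps=10, mu=10.0,
                          var_g=2.0, var_e=6.0, seed=1):
    """Simulate y_ij = mu + c_i + r_ij and estimate the variance components."""
    rng = np.random.default_rng(seed)
    c = rng.normal(0.0, np.sqrt(var_g), size=n_genotypes)            # genotypic effects
    r = rng.normal(0.0, np.sqrt(var_e), size=(n_genotypes, n_reps))  # residuals
    y = mu + c[:, None] + r

    # Balanced one-way ANOVA mean squares
    geno_means = y.mean(axis=1)
    ms_between = n_reps * geno_means.var(ddof=1)
    ms_within = ((y - geno_means[:, None]) ** 2).sum() / (n_genotypes * (n_reps - 1))

    # Expected mean squares: E[MS_within] = var_e, E[MS_between] = var_e + n_reps * var_g
    var_e_hat = ms_within
    var_g_hat = (ms_between - ms_within) / n_reps
    h2_hat = var_g_hat / (var_g_hat + var_e_hat)  # broad-sense heritability
    return var_g_hat, var_e_hat, h2_hat

var_g_hat, var_e_hat, h2_hat = simulate_and_estimate()
print(f"var_G = {var_g_hat:.2f}, var_E = {var_e_hat:.2f}, H2 = {h2_hat:.2f}")
```

With the simulated values used here, the true broad-sense heritability is 2 / (2 + 6) = 0.25, and the estimate should fall close to it.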

Environment and the variance decomposition

The variance decomposition as presented in eq. 2 rests on two hypotheses. First, it assumes that genotypes and environments are independent, so that there is no structure in the distribution of the genotypes within the environment. Second, it assumes that genotypes and environments do not interact. In other words, the phenotypic expression of genotypes is conditioned by the environment to which they are exposed in an additive way only, i.e. the reaction norms are parallel. Taking into account the deviations from both hypotheses, eq. 2 becomes (Falconer and Mackay 1996):
$\sigma^2_P = \sigma^2_G + \sigma^2_E + 2\,\mathrm{cov}(G, E) + \sigma^2_{G \times E}$
where $\mathrm{cov}(G, E)$ is the covariance between genotype and environment, and $\sigma^2_{G \times E}$ is the variance due to genotype-by-environment interactions (G×E). In the wild, due to natural selection, we can expect the genotypes to be distributed according to the environmental constraints. However, this is not necessarily true in breeding experiments such as progeny tests, where genotypes are purposely randomly distributed in controlled trials, within given environments. For this reason, in practice, it is common to accept the hypothesis of independence between G and E and to neglect the $\mathrm{cov}(G, E)$ term. On the other hand, the question of G×E is important in breeding and central in this dissertation. Indeed, G×E is the statistical manifestation of genetic variance of plasticity (Primer 2004).
In order to illustrate the importance of G×E in quantitative genetics and especially in breeding, let’s consider two genotypes G1 and G2 whose phenotypes are expressed in two environments E1 and E2, as seen in Fig. 3. In this example, the ranking of the genotypes by phenotypic value changes depending on the environment. The G×E interaction is therefore critical in this example, and it makes the breeder’s task trickier, as neither of the two genotypes is globally better than the other. Besides the effect of G×E on the relative performances of genotypes, there is also an effect of G×E on the variance components. Let’s consider that G1 and G2 are representative samples from a larger set of genotypes. To the extent that G1 and G2 are representative, the genetic variance ($\sigma^2_G$) expressed in E1 is much lower than the genetic variance expressed in E2. This is symbolized on the left of Fig. 3 by the much narrower normal distribution of genotypic values in E1 (upper normal distribution) than in E2 (lower normal distribution). Besides the effect of G×E on the genetic variance, it is also intuitive that the environment may directly affect the environmental variance ($\sigma^2_E$). Therefore, all components of the heritability can be affected by the environment.
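The situation described above can be mimicked with a small numerical sketch. The genotypic values below are hypothetical (they are not read from Fig. 3) and are chosen so that the ranking of G1 and G2 is reversed between E1 and E2 and the spread between genotypes is much larger in E2. The table is then split into a genotype main effect, an environment main effect and a G×E interaction term.

```python
import numpy as np

# Rows: genotypes G1, G2. Columns: environments E1, E2.
# Hypothetical genotypic values, for illustration only.
values = np.array([[10.0, 14.0],
                   [11.0,  8.0]])

grand_mean = values.mean()
geno_effect = values.mean(axis=1) - grand_mean  # main effect of each genotype
env_effect = values.mean(axis=0) - grand_mean   # main effect of each environment

# Interaction: what remains once both main effects are removed
gxe = values - grand_mean - geno_effect[:, None] - env_effect[None, :]

# Does the best genotype differ between the two environments?
rank_change = bool(np.argmax(values[:, 0]) != np.argmax(values[:, 1]))

# Spread between genotypes within each environment
spread = values.max(axis=0) - values.min(axis=0)

print("G x E deviations:\n", gxe)
print("rank change:", rank_change)
print("spread between genotypes in E1 and E2:", spread)
```

If the reaction norms were parallel, every entry of the interaction table would be zero, the ranking would be the same in both environments and the spread between genotypes would not depend on the environment.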
For simplicity’s sake, the term G×E is used whereas, often, additive-by-environment interactions are actually meant (this is the same commonly accepted misnomer as using the term ‘genetic correlation’ when meaning $r_A$). In this dissertation, G×E will always implicitly refer to additive-by-environment interactions.


Increment cores data

Increment cores were collected at all three sites. In SA and in SS, all trees were sampled. This implied a special effort in SS to harvest the cores from trees before they were felled. In PC, only 1 tree per plot was randomly sampled (i.e. all ortets were represented by a single ramet). A summary of the age at sampling and of the size of the samples that were finally used is provided in Appendix 1, Table 7 of Marchal et al. (2017).
On each diameter core, only the radius with the fewest defects was kept for further analysis. The half cores were sawn into 2 mm thick boards, sanded, and X-rayed to produce microdensitometric profiles that were processed with WinDENDRO (Regent Instruments Canada Inc. 2008). All this work was performed by the staff of the platform INRA-GENOBOIS. An example of a microdensitometric profile is provided in Fig. 7. As seen on this illustrative sample, the succession of early and late wood allowed the reconstruction of the tree-ring chronology. A substantial part of my work consisted of a manual check of the chronologies in order to avoid possible mislabeling of the rings. For instance, due to the 2003 heat wave (Bréda et al. 2006; Rennenberg et al. 2006), the corresponding ring is very narrow and can be confused with a false ring, that is, a peak in wood density occurring in the middle of the growing season that does not delimit a new year (e.g. in Fig. 7, the 2004 ring presents 2 false rings).

Monte-Carlo Markov Chain

Philosophical considerations aside, our core motivation for using Bayesian statistics was the possibility to leverage Monte-Carlo Markov Chain (MCMC) algorithms, which are very convenient for several reasons. First, some MCMC algorithms are very robust solvers. We performed the multivariate analysis of up to 9 traits simultaneously, with highly structured variance-covariance matrices. I am sure that this analysis could have been done with some of the most efficient frequentist solvers as well, but it seems that, maybe due to their relative ease of implementation, high-performance MCMC software is emerging much faster than its maximum-likelihood, frequentist counterparts.
Another very important advantage of MCMC algorithms, probably the most important, is that they give access to the posterior distribution of each parameter and therefore allow fine control over uncertainty by means of credible intervals (the interval within which the parameter lies with, e.g., 95% posterior probability). Moreover, it is possible to build, iteration by iteration, the Markov Chain of derived parameters, such as correlations and heritabilities, and therefore to assess their uncertainty without the need for further assumptions about their parametric distribution.
The principle of Monte-Carlo is to describe the posterior distribution by sampling it. The construction of a Markov Chain is an efficient way to perform this random sampling. Several methods exist to build Markov Chains, Gibbs sampling being a well-known and common one. In the end, therefore, the model is not solved with a point estimate of the parameter vector but with n samples of the parameter vector instead. The first step is to ensure that the Markov Chain has converged, as shown in Fig. 10.
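To make the idea of a chain of derived parameters concrete, here is a toy Gibbs sampler for a single-trait random-effects model (one genotypic effect per genotype plus a residual). It is only a sketch under simple assumptions (simulated data, a flat prior on the mean, weakly informative inverse-gamma priors on both variances) and is not the multivariate model or the software used in this work. The function name `gibbs_h2` and all settings are hypothetical.

```python
import numpy as np

def gibbs_h2(y, n_iter=3000, burn_in=1000, a=0.01, b=0.01, seed=0):
    """Gibbs sampler for y_ij = mu + c_i + r_ij; returns the chain of H2.

    Assumed priors: flat on mu, InvGamma(a, b) on both variances.
    """
    rng = np.random.default_rng(seed)
    n_geno, n_rep = y.shape
    n_obs = y.size
    geno_means = y.mean(axis=1)
    mu, var_g, var_e = y.mean(), 1.0, 1.0   # starting values
    h2_chain = np.empty(n_iter)
    for t in range(n_iter):
        # Genotypic effects given everything else (normal full conditional)
        prec = n_rep / var_e + 1.0 / var_g
        mean_c = (n_rep * (geno_means - mu) / var_e) / prec
        c = rng.normal(mean_c, np.sqrt(1.0 / prec))
        # Overall mean given everything else
        mu = rng.normal((y - c[:, None]).mean(), np.sqrt(var_e / n_obs))
        # Variances given everything else (inverse-gamma full conditionals)
        var_g = 1.0 / rng.gamma(a + n_geno / 2, 1.0 / (b + 0.5 * np.sum(c ** 2)))
        resid = y - mu - c[:, None]
        var_e = 1.0 / rng.gamma(a + n_obs / 2, 1.0 / (b + 0.5 * np.sum(resid ** 2)))
        # Derived parameter, built iteration by iteration
        h2_chain[t] = var_g / (var_g + var_e)
    return h2_chain[burn_in:]

# Simulated data with a true broad-sense heritability of 2 / (2 + 6) = 0.25
rng = np.random.default_rng(42)
c_true = rng.normal(0.0, np.sqrt(2.0), size=200)
y = 10.0 + c_true[:, None] + rng.normal(0.0, np.sqrt(6.0), size=(200, 8))

chain = gibbs_h2(y)
lo, med, hi = np.quantile(chain, [0.025, 0.5, 0.975])
print(f"H2 posterior median {med:.2f}, 95% credible interval [{lo:.2f}, {hi:.2f}]")
```

The credible interval of the heritability is read directly from the quantiles of its chain, without assuming any parametric form for its distribution. In practice, convergence would be checked first, for instance by inspecting trace plots or by running several chains.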

Table of contents:

1. Introduction
1.1. Context
1.1.1. Hybridization as a breeding strategy in forestry
1.1.2. Larch species and hybrids
1.1.3. Hybrid larch breeding
1.1.4. The problem
1.2. Hybridization
1.2.1. Heterosis
1.2.2. Hybrids’ stability
1.3. Phenotypic plasticity
1.3.1. The concept
1.3.2. Genetics of plasticity
1.3.3. Dendroplasticity
1.4. Fundamentals of quantitative genetics
1.4.1. Genetic variance and covariance
1.4.2. Environment and the variance decomposition
1.4.3. Hybrids’ genetic variance
1.5. Scientific questions
1.5.1. Questions of Chapter 1
1.5.2. Questions of Chapter 2
2. Materials and methods
2.1. Experimental set-up
2.1.1. Mating design
2.1.2. Sites and set-up
2.2. Data collection
2.2.1. Field measurements
2.2.2. Increment cores data
2.2.3. Environmental data
2.3. Inference
2.3.1. Bayesian inference
2.3.2. Monte-Carlo Markov Chain
2.3.3. Priors
3. Chapter I. Hybrid larch heterosis: for which traits and under which genetic control?
3.1. Analytical considerations
3.1.1. The two-step approach
3.1.2. Account for the competition
3.1.3. Genetic effects
3.1.4. Modeling of ordinal categorical traits
3.2. Article
4. Chapter II. Deciphering hybrid larch reaction norms using random regression
4.1. Analytical considerations
4.1.1. Lessons from Chapter 1
4.1.2. Series autocorrelation
4.1.3. Selection of an environmental gradient
4.2. Article
5. Discussion
5.1. Main results
5.1.1. From Chapter 1
5.1.2. From Chapter 2
5.2. Towards the architecture of larch heterosis
5.2.1. Synthesis
5.2.2. Perspective: systemic approach
5.3. Implications for hybrid larch breeding
5.3.1. Synthesis
5.3.2. Perspective: molecular information
5.3.3. Perspective: in-depth hybridization
5.4. Phenotypic plasticity and hybrid larch breeding
5.4.1. Synthesis
5.4.2. Perspective: adaptability of the dendroplasticity
Appendices