
## Juvenile and adult Allis shad

Juvenile Allis shad (n = 44) were collected between June and October 2013 in four French rivers (the Blavet, the Vilaine, the Loire and the Dordogne) and between September and January from 2009 to 2012 in the Minho River (Table 1). They were caught with seines and bongo nets in the upper estuarine reaches of the rivers during their downstream migration.

Adult Allis shad (n = 615) were sampled in 15 rivers, from upstream spawning sites to the tidal freshwater reaches, between April and June from 2001 to 2014 (Table 1). This period coincides with the upstream spawning migration of Allis shad. The fish were caught by fishermen, either with trammel nets or by sport fishing; some were found dead after the spawning period. Fish were measured to the nearest millimeter when possible and then frozen.

Table 1 Number of adult and juvenile Allis shad sampled in each river per year and their mean fork length (mm) ± standard deviation. The number of water samples collected in 2012 and 2013 is also specified for each river.

Note that the water baseline contains three rivers without fish samples (the Charente, the Oloron and the Nive). The Adour E. site corresponds to the estuary; consequently, fish could not be reallocated to this site, since Allis shad is known to reproduce in the middle reaches of rivers.

**Samples preparation and microchemistry analysis**

Samples were prepared before the start of this work; the protocols are detailed in Martin et al. (2015). Water samples were analyzed for elemental concentrations using a sensitive solution-based Inductively Coupled Plasma Mass Spectrometer (ICP-MS). The isotopic ratio (87Sr/86Sr) analysis was performed using a Nu-Plasma Multi-Collector Inductively Coupled Plasma Mass Spectrometer (MC-ICP-MS) following the protocol described by Martin et al. (2013).

Since the otolith grows from birth to death, the natal signature of the river experienced by a fish during its juvenile stage is recorded in the portion of the otolith near the core. To target this stage, Martin et al. (2015) used a C-shaped ablation trajectory. The ablation diameter corresponds to the time during which juveniles experience freshwater. The chemical signature of the core itself reflects the composition of the marine water experienced by the female before her upstream migration (Volk et al. 2000). To exclude this maternal effect, the ablation was performed 40 µm away from the core. A first semi-corona was ablated by a laser coupled to the ICP-MS for the analysis of elemental concentrations, and a second semi-corona was ablated by a laser coupled to the MC-ICP-MS for the analysis of the isotopic ratio (Figure A.1). Each semi-corona was 60 µm wide, so that the outer edge of the ablation lay 100 µm from the primordium (Figure A.1). All elemental concentrations were above the limit of detection (LOD). Since the two semi-coronas correspond to the juvenile stage, the combined use of 87Sr/86Sr, Sr/Ca and Ba/Ca defines a multi-dimensional space in which the natal origin of adult Allis shad can be characterized.

**Preliminary analysis of water and juvenile baselines**

A preliminary analysis of the discriminant capacity of the baselines was performed before examining the outputs of the models. This analysis is necessary to ensure that the water and juvenile signatures of different rivers do not overlap and therefore allow precise reallocation. The variability of the three-dimensional water signatures was first tested for a ‘river’ effect on the isotopic ratio and the elemental concentrations using a non-parametric Wilcoxon test with Bonferroni adjustment. In parallel, a Canonical Discriminant Analysis (CDA) was performed on the water and the juvenile otolith microchemistry data sets to check how well the isotopic ratio and the elemental concentrations discriminate between the rivers and between the juveniles of the baseline. This analysis was performed using the ade4 package of R software (R Development Core Team, R.3.1.1, 2014). The temporal stability of the juvenile and water baselines was not tested because the sampling dates did not cover enough temporal variability. Therefore, in this study, the signatures of the baselines were assumed to be stable over time.
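The Bonferroni adjustment mentioned above can be illustrated with a minimal Python sketch. It only shows the correction step (each raw p-value is multiplied by the number of comparisons and capped at 1), not the Wilcoxon test itself. The function name and the p-values are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(p_values):
    """Bonferroni correction: multiply each p-value by the number of
    tests and cap the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values for three pairwise river comparisons
# (illustration only, not results from the study).
raw_p = [0.004, 0.020, 0.400]
adjusted_p = bonferroni_adjust(raw_p)

# A comparison is declared significant only if its adjusted p-value
# stays below the chosen threshold (0.05 here).
significant = [p < 0.05 for p in adjusted_p]
```

With many rivers the number of pairwise comparisons grows quickly, so the correction becomes conservative; that is the price paid for controlling the family-wise error rate.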

**Construction of Bayesian hierarchical models**

In the following subsections, ad and jv denote the adult and juvenile stages, respectively. In the first model, the natal river is denoted by N and takes values in [1,kb], where kb is the number of rivers in the water baseline. In the second and third models, N takes values in [1,K], where K is the total number of sources, which can exceed kb. Braces {} denote vectors and brackets [] denote matrices. More details about the structure of the models are available in Appendix II.

### Bayesian model with fixed number of sources and multiple baselines

The otolith composition can be seen as the result of the uptake of elements from the water followed by partitioning at three interfaces: the gills, cellular transport and crystallization in the otolith (Bath et al. 2000). As Bath et al. (2000) found a linear relation between water and otolith concentrations of Ba and Sr, a linear regression was fitted between the water concentrations of Ba and Sr in the rivers where juveniles were sampled (i.e. the Blavet, the Vilaine, the Loire, the Dordogne and the Minho) and the otolith concentrations of the juveniles. The regressions were significant for Sr/Ca (F=1269; df=3; p-value =4.865e-05) and Ba/Ca (F=18.06; df=3; p-value =0.02388), with a good fit between the water and the otolith concentrations (R²=0.998 and R²=0.858 for Sr/Ca and Ba/Ca, respectively). Therefore, because Sr/Ca and Ba/Ca are deposited in the otolith in proportion to their ratios in the water, a linear relationship was assumed between water and otolith composition in the Bayesian model. No such regression was required for the isotopic ratio since it is not subject to partitioning (Blum et al. 2000; Kennedy et al. 2002).

The water composition and the otolith composition of adults and juveniles were first centered and scaled for each element. The water isotopic ratio was centered and scaled using the mean and variance of the adults’ otolith isotopic ratio, so that equal isotopic ratios in water and otolith remain equal after transformation. This transformation was performed to decrease the correlation between regression parameters. The scaling also puts the elements and the isotopic ratio on a single scale of variation.
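The reported F statistics and R² values can be cross-checked against each other. The sketch below assumes that each regression has a single predictor and that df=3 is the residual degrees of freedom (five rivers minus two estimated parameters). Under that assumption, R² = F / (F + df). The function name is hypothetical.

```python
def r_squared_from_f(f_stat, df_resid, df_model=1):
    """Recover R-squared from the F statistic of a linear regression.

    Uses F = (R2 / df_model) / ((1 - R2) / df_resid), solved for R2.
    """
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


# F statistics reported in the text, with df=3 taken as residual df (assumption).
r2_sr_ca = r_squared_from_f(1269, 3)
r2_ba_ca = r_squared_from_f(18.06, 3)

print(round(r2_sr_ca, 3), round(r2_ba_ca, 3))  # 0.998 0.858
```

Both values agree with the R² reported for Sr/Ca and Ba/Ca, which supports reading df=3 as the residual degrees of freedom.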
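The centering and scaling step can be sketched as follows. The sketch assumes that scaling means dividing by the sample standard deviation (the square root of the variance mentioned in the text). The function name and the isotopic values are hypothetical.

```python
from statistics import mean, stdev


def center_scale(values, center=None, scale=None):
    """Center and scale a list of values.

    By default the mean and standard deviation of `values` are used;
    passing `center` and `scale` applies an external reference instead.
    """
    mu = mean(values) if center is None else center
    sd = stdev(values) if scale is None else scale
    return [(v - mu) / sd for v in values]


# Hypothetical isotopic ratios (illustration only).
adult_iso = [0.7090, 0.7105, 0.7120]
water_iso = [0.7090, 0.7120]

adult_scaled = center_scale(adult_iso)
# Water ratios are transformed with the adults' mean and standard deviation,
# so that equal raw ratios stay equal after transformation.
water_scaled = center_scale(
    water_iso, center=mean(adult_iso), scale=stdev(adult_iso)
)
```

Here the first water value equals the first adult value before the transformation and still does afterwards, which is the property needed to keep a slope of 1 and an intercept of 0 for the isotopic ratio.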

The otolith composition of an adult ad was assumed to follow a multinormal distribution (MN). Its expectation {Ō(r)} (i.e. the average composition of the otolith) was defined by a linear relation linking the water composition of a river r with the partitioning coefficients a and b:

({Oto(ad)} | N(ad) = r) ~ MN({a} . {Water(r)} + {b}, [∑]) (1)

where Oto(ad) and Water(r) correspond respectively to the otolith composition of an adult ad and the water composition of a river r. N(ad) represents the natal river of the adult ad and [∑] is the precision matrix (i.e. the inverse of the variance-covariance matrix). For the isotopic ratio, the partitioning coefficients were fixed at a=1 and b=0 because no partitioning occurs between the water and the otolith compartments. For the elemental composition, each partitioning coefficient follows a flat uniform prior, on [0,2] for a and on [-3,3] for b. The slope a was assumed to be positive, as shown by Bath et al. (2000). An uninformative prior was also chosen for [∑]:

[∑] ~ Wishart([I],n) (2)

with [I] the identity matrix (dimension 3×3) and n the degrees of freedom (number of elements +1).
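The expectation in equation 1 can be illustrated with a short Python sketch. It computes only the mean vector {a} . {Water(r)} + {b}, element by element, and does not sample from the multinormal distribution. The ordering of the three dimensions, the function name and all numerical values other than a=1 and b=0 for the isotopic ratio are hypothetical.

```python
def expected_otolith(water, a, b):
    """Element-wise linear relation a * water + b for one river."""
    return [a_i * w_i + b_i for a_i, w_i, b_i in zip(a, water, b)]


# Assumed order of dimensions: [87Sr/86Sr, Sr/Ca, Ba/Ca], on the scaled scale.
a = [1.0, 0.8, 0.5]     # slope fixed to 1 for the isotopic ratio; others hypothetical
b = [0.0, 0.1, -0.2]    # intercept fixed to 0 for the isotopic ratio; others hypothetical
water_r = [0.3, -1.2, 0.7]  # hypothetical scaled water composition of a river r

mean_otolith = expected_otolith(water_r, a, b)
```

Because a=1 and b=0 for the isotopic ratio, the first component of the expected otolith composition is simply the water value, while the two elemental ratios are shifted and rescaled by the partitioning coefficients.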

For the juveniles, the natal river is already known, so their otolith compositions are described by the following relation:

{Oto(jv)} ~ MN({a} . {Water(N(jv))} + {b}, [∑]) (3)

with N(jv) the natal river, and thus the catch river, of the juvenile jv. Finally, a categorical distribution was used to reallocate the adult Allis shad to their natal river:

N(ad) ~ Categorical({θc(ad),y(ad)}) (4)

For each combination of catch river c(ad) and year y(ad), a vector of probabilities of origin was defined for the kb rivers of the water baseline: θc(ad),y(ad)(1),… ,θc(ad),y(ad)(kb). In this model, the a priori probability that an adult ad caught in the river c(ad) in the year y(ad) was born in each river of the baseline was described by a Dirichlet distribution, used here as an uninformative prior:

{θc(ad),y(ad)} ~ Dirichlet({γ1:kb}) (5)

with γ1 = … = γkb = 1/kb and kb = 17 (i.e. the number of rivers in the water baseline).

The Bayesian hierarchical model provides a probabilistic estimate of the natal river of adults. Information is transferred between the juvenile baseline and the otolith microchemistry of adults through the matrix [∑] and the regression parameters a and b, which are shared by both stages. This first Bayesian model assumes that the water composition was sampled in every potential source. This constraint introduces bias in the reallocation of fish, which could be avoided by using an Infinite Mixture Model.
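The prior structure of equations 4 and 5 can be sketched by simulation. The Python sketch below draws one vector of probabilities of origin from a Dirichlet distribution with γ = 1/kb (using normalized gamma variates, a standard construction) and then draws a natal river from the resulting categorical distribution. It samples from the prior only; it is not the posterior sampler used in the study. The function names and the random seed are hypothetical.

```python
import random


def sample_dirichlet(gammas, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(g, 1.0) for g in gammas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw a category index in 1..len(probs) with the given probabilities."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


KB = 17                      # number of rivers in the water baseline
rng = random.Random(1)       # arbitrary seed, for reproducibility only

# One prior draw of the probabilities of origin for a given catch river and year.
theta = sample_dirichlet([1.0 / KB] * KB, rng)

# One prior draw of the natal river of an adult caught in that river and year.
natal_river = sample_categorical(theta, rng)
```

With concentration parameters below 1, individual prior draws tend to put most of their mass on a few rivers, although every river is equally likely a priori to be one of them.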

### Bayesian model without baseline: Infinite Mixture Model

The second model consisted of a Bayesian hierarchical model which estimates the number of sources found in a mixed sample without reference to any baseline data set. The clustering was based on the similarity between the otolith microchemistry of the adults, without reference to the water microchemistry. This method is similar to that developed by Neubauer et al. (2013). Considering a mixed sample, a mixture of Gaussian distributions is assumed for the elemental and isotopic ratios. The Infinite Mixture Model (IMM) provides an estimate of the number of sources and of the proportion of fish in each source (Munch & Clarke 2008).

When possible, in the second and third models, we used conjugate priors for computational ease (Görür & Rasmussen 2010).

In this model, the average otolith composition {Ō(r)} of the adults from a source r is independent of the water composition and was assumed to follow a normal distribution centered on 0 with a large variance (precision = 1 × 10⁻⁶). Consequently, the otolith composition became:

({Oto(ad)} | N(ad) = r) ~ MN({Ō(r)}, [∑]) (6)

The mathematical precision [∑] followed the same distribution as in the first model (equation 2). Besides, the same categorical distribution for the reallocation was used as in the previous model:

N(ad) ~ Categorical({θ1:K}) (7)

with K the number of sources. In a purely Infinite Mixture Model, K can theoretically tend to infinity; in practice, we constrained K in the code by specifying Kmax = 22 (Appendix II). Reallocation to any of the Kmax sources is therefore allowed but, at the end of the iterative process, not all of the allowed sources were filled. The definition of the K probabilities of origin {θ1:K} is based on the “stick breaking process”, which is closely related to the “Chinese restaurant process” (Sethuraman 1994; Ishwaran & James 2003). Starting with a single stick, K pieces (i.e. K groups of individuals) are obtained by successive breaks. The length of each piece reflects the probability of belonging to the corresponding group. The fraction of the remaining stick broken off at step j is denoted by qj (j = 1…K). As the pieces get shorter when the stick breaking process progresses, the probability of belonging to a new group decreases too. The number of sources K could potentially be infinite, but the creation of a new group depends on the trade-off between the cost of creating this group and its benefit (i.e. the reduction of variance within clusters). The process favors a high between-group variance relative to the within-group variance. A simple example with only three groups is presented in Figure 2.

p1 = q1; p2 = q2(1-q1); p3 = q3(1-q2)(1-q1)

Figure 2 Schematic presentation of the stick breaking process. The large black line represents the stick and the arrow corresponds to the progression of the breaking process. This example considers three groups defined by two breaking points, x0 (the first breaking point) and x1 (the second breaking point). The probabilities associated with each group are presented above or below the brackets.

As presented in Figure 2, the probability of belonging to a new group depends on the size of the previous groups. The probability p1 = q1 corresponds to the first break, the probability p2 = q2(1-q1) is the fraction q2 of the stick remaining after the first break, and so on. For each source {rj, j = 1…K-1}, the probability q follows a beta distribution:

q(rj) ~ Beta(1,α) (8)

with α the concentration parameter, defined by α = 1/α0 with α0 following a Gamma distribution.

Those probabilities {θ1}… {θK} follow a Dirichlet distribution, making this model a Dirichlet Process Model (DPM), which belongs to the family of Infinite Mixture Models. The main interest of this model is its capacity to estimate the number of sources, contrary to the first model, which only allows reallocation to rivers of the water baseline. However, in the absence of a baseline, the second model cannot associate a source with a river. Here, a source is just a group and not a precise ecological entity. A hybrid model is a way to combine the advantages of the first two models and to overcome their respective drawbacks.
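The truncated stick breaking process described above can be sketched in Python. The sketch assumes that the last of the Kmax pieces takes whatever remains of the stick, which is consistent with q being drawn for sources 1 to K-1 only. The value of α, the function names and the random seed are hypothetical; in the model itself α is estimated.

```python
import random


def weights_from_breaks(q):
    """Convert break fractions q_j into piece lengths:
    p_1 = q_1, p_2 = q_2 (1 - q_1), p_3 = q_3 (1 - q_2)(1 - q_1), ..."""
    weights = []
    remaining = 1.0  # length of stick not yet allocated
    for q_j in q:
        weights.append(q_j * remaining)
        remaining *= 1.0 - q_j
    return weights


def stick_breaking(alpha, k_max, rng):
    """Truncated stick breaking: q_j ~ Beta(1, alpha) for j < k_max,
    and the last piece takes the remainder (q = 1)."""
    q = [rng.betavariate(1.0, alpha) for _ in range(k_max - 1)] + [1.0]
    return weights_from_breaks(q)


rng = random.Random(42)  # arbitrary seed, for reproducibility only
theta = stick_breaking(alpha=1.0, k_max=22, rng=rng)  # alpha is hypothetical

# Three-group example in the spirit of Figure 2, with fixed break fractions.
three_groups = weights_from_breaks([0.5, 0.5, 1.0])  # [0.5, 0.25, 0.25]
```

A small α makes early breaks large, so a few sources carry most of the weight; a large α spreads the weight over more sources.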

### Bayesian hybrid model: Infinite Mixture Model with multiple baselines

The last model consisted of a combination of the first two models. The baselines (water and juvenile) and the stick breaking process were used together to allow reallocation either to rivers of the water baseline or to extra-sources. When the chemical signature of an individual does not match those of the baselines, the model produces a new group outside the baselines. Indeed, including individuals with “atypical” otolith signatures in a group of the baseline would induce a high within-group variance, so the definition of a new source is preferred for such fish. The main interest of the third model lies in its ability to reallocate fish to extra-sources while keeping the information from the baselines.

In this sub-section, the rivers of the baseline are denoted by {ri, i = 1…kb} and the extra-sources are denoted by {ri, i = (kb+1)…K}.

For the hybrid model, the likelihood was defined by:

({Oto(ad)} | N(ad) = ri) ~ MN({Ō(ri)}, [∑]) (10)

with {Ō(ri)} = {a} . {Water(ri)} + {b} for rivers of the baseline (i = 1…kb) and {Ō(ri)} following a normal distribution for extra-sources (i = kb+1…K). In this model, the partition coefficient a was assumed to follow the same uniform distribution as in the first model. However, the prior on the coefficient b differed from that of the first model because of convergence difficulties: b was here described by a normal distribution centered on 0 with a large variance (precision = 1 × 10⁻⁶). The mathematical precision [∑] followed the same distribution as in the first model (equation 2).

The otolith composition of juveniles was described by the same multinormal distribution as in the first model (equation 3). Besides, the same categorical distribution for the reallocation was used as in the previous model (equation 7).

Assuming an Infinite Mixture Model, {θ1:kb} was defined by flat Dirichlet priors for the rivers of the baseline (equation 5). For the extra-sources, the a priori probability of origin of an adult was defined by the stick breaking process, with q following the same distribution as in equation 8. Contrary to the first model, the year effect was not introduced in the probability of origin {θ1:K} because of convergence constraints.

The structure of the Bayesian hybrid model is outlined in Figure 3. The combination of the model with baselines and fixed number of sources (model 1) with the Infinite Mixture Model (model 2) provides an integrative model of reallocation (model 3).

**Bayesian posterior distribution using MCMC sampling**

Computations were performed with R software (R Development Core Team, R.3.1.1, 2014). Markov chain Monte Carlo (MCMC) sampling was used to draw simulations from the Bayesian posterior distributions, with the rjags package providing an interface from R to the JAGS library (Just Another Gibbs Sampler; Plummer 2003). Three MCMC chains were run in parallel for each model. For the first model, 20 000 iterations were run after a burn-in period of 10 000 iterations. Because of the stick breaking process used in the second and third models, the number of iterations was increased to 200 000 with a burn-in period of 50 000 in order to reach convergence of the chains. The monitored quantities were a, b, [∑], α, Ō(r), N(ad) and {θ}.

**Convergence diagnosis**

Convergence was tested for all posterior samples using the Gelman and Rubin convergence diagnostic (Gelman & Rubin 1992) with the coda package. A parameter was considered to have converged when its potential scale reduction factor was below the threshold of 1.05 (Brooks & Gelman 1998). The convergence of the categorical reallocation variables N(ad) was checked by visual examination of the mixing of the chains (see Appendix II for an example of posterior checking).
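The idea behind the potential scale reduction factor can be sketched in Python. The function below implements the basic Gelman and Rubin (1992) ratio of pooled to within-chain variance for one scalar parameter. It is an illustration only: the value reported by the coda package may differ slightly because of additional corrections. The function name, the simulated chains and the seed are hypothetical.

```python
import random
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for equal-length chains of one parameter."""
    n = len(chains[0])                              # iterations per chain
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])    # W: mean within-chain variance
    between = n * variance(chain_means)             # B: between-chain variance
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(0)  # arbitrary seed, for reproducibility only

# Three simulated chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# The same chains shifted apart, mimicking chains stuck in different regions.
stuck = [[x + 2.0 * i for x in chain] for i, chain in enumerate(mixed)]

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
```

Well-mixed chains give a value close to 1, below the 1.05 threshold, whereas chains exploring different regions give a value well above it.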

**Models comparison**

Before one of these three Bayesian models is selected, their statistical performance and ecological reliability have to be compared. Thus, the three models were compared using both statistical criteria and indicators of reallocation reliability.

**Table of contents :**

**1. Introduction**

**2. Materials and methods**

2.1. Sampling

2.1.1. Water

2.1.2. Juveniles and adults Allis shad

2.2. Samples preparation and microchemistry analysis

2.3. Preliminary analysis of water and juvenile baselines

2.4. Construction of Bayesian hierarchical models

2.4.1. Bayesian model with fixed number of sources and multiple baselines

2.4.2. Bayesian model without baseline: Infinite Mixture Model

2.4.3. Bayesian hybrid model: Infinite Mixture Model with multiple baselines

2.4.4. Synthesis of the parameters

2.4.5. Bayesian posterior distribution using MCMC sampling

2.4.6. Convergence diagnosis

2.5. Models comparison

2.5.1. Statistical Criteria

2.5.2. Indicators of reallocation reliability

2.5.3. Comparison of reallocation and sources between models

2.5.4. Confusion of reallocation

2.6. Flux between donor and recipient rivers

**3. Results**

3.1. Analysis of water and juvenile baselines

3.2. Models comparison

3.2.1. Statistical comparison

3.2.2. Indicators of reallocation reliability

3.2.3. Comparison of reallocation

3.2.3.1. Comparison of sources and homing rate

3.2.3.2. Stable vs inconsistent fishes

3.2.3.3. Focus on extra-sources

3.2.4. Analysis of inter-river confusion of reallocation

3.2.5. Choice of model to investigate the functioning of the metapopulation

3.3. Functioning of the metapopulation

3.3.1. Recipient, donor and closed rivers

3.3.2. Origin of fishes per recipient river

3.3.3. Isolation by distance between donor and recipient rivers?

3.3.4. Exchanges between the North and the South

**4. Discussion**

4.1. Why and how comparing the three models?

4.2. Recommendations regarding a monitoring program of Allis shad

4.3. Strict homing, complete straying or limited diffusion: evidence and consequences

4.4. Metapopulation functioning: sinks and sources and implication for management

**5. Conclusion and prospects**

**6. References**

**7. Appendix**