Genotype-Environment Interaction Shapes the Microbial Assemblage in Grapevine’s Phyllosphere and Carposphere: An NGS Approach (Published Manuscript)


Next Generation Sequencing (NGS)

Improvements in DNA sequencing have broadened researchers' ability to study microbial community structure and function at higher resolution through metagenomic approaches. Metagenomics can be defined as the direct genetic analysis of the collection of genomes within an environmental sample. It can be carried out either by whole-metagenome sequencing or by amplicon-based sequencing [61,62]. Innovations in high-throughput, short-amplicon sequencing have been revolutionary because they can describe microbial diversity within and across complex biomes [63]. High-throughput methods have been widely used to investigate the microbial ecology of various environments [44,64-65]. However, their application to the microbial ecology of grapevines and wine fermentation is relatively recent, and their contribution to the field remains underexplored. Until recently, 454 pyrosequencing and Illumina were the most commonly used platforms for grapevine ecology surveys. Of the published data on the vineyard, grapevine and wine microbiome, 48% derives from 454 pyrosequencing and the remaining 52% from Illumina sequencing [67]. Both platforms rely on sequencing by synthesis but differ in their chemistries. The core process of Illumina sequencing is bridge amplification of adaptor-ligated DNA fragments on the surface of a glass flow cell [68]. Bases are then determined by cyclic reversible termination. This technique sequences the template strand one nucleotide at a time through successive rounds of base incorporation, washing, imaging and cleavage. In this method, fluorescently labeled, reversibly terminated dNTPs stop polymerization after each incorporation, which allows unincorporated nucleotides to be washed away. The fluorescent signal is captured to identify the added base. The dye and terminator are then cleaved so that the next nucleotide can be added, and the cycle is repeated [68-70].
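The incorporate, wash, image and cleave cycle described above can be illustrated with a toy Python model that reads one base per cycle. It is a simplified sketch, not instrument software. The dye-to-base mapping is a hypothetical assumption, since real instruments use their own channel schemes. Phasing, sequencing errors and cluster-level signal are ignored.

```python
# Toy model of Illumina-style cyclic reversible termination (simplified illustration).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye-per-base mapping; real instruments use their own channel schemes.
DYE_FOR_BASE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Return the synthesized (complementary) strand after a number of cycles."""
    read = []
    for position in range(min(cycles, len(template))):
        # 1. Incorporation: exactly one labeled, reversibly terminated dNTP complementary
        #    to the template base is added; the terminator blocks further extension.
        incorporated = COMPLEMENT[template[position]]
        # 2. Washing: unincorporated dNTPs are removed (implicit in this model).
        # 3. Imaging: the fluorescent label identifies the incorporated base.
        signal = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[signal])
        # 4. Cleavage: dye and terminator are removed, unblocking the next cycle.
    return "".join(read)


print(sequence_by_synthesis("TACGGATC", cycles=6))  # ATGCCT
```

Read length in this model is capped by the number of cycles. This mirrors how run duration and read length are linked on real instruments.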
Early Illumina runs generated at least 1 Gb of sequence over 2–3 days, with an average read length of 35 bp. The introduction of the HiSeq and MiSeq instruments changed run times to ∼4 days and 24–30 h, respectively. It also increased read lengths to 250–300 bp, with error rates below 1% and substitutions as the most frequent error type [71,72].
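Per-base error rates such as the sub-1% figure above are usually expressed as Phred quality scores, Q = -10 log10(p), where p is the probability that a base call is wrong. The following minimal Python sketch converts between the two scales; the function names are illustrative.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality score Q = -10 * log10(p) for a per-base error probability p."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse conversion: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


print(round(phred_from_error(0.01), 1))  # 20.0
print(round(error_from_phred(11), 4))    # 0.0794
```

An error rate of 1% therefore corresponds to Q20. The quality-score cut-off of 11 used during read filtering in this chapter corresponds to an error probability of roughly 8%.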
In 454 pyrosequencing, emulsion PCR is used for clonal amplification of adaptor-ligated DNA fragments on the surface of beads. The beads are then distributed into 44 µm wells, where sequencing by synthesis takes place. Each nucleotide incorporation releases pyrophosphate, which drives an enzymatic, luciferase-coupled reaction that emits light. This light is measured with a charge-coupled device (CCD), allowing the incorporated bases to be identified [66-68]. In 2008, 454 pyrosequencing was reported to be the most published NGS platform. The technology has since been discontinued and superseded by Illumina [69,70].
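In 454 pyrosequencing, one nucleotide species is flowed at a time. A run of identical bases (a homopolymer) is therefore incorporated in a single flow and gives a proportionally stronger light signal. Estimating long homopolymer lengths from light intensity is a well-known source of indel errors on this platform.

The toy Python sketch below builds an idealized flowgram for a synthesized sequence and decodes it back. The TACG flow order and the noise-free integer signals are simplifying assumptions.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def flowgram(synthesized: str, flow_order: str = FLOW_ORDER) -> list[tuple[str, int]]:
    """Idealized (noise-free) flowgram: one (nucleotide, signal) pair per flow."""
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains bases that are never flowed")
    signals = []
    i = 0
    for nucleotide in cycle(flow_order):
        if i >= len(synthesized):
            break
        count = 0
        # All consecutive identical bases are incorporated during one flow, so a
        # homopolymer of length n releases ~n times the pyrophosphate (and light).
        while i < len(synthesized) and synthesized[i] == nucleotide:
            count += 1
            i += 1
        signals.append((nucleotide, count))
    return signals


def decode_flowgram(signals: list[tuple[str, int]]) -> str:
    """Rebuild the sequence by repeating each flowed base by its signal."""
    return "".join(nucleotide * count for nucleotide, count in signals)


print(flowgram("TTAGC"))
```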
NGS has been widely used for comprehensive evaluation of the vineyard or grape microbiome, typically to address two key questions. The first is which microorganisms are present in the environment; the second is what role the individual species play [73]. Understanding the role of the identified species in the grape or wine microbiome requires standard microbiological methods. Strains must be isolated, and their phenotypic and genotypic properties assessed to evaluate their potential contribution to grape or wine quality. Their growth and metabolic profiles are then tested in different wine matrices. Using such culture-dependent methods, several species have been shown to contribute positively to winemaking:

- Some strains of Wickerhamomyces anomalus, Candida pyralidae, T. delbrueckii and Kluyveromyces wickerhamii suppressed the growth of the wine spoilage yeast B. bruxellensis [74].
- M. pulcherrima was highlighted as a desirable co-inoculant for reducing ethanol content [75].
- Others, such as Hanseniaspora vineae, Starmerella bacillaris, L. thermotolerans, P. kluyveri and T. delbrueckii, contribute various desirable aroma signatures [72,74].

Bioinformatic Data Analysis: A powerful tool to unravel microbial diversity

High-throughput sequencing techniques generate large amounts of sequence data, which can only practically be handled with automated approaches. Several open-source pipelines are currently available (described below). Most of them provide tools for the basic data analysis steps, such as data cleaning, sequence clustering, functional annotation and taxonomic assignment (Fig 5).

Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST):

MG-RAST is one of the largest repositories for metagenomic data. It is also an open-source web application server that provides automatic phylogenetic and functional analysis of metagenomes [83]. By combining several bioinformatics tools, MG-RAST offers automated quality control, annotation, comparative analysis and archiving services for metagenomic and amplicon sequence datasets. The application supports processing of 16S, 18S and ITS amplicon sequences as well as metatranscriptome (RNA-seq) sequences [84]. Metagenome profiles can be visualized and compared using bar charts, trees, spreadsheet-like tables, heatmaps, PCoA, rarefaction plots, circular recruitment plots and KEGG maps.
Apart from metagenome analysis, MG-RAST can also be used for data discovery. Metagenome profiles and datasets can be visualized and compared in a wide variety of modes. The web interface allows users to select data based on criteria such as composition, sequence quality, functionality or sample type. It also offers several ways to compute statistical inferences and ecological analyses [83,84].
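Rarefaction plots, listed among the MG-RAST visualizations above, show how many taxa would be expected at smaller sequencing depths. One standard analytical way to compute such a curve is the hypergeometric expected-richness formula:

E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)]

Here N is the total read count, N_i is the count of taxon i and n is the subsampling depth. The Python sketch below implements this formula for illustration; it is not MG-RAST's own code. The read counts in the usage line are hypothetical.

```python
from math import comb


def expected_richness(counts: list[int], depth: int) -> float:
    """Expected number of taxa seen when `depth` reads are drawn without replacement."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    all_subsets = comb(total, depth)
    # A taxon with c reads is missed with probability C(total - c, depth) / C(total, depth).
    return sum(1 - comb(total - c, depth) / all_subsets for c in counts if c > 0)


def rarefaction_curve(counts: list[int], depths) -> list[tuple[int, float]]:
    """(depth, expected richness) pairs for plotting a rarefaction curve."""
    return [(d, expected_richness(counts, d)) for d in depths]


# Hypothetical read counts for four taxa in one sample.
print(rarefaction_curve([5, 3, 1, 1], [1, 5, 10]))
```

A curve that flattens at higher depths suggests that additional sequencing would reveal few new taxa.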

Quantitative Insights Into Microbial Ecology (QIIME):

QIIME is another bioinformatic pipeline, designed for analyzing microbial communities sampled through amplicon sequencing of a marker gene (e.g., 16S or 18S rRNA genes). At its core, the pipeline performs three steps:

- quality control of the input sequencing reads;
- clustering of the marker-gene sequences into OTUs (operational taxonomic units) at a requested similarity threshold (e.g., 97% identity, conventionally used as a proxy for species level);
- taxonomic annotation of the OTUs by searching for similar sequences in a reference taxonomic database [85].

"OTU" is the common term for clusters of uncultivated or unknown microorganisms grouped by DNA sequence similarity [61] of a specific taxonomic marker gene (e.g., 16S or ITS). In other words, OTUs are pragmatic proxies for microbial "species" at different taxonomic levels, in the absence of the traditional biological classification systems available for macroscopic organisms. The main output of the QIIME pipeline is the OTU table, which lists the microbial OTUs and their abundances in each sample. The pipeline also provides additional tools relevant to the ecology of the samples under investigation, including rarefaction, beta diversity assessment and principal coordinates analysis (PCoA) [85]. QIIME has been under active development since its release in 2010.
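The OTU-picking step can be illustrated with a simplified greedy, abundance-ordered centroid clustering in Python. Sequences are visited from most to least abundant. Each one joins the first existing centroid it matches at ≥97% identity, or otherwise founds a new OTU.

This is a sketch under strong assumptions, not QIIME's own clustering code:

- sequences are taken to be pre-aligned and of equal length;
- identity is a simple fraction of matching positions;
- all function names are hypothetical.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(seq_counts: dict[str, int], threshold: float = 0.97) -> dict[str, str]:
    """Map each sequence to an OTU centroid using abundance-sorted greedy clustering."""
    centroids: list[str] = []
    assignment: dict[str, str] = {}
    # Most abundant sequences are visited first, so they tend to become centroids.
    for seq in sorted(seq_counts, key=seq_counts.get, reverse=True):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                assignment[seq] = centroid
                break
        else:
            centroids.append(seq)  # no centroid close enough: start a new OTU
            assignment[seq] = seq
    return assignment


def otu_table(samples: dict[str, dict[str, int]], threshold: float = 0.97) -> dict[str, dict[str, int]]:
    """Build {sample: {otu_centroid: count}} from per-sample sequence counts."""
    pooled: dict[str, int] = defaultdict(int)
    for counts in samples.values():
        for seq, n in counts.items():
            pooled[seq] += n
    assignment = greedy_otu_clustering(dict(pooled), threshold)
    table = {}
    for sample, counts in samples.items():
        row: dict[str, int] = defaultdict(int)
        for seq, n in counts.items():
            row[assignment[seq]] += n
        table[sample] = dict(row)
    return table


base = "ACGT" * 10                 # hypothetical 40 bp "sequence"
near = "T" + base[1:]              # 1 mismatch: 97.5% identity, same OTU
far = "TGCA" + base[4:]            # 4 mismatches: 90% identity, new OTU
print(otu_table({"A": {base: 50, near: 5}, "B": {base: 20, far: 30}}))
```

Ordering by abundance means that rare variants, which are more likely to carry sequencing errors, tend to be absorbed into abundant centroids rather than founding OTUs of their own.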


DNA Amplification and Amplicon Sequence Library Preparation

To assess bacterial communities, the V4 region of the 16S rRNA gene was amplified using primers 515F and 806R. Fungal community diversity and abundance were assessed using modified ITS9 and ITS4 primers targeting the ITS2 region [23,24]. Sequencing libraries were prepared by two-step PCR.

PCR1 amplified the target regions and added the Illumina Nextera transposase sequences to the amplicons. Both forward and reverse PCR1 primers carried frameshift (FS) sequences in their 5′ overhangs to increase sequence diversity and improve overall read quality [25]. PCR1 was performed in 25 µL reactions with 30 ng of sample DNA and the KAPA HiFi HotStart PCR mix (KAPA Biosystems, Wilmington, MA, USA). Cycling consisted of initial denaturation at 95 °C, followed by 30 cycles of denaturation at 95 °C for 30 s, primer annealing at 57 °C for 60 s and primer extension at 68 °C for 60 s. Amplicons were purified using Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA) at a bead-to-DNA ratio of 0.7:1, resuspended in 30 µL of MilliQ water and checked on agarose gels.

In PCR2, primers from an Illumina kit were used for dual indexing of the amplicons. Each cleaned PCR1 product within the same sample received a unique combination of forward and reverse index primers (N7 and S5 Illumina dual-index oligos, respectively). The samples were then cleaned again using AMPure XP magnetic beads and pooled in equimolar concentrations. They were sequenced on an Illumina MiSeq using 2 × 250 bp v2 chemistry (Illumina Inc., San Diego, CA, USA).
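Pooling "in equimolar concentrations" requires converting each library's mass concentration into a molar one. The standard approximation is molarity (nM) = concentration (ng/µL) × 10^6 / (660 × fragment length in bp). Here 660 g/mol is the approximate mass of one double-stranded base pair.

The Python sketch below applies this approximation and also computes the bead volume implied by a 0.7:1 bead-to-DNA ratio. The library names, concentrations, fragment lengths and the 100 fmol target are hypothetical values, not data from this study.

```python
# Approximate average molar mass of one double-stranded DNA base pair (g/mol).
BP_MASS_G_PER_MOL = 660


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to a molar concentration (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def equimolar_volumes(libraries: dict[str, tuple[float, int]], fmol_each: float) -> dict[str, float]:
    """Volume (µL) of each library that contributes `fmol_each` femtomoles to the pool."""
    volumes = {}
    for name, (conc, length) in libraries.items():
        molarity_nM = ng_per_ul_to_nM(conc, length)  # 1 nM is 1 fmol/µL
        volumes[name] = fmol_each / molarity_nM
    return volumes


def bead_volume(sample_ul: float, ratio: float = 0.7) -> float:
    """Bead suspension volume for a given bead-to-DNA volume ratio."""
    return sample_ul * ratio


# Hypothetical libraries: name -> (concentration in ng/µL, fragment length in bp).
libs = {"sample_A": (12.0, 450), "sample_B": (6.5, 450)}
print(equimolar_volumes(libs, fmol_each=100))
print(bead_volume(25))  # bead volume for a 25 µL PCR product
```

Libraries of equal length but half the concentration need twice the volume. Fragment length matters because a longer amplicon carries more mass per molecule.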

Data Processing and Analysis

Demultiplexed raw data files covering all samples were imported into the R environment (R Core Team, Vienna, Austria). The entire amplicon sequence dataset was uploaded to the institutional server (http://agap-ng6.supagro.inra.fr/inra). Paired forward and reverse reads were trimmed (primer removal) and quality-filtered using the fastqPairedFilter function of the dada2 package [26]; bases with quality scores below 11 were discarded.

The filtered files were then processed with the Divisive Amplicon Denoising Algorithm (DADA) pipeline in three steps:

- dereplication;
- the core denoising algorithm, which models and corrects errors in Illumina-sequenced amplicons;
- merging of the paired reads.

The merging function performed a global ends-free alignment between paired forward and reverse reads and merged them only if they overlapped exactly. A table of amplicon sequence variants (ASVs, a higher-resolution analogue of operational taxonomic units, or OTUs) was then constructed. This table records the number of times each ASV is observed in each sample. DADA infers sample sequences exactly and resolves differences of as little as one nucleotide [26].

Chimeras were removed using the removeBimeraDenovo function of the dada2 package. Table 2 gives the total number of reads available at each of these steps. ASV (or OTU) sequences were assigned a taxonomy using the RDP classifier [27,28] with a k-mer size of 8 and 100 bootstrap replicates. A phyloseq data object was then created using the phyloseq function of the phyloseq package in R [29]. After unassigned taxa and singletons were removed, this object was used to calculate microbial abundances, perform diversity analyses and run other statistical tests with functions from the phyloseq and vegan packages [29,30].
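The read-level steps described above can be illustrated with a simplified Python sketch. It covers truncating a read at its first base scoring below 11, merging a read pair only when the overlap matches exactly, and counting the merged sequences per sample.

This is not the dada2 API, and it departs from the pipeline in several labeled ways:

- It omits the DADA error model entirely, so every distinct merged sequence is counted as its own variant.
- It merges before counting, whereas DADA2 dereplicates and denoises before merging.
- It skips chimera removal.
- The 12-nt minimum overlap, the truncation rule and all function names are assumptions of the sketch.

```python
from collections import Counter
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def truncate_at_low_quality(seq: str, quals: list[int], min_q: int = 11) -> str:
    """Cut a read at its first base whose quality score is below `min_q`."""
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i]
    return seq


def merge_pair(fwd: str, rev: str, min_overlap: int = 12) -> Optional[str]:
    """Merge a forward/reverse pair only if their overlap matches exactly."""
    rev_rc = reverse_complement(rev)  # put the reverse read on the forward strand
    # Try the longest possible overlap first; accept only zero-mismatch overlaps.
    for overlap in range(min(len(fwd), len(rev_rc)), max(min_overlap, 1) - 1, -1):
        if fwd[-overlap:] == rev_rc[:overlap]:
            return fwd + rev_rc[overlap:]
    return None  # no exact overlap long enough: the pair is discarded


def dereplicate(seqs: list[str]) -> Counter:
    """Collapse identical sequences into unique sequences with abundances."""
    return Counter(seqs)


def sequence_table(samples: dict[str, list[tuple[str, str]]]) -> dict[str, Counter]:
    """Per-sample counts of merged sequences (ASV-table-like, but without denoising)."""
    table = {}
    for sample, pairs in samples.items():
        merged = (merge_pair(f, r) for f, r in pairs)
        table[sample] = dereplicate([m for m in merged if m is not None])
    return table


amplicon = "GATTACACCGTAGGCTTAACGGTCATGC"  # hypothetical 28 bp amplicon
fwd, rev = amplicon[:20], reverse_complement(amplicon[8:])  # 12 bp true overlap
print(sequence_table({"s1": [(fwd, rev), (fwd, rev)]}))
```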

Impact of Agro-Climate Zones (or Terroir) and Genotype

Analysis of the leaf phyllosphere microbiome of the five grapevine cultivars of set 2 in three very diverse French regions revealed a strong effect of terroir. PCoA plots clearly differentiated the samples collected in the three regions, for both bacterial and fungal communities (Figure 4A,B). Leaf PMCs of the five cultivars clustered only according to grapevine location (PERMANOVA for 16S data: F = 12.98, p = 0.001; for ITS data: F = 6.094, p = 0.001). The α-diversity estimates also indicated highly significant differences in OTU richness (Figure 4C,D) between the three regions (ANOVA for 16S data: F = 25.73, p = 3.11 × 10⁻⁷; for ITS data: F = 26.329, p = 2.5 × 10⁻⁷). Together, these results show that French agro-climatic zones strongly shape microbial assembly in the leaf phyllosphere. They also suggest that taxonomic composition differs by region and that each region (or agro-climatic zone) has a unique microbial signature (Figure 4E,F). Multiple testing of taxon abundances, with p-values corrected to control the false discovery rate, identified 31 bacterial and 21 fungal genera that were differentially abundant among the three regions and their different environments (Supplementary Table S2).
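The PERMANOVA statistics reported above (a pseudo-F with a permutation p-value) can be illustrated with a compact Python sketch of Anderson's permutational MANOVA on a Bray-Curtis distance matrix. In R, vegan's adonis2 function is a common way to run this test; the code below is an independent reimplementation for illustration only.

Several choices here are assumptions, because this section does not state them:

- the Bray-Curtis metric;
- the toy count table;
- 999 permutations.

With 999 permutations, the smallest attainable p-value is 1/1000 = 0.001.

```python
import numpy as np


def bray_curtis(counts) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarities between samples (rows) of a count table."""
    x = np.asarray(counts, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / total if total > 0 else 0.0
    return d


def pseudo_f(d: np.ndarray, groups: np.ndarray) -> float:
    """PERMANOVA pseudo-F: among-group vs within-group sums of squared distances."""
    n = len(groups)
    labels = np.unique(groups)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(d: np.ndarray, groups, permutations: int = 999, seed: int = 0):
    """Return (pseudo-F, permutation p-value) obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs = pseudo_f(d, groups)
    f_perm = np.array([pseudo_f(d, rng.permutation(groups)) for _ in range(permutations)])
    # Count the observed statistic too, so p is never below 1 / (permutations + 1).
    p = (np.sum(f_perm >= f_obs) + 1) / (permutations + 1)
    return f_obs, p


# Hypothetical table: 9 samples (rows) x 3 taxa, three samples per region.
counts = [[10, 0, 0], [9, 1, 0], [8, 2, 0],
          [0, 10, 0], [1, 9, 0], [0, 8, 2],
          [0, 0, 10], [0, 1, 9], [2, 0, 8]]
regions = ["R1"] * 3 + ["R2"] * 3 + ["R3"] * 3
f_stat, p_value = permanova(bray_curtis(counts), regions)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

Because the test permutes sample labels rather than assuming normality, it suits the non-Euclidean dissimilarities typical of community data.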

Table of contents:

Chapter 1
1. Introduction
1.1. Grapevines
1.2. Plant-associated Microbiome
1.2.1 Rhizosphere
1.2.2. Endosphere
1.2.3. Phyllosphere
1.3. Characterisation of plant-associated microbiome
1.3.1. Culturable vs Non-culturable …
1.3.2. Next Generation Sequencing (NGS)
1.3.3. Target genes
1.3.4. Bioinformatic data analysis …
1.3.4.1. Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST)
1.3.4.2. Quantitative Insights into Microbial Ecology (QIIME):
1.3.4.3. MOTHUR
1.3.4.4. UPARSE: highly accurate OTU sequences from microbial amplicon reads
1.3.4.5. Divisive Amplicon Denoising Algorithm (DADA) …
1.4. Scope of the thesis
1.4.1. Objectives
1.4.2. Organization
1.5. References
Chapter 2 Assessing the impact of plant genetic diversity in shaping the microbial community structure of Vitis vinifera phyllosphere in the Mediterranean (Published Manuscript)
2.1 Abstract
2.2 Introduction
2.3 Material and Methods
2.4 Results
2.5 Discussion
2.6 Acknowledgements
2.7 Disclosure Statement
2.8 Funding
2.9 References
Chapter 3 Genotype-Environment Interaction Shapes the Microbial Assemblage in Grapevine’s Phyllosphere and Carposphere: An NGS Approach (Published Manuscript)
3.1 Abstract
3.2 Introduction
3.3 Material and Methods
3.4 Results
3.5 Discussion
3.6 Conclusion
3.7 Author Contribution
3.8 Acknowledgements
3.9 Funding
3.10 References
3.11 Supplementary Information
Chapter 4 Understanding the phyllosphere microbiome assemblage in grape-species with amplicon sequence data structures (Submitted Manuscript)
4.1 Abstract
4.2 Introduction
4.3 Results
4.4 Discussion
4.5 Material and Methods
4.6 References
4.7 Acknowledgements
4.8 Author Contribution
4.9 Tables & Figures
Chapter 5
5.1. Concluding Remarks
5.2 Phyllosphere Microbiome as Biocontrol Agent (BCA)
5.3 Microbiome Engineering
5.4 Phyllosphere microbiome for wine quality improvement
5.5 References