IDENTIFICATION OF MYCOPLASMA SPECIES FROM SOUTH AFRICAN POULTRY FARMS AND ASSESSMENT OF ANTIMICROBIAL RESISTANCE

CHAPTER 3: GENOME ASSEMBLY AND ANNOTATION OF MYCOPLASMA PULLORUM, ISOLATED FROM DOMESTIC POULTRY IN SOUTH AFRICA

The content of this chapter was published as a Genome Announcement by A Beylefeld and C Abolnik, entitled “Complete Genome Sequence of Mycoplasma pullorum Isolated from Domestic Chickens”, published online 23 February 2017.

Introduction

Several strategies have been used over the past few decades to assemble complete genomes. The earliest strategies involved shearing the DNA into smaller fragments and sequencing each fragment by Sanger sequencing, either randomly (also known as shotgun sequencing) or in a directed manner using primer walking, but these methods are labour intensive and time consuming. The first complete mycoplasma genome, that of M. genitalium, was sequenced using shotgun sequencing with capillary electrophoresis. The first completed poultry mycoplasma genome, that of M. gallisepticum strain Rlow, was also sequenced using shotgun sequencing with capillary electrophoresis, followed by primer walking to close gaps (Fraser et al., 1995, Papazisi et al., 2003).
The introduction of second-generation sequencing (SGS) technologies has made it possible to sequence the complete DNA complement of an organism in a single experiment; however, this strategy produces large datasets containing billions of short read sequences that require computational resources to assemble into a complete genome (Besser et al., 2017). These reads are mainly assembled either de novo or by mapping to a closely related reference genome (Pop, 2009). The whole genome sequence for M. gallinaceum was completed using only de novo assembly of high-throughput Illumina data (Abolnik and Beylefeld, 2015). However, sequencing errors known to occur in SGS technologies, repeat regions and other factors influence the data output, resulting in a draft genome consisting of multiple scaffolds rather than a complete genome (Pop, 2009, Ekblom and Wolf, 2014). Experimental methods such as primer walking can be used to close the gaps and produce better quality genomes, but are still time consuming and expensive. Hybrid methods combining data from different sequencing technologies have also been introduced with some success; however, every organism is different, and the optimal strategy will depend on genomic characteristics such as size, GC content and repetitive regions, and on external factors including budget and available resources (Ekblom and Wolf, 2014).
Mycoplasma genomes have a low GC content, contain numerous repeat regions, and utilise a different genetic code, which makes producing complete genomes for species from this genus very difficult. De novo assembly strategies usually result in numerous contig sequences that are too time consuming and expensive to assemble into a complete genome. The aim of this study was to assemble a complete genome for the previously uncompleted M. pullorum using whole genome sequencing data and in silico methods.

Materials and Methods

Sample collection, isolation and identification

Poultry mycoplasma samples were collected, isolated and identified as described in Chapter 2. Briefly, samples were collected from chickens by veterinarians using swabs and sent to the Bacteriology laboratory of the DVTD, where Johan Gouws and Pamela Wambulawaye identified the mycoplasma species by culture with growth inhibition. The DNA of the mycoplasma-positive samples was isolated as described and sent for Ion Torrent PGM whole genome sequencing at UP before samples were identified using the 16S rRNA gene. The remainder of the samples were frozen at -20 °C for future downstream analysis. Sample B359-6 was also sent to Inqaba Biotech (Pty) Ltd, Pretoria for Illumina MiSeq whole genome sequencing.

Quality control

The fastq sequencing files produced by Ion Torrent PGM and Illumina MiSeq whole genome sequencing were submitted to the FASTQC program (version 0.11.5; available at https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to produce a quality control report and to assess the number and quality of reads and the presence of adapters (Andrews, 2010). The sequencing files were imported into CLC Genomics Workbench version 8.5.1 (CLC Bio-Qiagen, Aarhus, Denmark) using the platform-specific import function. Low quality reads were trimmed and filtered, and sequencing adapters were trimmed, using the default settings of the Trim Sequences function of CLC Genomics Workbench with the Nextera Trim Adapter Library. The trimmed files were analysed again with FASTQC for quality control.
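The idea behind quality trimming can be illustrated with a minimal sketch. This is not the algorithm implemented by the Trim Sequences function of CLC Genomics Workbench; it assumes Phred+33 encoded quality strings, and the function names and the Q20 threshold are hypothetical choices made only for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20):
    """Remove trailing bases whose Phred score is below min_quality.

    The threshold of 20 is an illustrative assumption, not a value
    taken from the study.
    """
    scores = phred_scores(quality_line)
    end = len(scores)
    # Walk back from the 3' end until a base of acceptable quality is found.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


if __name__ == "__main__":
    seq, qual = trim_3prime("ACGTAC", "IIII##")
    print(seq, qual)
```

A read whose bases all fall below the threshold is reduced to an empty string, which a downstream length filter could then discard.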

Sequence assembly

Single-end reads produced by the Ion Torrent sequencing platform were assembled de novo in CLC Genomics Workbench (version 8.5.1) using the default settings and a minimum contig length of 500 bp. The reads were also mapped back to the contigs using the default settings with global alignment and saved for downstream analysis. De novo assembly of the Ion Torrent data was performed twice in CLC Genomics Workbench. As described in Chapter 2, Ion Torrent reads were also subjected to digital normalization using Khmer (version 2.0) (Brown et al., 2012, Crusoe et al., 2015) to decrease the number of reads, and were submitted to the IonGAP server twice (available at http://iongap.hpc.iter.es/iongap) (Baez-Ortega et al., 2015): the first time using the Genome assembly and Bacterial classification module, and the second time using only the Genome assembly module.
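The principle of digital normalization (Brown et al., 2012) can be sketched as follows: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cut-off. The sketch uses an exact in-memory counter rather than the probabilistic counting structure used by Khmer, and the function names and parameter values are illustrative assumptions, not the settings used in this study.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below the cutoff.

    k and cutoff are illustrative values; an exact Counter stands in
    for the memory-efficient counting used by real tools.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k and cannot be assessed
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
    print(digital_normalization(reads, k=4, cutoff=2))
```

Redundant reads from highly covered regions are discarded, while reads covering previously unseen sequence are retained, which reduces the data volume before assembly.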
Paired-end reads produced by the Illumina MiSeq sequencing platform were also assembled de novo in CLC Genomics Workbench (version 8.5.1) using the default settings, with the “include the paired-end reads to detect paired distances and perform scaffolding” option activated and a minimum contig length of 500 bp. Illumina reads were also mapped back to the contigs using the default settings with global alignment and saved for downstream analysis.
The quality of each assembly was assessed and compared using QUAST, a Quality Assessment Tool for Genome Assemblies from the Center for Algorithmic Biotechnology (available at http://quast.bioinf.spbau.ru/) (Gurevich et al., 2013). The complete genome of sample B359-15-6, identified as M. pullorum, was assembled in silico using different strategies.
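Assembly comparison tools of this kind typically report summary statistics such as the number of contigs, total length, largest contig and N50. The sketch below shows how such values can be derived from a list of contig lengths; it is an illustration of the standard definitions and not QUAST's own code, and the function names are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running
    total of lengths, sorted longest first, reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_summary(contig_lengths, min_length=500):
    """Summarise an assembly, ignoring contigs shorter than min_length.

    The 500 bp default mirrors the minimum contig length used for the
    assemblies described above.
    """
    kept = [n for n in contig_lengths if n >= min_length]
    return {
        "contigs": len(kept),
        "total_length": sum(kept),
        "largest": max(kept, default=0),
        "N50": n50(kept),
    }


if __name__ == "__main__":
    print(assembly_summary([600, 400, 1000, 2000]))
```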

Strategy 1: De novo assembly with manual contig joining

The de novo assembled Ion Torrent contigs were aligned, using the input contigs as reference, with the default settings of the “align contigs” tool of the Genome Finishing Module (version 1.5.4) of CLC Genomics Workbench to produce a contig match table file. Starting with the largest contig, contigs were joined manually where they overlapped at the 3’ and 5’ ends, using the following parameters: 1) a minimum contig match identity of above 95% and 2) a minimum contig overlap length of 20 bp. Where multiple contigs aligned, the best fit was chosen for the join. When contigs could not be joined further, the minimum contig match identity was lowered to above 80%, with the aim of reducing the number of contigs to a single contig representing the whole genome. This process was continued until no more contigs could be joined.
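The joining criteria can be illustrated with a small sketch that searches for an overlap between the 3’ end of one contig and the 5’ end of another and merges the two when the overlap meets the identity and length thresholds. This is a simplified, ungapped stand-in for the manual procedure in CLC Genomics Workbench, which is based on BLAST matches; the function names are hypothetical, and "above 95%" is approximated here as at least 0.95.

```python
def best_overlap(left, right, min_overlap=20, min_identity=0.95):
    """Return the longest ungapped overlap between the 3' end of `left`
    and the 5' end of `right` that satisfies both thresholds, or 0."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        tail, head = left[-length:], right[:length]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / length >= min_identity:
            return length
    return 0


def join_contigs(left, right, min_overlap=20, min_identity=0.95):
    """Join two contigs across their overlap, keeping the left copy of
    the overlapping region. Returns None when no overlap qualifies."""
    length = best_overlap(left, right, min_overlap, min_identity)
    if length == 0:
        return None
    return left + right[length:]


if __name__ == "__main__":
    overlap = "ACGTTGCAGGATCCATTGACCTGA"  # 24 bp shared region
    joined = join_contigs("GGGGGGGGGG" + overlap, overlap + "TTTTTTTTTT")
    print(joined)
```

Lowering `min_identity` to 0.80 in a second pass corresponds to the relaxed threshold applied once no further joins were possible at the stricter setting.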

Strategy 2: De novo assembly with manual contig joining from multiple genome assembly platforms

The de novo assembled Ion Torrent contigs were aligned and joined as described for strategy 1. Before the minimum contig match identity was lowered to 80%, the LargeContigs.fasta file produced by the IonGAP server was imported into CLC Genomics Workbench and its contigs were added to the contig match table file. The joined contigs were extended using the contigs produced by the IonGAP server, applying the same parameters described above. As with strategy 1, when contigs could not be joined further, the minimum contig match identity was lowered to above 80%, with the aim of reducing the number of contigs to a single contig representing the whole genome. This process was continued until no more contigs could be joined.

Strategy 3: Hybrid genome de novo assembly with manual contig joining using multiple sequencing platforms and multiple genome assembling platforms with stepwise addition of each data set

The workflow shown in Figure 3-1 was followed, starting with the largest contig, until the full genome was assembled. Briefly, the 5’ or 3’ end of each contig was viewed to assign the possible matches to one of four scenarios: 1) when one possible match existed, the contigs were joined and the newly joined contig was analysed again; 2) when multiple matches existed, the matches were first compared to each other to determine whether a) the matches were the same, in which case the longest contig was joined and the remaining matches were noted as part of that join, or b) the matches were not similar, in which case a copy of the file was made and every distinct contig match was evaluated using each of the above scenarios; 3) when the 5’ end matched the 3’ end, the size of the contig was evaluated for possible completion of the genome, or the match was noted as a possible repeat sequence and saved for resolution by downstream analysis; 4) when no matches were possible, the contig was saved for resolution by downstream analysis. If multiple matches remained possible, the contig was not elongated and was saved for downstream analysis.
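The four scenarios can be expressed as a small decision function. This is a sketch of the decision logic only, not of the manual procedure in CLC Genomics Workbench. It assumes each match is represented as a hypothetical (contig identifier, matched sequence) pair, and it treats matches as "the same" when every shorter matched sequence is a prefix of the longest one; both are assumptions introduced for illustration.

```python
def classify_end(contig_id, matches):
    """Decide what to do with one contig end.

    matches: list of (contig_id, matched_sequence) pairs aligned to
    this end (a hypothetical representation of the contig match table).
    Returns an (action, contig_ids) tuple.
    """
    if not matches:
        # Scenario 4: nothing to join, keep for downstream analysis.
        return ("save_for_later", [])
    if any(match_id == contig_id for match_id, _ in matches):
        # Scenario 3: the contig matches its own other end, so it is
        # either a closed circular genome or a repeat.
        return ("check_circular_or_repeat", [contig_id])
    if len(matches) == 1:
        # Scenario 1: a single candidate, join and re-analyse.
        return ("join", [matches[0][0]])
    ordered = sorted(matches, key=lambda m: len(m[1]), reverse=True)
    longest_sequence = ordered[0][1]
    if all(longest_sequence.startswith(seq) for _, seq in ordered):
        # Scenario 2a: equivalent matches, join the longest and note the rest.
        return ("join_longest", [match_id for match_id, _ in ordered])
    # Scenario 2b: distinct matches, evaluate each on a copy of the file.
    return ("branch", [match_id for match_id, _ in ordered])


if __name__ == "__main__":
    print(classify_end("c1", [("c2", "ACG"), ("c3", "ACGTT")]))
```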

Strategy 4: Hybrid genome de novo assembly with manual contig joining using multiple sequencing platforms

All three sets of de novo assembled contigs produced in CLC Genomics Workbench for the Illumina and Ion Torrent sequencing data were pooled with the contigs produced by the two IonGAP assemblies, and a contig match table was produced in CLC Genomics Workbench using the default settings (BLAST word size of 20 and minimum match size of 100). The workflow described in Figure 3-1 was followed. The end point of the assembly was determined using scenario 3, where the 5’ end matched the 3’ end and the genome size was within the range expected for the mycoplasma genome, as determined by the total length of combined contigs obtained from the genome assembly statistics. The remaining contigs were assessed against the following scenarios: 1) contigs with a high contig match percentage were removed; 2) contigs with no contig matches, a short length and low read coverage were removed; 3) contigs with no contig match and high coverage were exported and submitted to the National Center for Biotechnology Information (NCBI) nucleotide BLAST webtool (available at https://blast.ncbi.nlm.nih.gov/Blast.cgi) (Zhang et al., 2000) for identification.
The final contig was exported from the contig match table and the Illumina and Ion Torrent reads were mapped onto the contig separately and a report generated for evaluation.
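The triage of the contigs left over after the final contig was closed can be sketched as a simple rule set. The text does not state numerical cut-offs for "high match percentage", "short" or "low coverage", so every threshold below is a hypothetical placeholder, as are the function name and return labels.

```python
def triage_contig(match_identity, length, coverage,
                  high_identity=0.95, short_length=1000, low_coverage=10.0):
    """Classify a leftover contig.

    match_identity: fraction identity to the final contig, or None when
    the contig has no match. All thresholds are illustrative assumptions.
    """
    if match_identity is not None and match_identity >= high_identity:
        # Scenario 1: already represented in the final contig.
        return "remove_redundant"
    if match_identity is None and length < short_length and coverage < low_coverage:
        # Scenario 2: unmatched, short and poorly supported by reads.
        return "remove_low_support"
    if match_identity is None and coverage >= low_coverage:
        # Scenario 3: unmatched but well supported, identify with BLAST.
        return "submit_to_blast"
    # Anything not covered by the three scenarios is left for manual review.
    return "manual_review"


if __name__ == "__main__":
    print(triage_contig(None, 5200, 85.0))
```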

Genome annotation and viewing

The genome was then submitted for annotation to the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (Tatusova et al., 2016). The resulting GenBank® file was downloaded from GenBank® and a complete circular genome was viewed using the custom analysis pipeline of the online server GView, a circular and linear genome viewer (available at https://server.gview.ca/#) (Petkau et al., 2010); see appendix B for the style sheet. A complete genome analysis was also produced by the US Department of Energy Joint Genome Institute (DOE-JGI) in collaboration with the user community and presented on the Integrated Microbial Genomes and Microbiomes (IMG) website (available at https://img.jgi.doe.gov/) (Chen et al., 2017). The resulting genome statistics and results from the DOE-JGI Microbial Genome Annotation Pipeline (MGAP) were viewed (Figure 3-2) (Huntemann et al., 2015).
The genome was also reanalysed using the NCBI-PGAP pipeline by the NCBI team in 2017 and annotated as a reference sequence. The protein files for the two NCBI-PGAP annotations were exported and submitted to the online webserver WebMGA (available at http://weizhong-lab.ucsd.edu/webMGA/server/) for functional analysis using the Clusters of Orthologous Groups (COG) categorisation of proteins (Wu et al., 2011). The COG classifications of proteins generated for each annotation were compared and the results correlated.
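A comparison of COG categories between two annotations can be sketched by counting the proteins assigned to each single-letter category and tabulating the difference. The input format, a list of hypothetical (protein identifier, category letters) pairs, is an assumption made for illustration and does not reproduce the WebMGA output format.

```python
from collections import Counter


def cog_category_counts(assignments):
    """Count proteins per COG category letter.

    assignments: iterable of (protein_id, category_letters) pairs; a
    protein assigned to several categories (for example "KL") is
    counted once in each.
    """
    counts = Counter()
    for _, categories in assignments:
        counts.update(categories)
    return counts


def compare_cog_counts(first, second):
    """Return {category: (count_first, count_second, difference)}."""
    return {
        category: (first[category], second[category],
                   second[category] - first[category])
        for category in sorted(set(first) | set(second))
    }


if __name__ == "__main__":
    original = cog_category_counts([("p1", "J"), ("p2", "KL"), ("p3", "J")])
    reannotated = cog_category_counts([("p1", "J"), ("p2", "K"), ("p4", "S")])
    for category, row in compare_cog_counts(original, reannotated).items():
        print(category, row)
```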
