PATHOGENICITY FACTORS EMPLOYED BY PLANT PATHOGENIC BACTERIA


Chapter 2: The genome sequence of Pantoea ananatis LMG20103, the causative agent of Eucalyptus blight and dieback

ABSTRACT

Pantoea ananatis is a broad host range plant pathogen that infects economically important crops such as rice, maize and onion. In South Africa, P. ananatis causes blight and dieback of several clones, hybrids and species of the important forestry resource Eucalyptus, resulting in devastating losses. In this chapter, the whole genome of a highly virulent Eucalyptus-pathogenic P. ananatis strain, LMG20103, was sequenced, assembled and annotated. Pertinent genome and protein metrics are discussed. This is the first phytobacterial pathogen genome to be sequenced in Africa and the first genome of a member of the genus Pantoea, which includes a number of important plant pathogens, to be completely sequenced and published.

INTRODUCTION

Pantoea ananatis is an emerging plant pathogen that infects a wide range of plant hosts including rice, maize, onion, melons and pineapple (Coutinho and Venter, 2009). In South Africa, it has been implicated in diseases of maize, onion and Eucalyptus (Goszczynska et al. 2006; Goszczynska et al. 2007; Coutinho et al. 2002). Eucalyptus blight and dieback caused by P. ananatis is of particular concern to the forestry industry as it results in significant losses of seedlings in nurseries. In 2008, blight and dieback led to the loss of 200,000 seedlings in a single nursery (Sean de Haas, Mondi Fountains Nursery, personal communication). Very little is known about the means by which P. ananatis infects and causes symptoms on its various plant hosts. Understanding the mechanism of disease could provide solutions to curb crop losses or eradicate the pathogen.
One means of gaining insight into how a pathogen causes disease is to sequence its genome and mine the sequence for candidate genes involved in the pathogen's interaction with the plant, the infection process and symptom development (Vinatzer and Yan, 2008). This approach has been successfully employed to analyse the pathogenesis of a number of important plant pathogens including the tomato pathogen Pseudomonas syringae, grapevine-associated Xylella fastidiosa and the potato soft rot organism Pectobacterium atrosepticum (Vinatzer and Yan, 2008; Buell et al. 2003; Bell et al. 2004). The first bacterial genome to be sequenced was that of Haemophilus influenzae, by means of classical Sanger sequencing (Fleischmann et al. 1995). Using the Sanger approach to sequence genomes is expensive and time-consuming.
However, novel “next generation” sequencing technologies have recently been developed which generate genomic data much faster and at a reduced cost. One such technology was developed by 454 Life Sciences (Roche Inc., Switzerland). This sequencer generates large amounts of genome data through an automated system utilising a pyrosequencing approach (Margulies et al. 2005). Briefly, the genome is fragmented through nebulisation and specialised adapters are added to each fragment. Fragments are immobilised on beads, each enclosed in a droplet of an emulsion, and clonally amplified. Following amplification, fragment-coated beads are loaded onto a fibre optic slide and sequence data are generated by detecting the light emitted in a chemiluminescent reaction driven by the inorganic pyrophosphate that is released when a particular nucleotide is incorporated into the strand complementary to the target sequence (Margulies et al. 2005). Sequencing with this technology yields data of similar accuracy to traditional Sanger sequencing, but at one tenth to one hundredth of the sequencing cost, in significantly less time and without the cloning bias associated with Sanger sequencing (Rothberg and Leamon, 2008).
Following sequencing, the whole genome must be assembled and annotated. Annotation is the process by which biological information is attached to the genome sequence (Stein, 2001). The first step of annotation involves the identification of all the genes in the genome sequence. Subsequently, these genes are compared to those for which structural and functional data are available, and each gene is annotated accordingly. Given the benefits of genome sequences as inexhaustible information resources and the development of novel cost- and time-effective sequencing technologies, genome sequencing is increasingly being used to study the biology, epidemiology and pathology of mammalian, invertebrate and plant pathogens and has become available to the developing world.
In this chapter the sequencing (by means of 454 pyrosequencing), assembly and annotation of the genome of the Eucalyptus pathogen Pantoea ananatis LMG20103 are described. Several genome metrics were determined. All the genes in the P. ananatis genome were identified and the functions and subcellular localisation of the encoded proteins are discussed. The presence of mobile elements such as plasmids and integrated phage elements was determined. The annotated genome sequence was submitted to the National Center for Biotechnology Information genome database (Genome project: 43085; Genome Accession: CP001875).

MATERIALS AND METHODS

Strain Selection

Pantoea ananatis LMG20103 was selected for sequencing. This strain was isolated from a diseased Eucalyptus grandis x nitens hybrid in a plantation in Piet Retief, South Africa (Coutinho et al. 2002). It is stored in the Forestry and Agricultural Biotechnology Institute Bacterial Culture Collection under the designation BCC0127 and is maintained at the Laboratorium voor Microbiologie Gent (LMG) culture collection as LMG20103. This strain was selected for sequencing as it was the most virulent in pathogenicity trials on Eucalyptus grandis x nitens hybrid clones.

DNA extraction

Sequencing by 454 pyrosequencing technology requires 10 μg of purified DNA. DNA was extracted from a single colony of LMG20103 using the Qiagen tissue extraction kit (Qiagen, USA). This yielded 100 ng/μl as determined by Nanodrop (Thermo Scientific, USA). Six extracts were pooled, precipitated with 1:100 3 M sodium acetate and centrifuged at 13,000 rpm for 30 minutes. The supernatant was aspirated and the pellet washed twice with 250 μl 100% ethanol. After aspiration the pellet was air-dried and resuspended in 500 μl Tris-EDTA buffer (50 mM TE). The 16S rDNA and gyrB sequences were determined from the genomic DNA to confirm strain identity and purity.

Genome Sequencing

LMG20103 DNA was submitted to Inqaba Biotec™ (South Africa). Five runs on the Roche GS20 sequencer (Roche, Switzerland) were performed, with a total of four and a half plates filled with nebulised LMG20103 DNA. The GS20 sequencer produces reads of 80 to 120 nucleotides with an expected yield of 20 megabases (Mb) per plate from 200,000 reads. Four and a half plates would thus yield a total of 90 Mb of sequence, providing 18x sequencing depth for a genome of 5 Mb.
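The expected yield and depth quoted above follow from simple arithmetic, which can be checked with a short sketch. The mean read length of 100 nt is an assumption (the midpoint of the stated 80 to 120 nt range), and the function names are illustrative only.

```python
def plate_yield_mb(reads_per_plate, mean_read_length_nt):
    """Expected sequence yield of one plate, in megabases."""
    return reads_per_plate * mean_read_length_nt / 1_000_000


def expected_depth(plates, mb_per_plate, genome_size_mb):
    """Return (total yield in Mb, average sequencing depth)."""
    total_mb = plates * mb_per_plate
    return total_mb, total_mb / genome_size_mb


# Assumption: mean read length of 100 nt (midpoint of 80-120 nt).
per_plate = plate_yield_mb(200_000, 100)

# Four and a half plates, 5 Mb genome, as stated in the text.
total_mb, depth = expected_depth(4.5, per_plate, 5)
print(f"{per_plate} Mb per plate, {total_mb} Mb total, {depth}x depth")
```

With the figures given in the text this reproduces 20 Mb per plate, 90 Mb in total and 18x depth.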


Genome Assembly

Initial assembly of the 80-120 nt reads was performed at Inqaba Biotec™ using the Roche Newbler Assembler linked to the GS20 sequencer. This yielded 356 contigs ranging in size from 81 to 317,166 nucleotides. Subsequently two different draft assemblies were produced. The first generation assembly was based on the contiguity of contigs larger than 500 nucleotides, as determined by a scaffolding approach. The 274 contigs smaller than 500 nt were excluded from this assembly but were analysed by BlastN against the NCBI nucleotide database. The second generation assembly made use of re-assembly, including the <500 nt contigs, as well as a scaffolding approach and gap closure by PCR.

First generation draft assembly

The first generation assembly was performed with 83 contigs larger than 500 nt in size. BLASTN analysis of the last 1000 nucleotides on either end of each contig against the NCBI nucleotide database was performed. The NCBI ORF-finder application (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to detect open reading frames within these last 1000 nucleotides and the amino acid sequences were subjected to BLASTP against the NCBI database. The percentage nucleotide and amino acid identity between the contig ends and homologues in the complete genome sequences of ten closely related Enterobacteriaceae strains was determined. Strains used were Salmonella enterica subsp. enterica serovar Paratyphi ATCC 9150, Pectobacterium atrosepticum SCRI1043, Escherichia coli K-12, E. coli O157:H7 str. Sakai, Serratia proteamaculans 568, Enterobacter sakazakii BAA-894, Enterobacter sp. 638, Yersinia pestis KIM, Citrobacter koseri ATCC BAA-895 and Klebsiella pneumoniae MGH 78578. Local BLASTN was also performed against contigs available for Pantoea stewartii subsp. stewartii DC283. When two sequence ends showed homology to contiguous sequence in five or more of the bacterial strains, the contigs were considered as contiguous in the LMG20103 genome and assembled. On the basis of the location where homology ended, an appropriate number of Ns were inserted to signify gaps. In the case of overlap the coinciding end was deleted from one contig to ensure accurate fit of the contigs.
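The joining rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: the function names are hypothetical, the BLAST results are assumed to have been reduced to one yes/no value per reference genome, and trimming the overlap from the second contig is an arbitrary choice (the text only states that the coinciding end was deleted from one contig).

```python
MIN_SUPPORTING_GENOMES = 5  # threshold stated in the text


def are_contiguous(support):
    """Decide whether two contig ends should be joined.

    support maps a reference genome name to True when both contig ends
    hit contiguous sequence in that genome (assumed input format).
    """
    return sum(1 for hit in support.values() if hit) >= MIN_SUPPORTING_GENOMES


def join_contigs(left, right, gap=0, overlap=0):
    """Join two contigs, padding a gap with Ns or trimming an overlap.

    Assumption: an overlapping end is removed from the right-hand contig.
    """
    if gap and overlap:
        raise ValueError("a junction cannot have both a gap and an overlap")
    if overlap:
        return left + right[overlap:]
    return left + "N" * gap + right


# Hypothetical evidence for one pair of contig ends across ten genomes.
support = {f"genome_{i}": i < 6 for i in range(10)}  # 6 of 10 agree
if are_contiguous(support):
    print(join_contigs("ACGT", "TTGA", gap=3))
```

Keeping the support decision separate from the joining step mirrors the text, where homology evidence is gathered first and the gap size or overlap is determined afterwards from where the homology ends.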

Second Generation Draft Assembly

Newbler Assembler version 2.00.00 (Roche Inc., Switzerland) was used to reassemble contigs, including those smaller than 500 nt, yielding 117 contigs. These were assembled using a scaffolding approach as above, and gaps were closed by PCR. Among the small contigs, repeat regions constituting fragments of the 16S, 23S and 5S rDNA genes and the ITS were found. By means of PCR and assembly of these ribosomal DNA fragments, six complete copies of the 16S-ITS-23S-5S rRNA operon were assembled and incorporated into the second generation assembly.

Prediction of protein coding sequences and protein annotation

Open reading frames for protein coding sequences (CDSs) were predicted using a combination of gene prediction algorithms and systems. Genes were initially predicted by Glimmer v2.1.3 as implemented in the BASys annotation system (Van Domselaar et al. 2005). This method predicts a significant number of false positives, especially in regions where the GC content is greater than 60%. A newer version of Glimmer (v3.0.2) (http://www.cbcb.umd.edu/software/glimmer; Delcher et al. 1999; Salzberg et al. 1998), FgenesB (www.softberry.com) and AMIGene were also utilised for CDS prediction. AMIGene is employed in the MaGe annotation system and is reported to improve prediction of small and atypical genes (Bocs et al. 2003).
Amino acid sequences for the open reading frames predicted by the abovementioned methods were compared by local BlastP analysis. Those predicted by three or four of the prediction algorithms were retained. Open reading frames predicted by only one or two of the methods were analysed by BlastP against the NCBI protein database and were added to the total open reading frame set if they produced significant hits. The predicted protein coding genes were manually validated by BlastP of the CDS amino acid sequence against the NCBI protein database. Each of the encoded proteins was given a unique numerical locus tag with the prefix PANA_ and subsequently annotated.
Initial annotation of CDSs was performed using the BASys automated annotation system (Van Domselaar et al. 2005). This server makes use of over 70 bioinformatic tools to generate a report for each CDS detailing the gene and protein names, nucleotide and amino acid sequences and orientation of genes, synonyms, G+C content, COG function, specific function, metabolic importance, gene ontology, homologues and paralogues, the presence of signal peptides and transmembrane helices and subcellular localisations. The MaGe annotation system provided similar information for each of the CDSs. Both automated annotation systems make use of local nucleotide and protein databases to annotate the CDSs. Annotation of all LMG20103 CDSs was validated by BlastN and BlastP analyses against the nucleotide and protein databases on the NCBI web-server (http://www.ncbi.nlm.nih.gov).
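The consensus rule used in this section to combine the four gene predictors (retain a CDS predicted by three or four methods, refer the rest to BlastP review) can be illustrated as follows. This is a sketch under stated assumptions: predictions are matched here by identical (start, end, strand) coordinates, whereas the chapter compared amino acid sequences by local BlastP, and all coordinates and the function name are invented for illustration.

```python
from collections import defaultdict


def triage_predictions(predictions, min_tools=3):
    """Split predicted CDSs into consensus calls and calls needing review.

    predictions maps a tool name to a set of CDS keys. Assumption: a CDS
    key is a (start, end, strand) tuple and identical keys are one gene.
    """
    votes = defaultdict(set)
    for tool, cds_set in predictions.items():
        for cds in cds_set:
            votes[cds].add(tool)
    retained = {cds for cds, tools in votes.items() if len(tools) >= min_tools}
    needs_review = set(votes) - retained  # predicted by one or two tools only
    return retained, needs_review


# Hypothetical predictions from the four methods named in the text.
predictions = {
    "Glimmer v2.1.3": {(100, 400, "+"), (900, 1500, "-"), (2000, 2300, "+")},
    "Glimmer v3.0.2": {(100, 400, "+"), (900, 1500, "-")},
    "FgenesB": {(100, 400, "+"), (900, 1500, "-"), (3000, 3200, "+")},
    "AMIGene": {(100, 400, "+")},
}
retained, needs_review = triage_predictions(predictions)
print(sorted(retained))
print(sorted(needs_review))
```

In this toy input, two CDSs reach the three-tool threshold and two are set aside; in the workflow described above, the latter would only be added back if BlastP against the NCBI protein database produced significant hits.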

ACKNOWLEDGEMENTS
PREFACE
CHAPTER 1 THE PATHOGENICITY FACTORS OF PLANT PATHOGENIC BACTERIA AND THEIR IDENTIFICATION AND ANALYSIS IN THE GENOMIC ERA
ABSTRACT
INTRODUCTION
PLANT PATHOGENIC BACTERIA
PATHOGENICITY FACTORS EMPLOYED BY PLANT PATHOGENIC BACTERIA
IDENTIFICATION AND ANALYSIS OF PATHOGENICITY FACTORS IN PLANT PATHOGENIC BACTERIA
CONCLUSIONS
REFERENCES
CHAPTER 2 THE GENOME SEQUENCE OF PANTOEA ANANATIS LMG20103, THE CAUSATIVE AGENT OF EUCALYPTUS BLIGHT AND DIEBACK
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
RESULTS
DISCUSSION
REFERENCES
FIGURES AND TABLES
CHAPTER 3 COMPARATIVE GENOMICS REVEALS KEY TARGETS FOR ENVIRONMENTAL COLONISATION AND PLANT PATHOGENESIS IN THE WIDE HOST RANGE PATHOGEN PANTOEA ANANATIS
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
CONCLUSIONS
REFERENCES
FIGURES AND TABLES
CHAPTER 4 IN SILICO IDENTIFICATION AND ANALYSIS OF THE PUTATIVE PATHOGENICITY FACTORS OF PANTOEA ANANATIS LMG20103
ABSTRACT
MATERIALS AND METHODS
RESULTS AND DISCUSSION
CONCLUSIONS
REFERENCES
FIGURES AND TABLES
CHAPTER 5 AN IN DEPTH IN SILICO ANALYSIS OF THE TYPE VI SECRETION SYSTEMS OF PANTOEA ANANATIS LMG20103
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
RESULTS
DISCUSSION
REFERENCES
FIGURES AND TABLES
CHAPTER 6 FUNCTIONAL ANALYSIS OF ANANATAN, AN EXOPOLYSACCHARIDE PRODUCED BY PANTOEA ANANATIS, WHICH IS HOMOLOGOUS TO STEWARAN AND AMYLOVORAN AND PLAYS A ROLE IN SYSTEMIC INFECTION OF ONION AND BROWN-ROT DISEASE OF PINEAPPLE
ABSTRACT
INTRODUCTION
METHODS AND MATERIALS
RESULTS AND DISCUSSION
CONCLUSIONS
REFERENCES
FIGURES AND TABLES
SUMMARY