Clinical and biological aspects of glioma
The cell of origin for glioma has been an issue for discussion, with evidence pointing to neural stem cells (NSCs), or NSC-derived astrocytes or oligodendrocyte precursor cells (OPCs). Consideration of cell of origin suggests that glioma formation may result from acquisition of mutations in a variety of neural and glial cell backgrounds.
For example, GBMs have further been sub-classified based on gene expression signatures into classical, mesenchymal, proneural and neural subtypes. Moreover, further subclasses on the basis of microRNA expression resemble radial glia, oligoneuronal precursors, neuronal precursors, neuroepithelial/neural crest precursors or astrocyte precursors.
Prognosis of glioma
Prognosis depends on both grade and molecular profile. Diffuse gliomas are divided into three prognostic molecular subgroups: IDH wild-type gliomas have the poorest outcome (median overall survival (OS) of 2 years for grade 3), IDH-mutated, 1p/19q co-deleted gliomas have the best survival (median OS >14 years for grade 3), and IDH-mutated, non-co-deleted gliomas have an intermediate outcome (median OS of 5-7 years for grade 3). Outcome also depends on age and performance status.
Treatments of glioma
Grade III and IV gliomas are typically treated by surgical resection (if possible) followed by radiotherapy and chemotherapy. Alkylating agents, notably nitrosourea and temozolomide (TMZ), have shown benefits on patient survival, particularly in tumours with IDH mutation and/or MGMT promoter methylation. Grade IV gliomas (i.e. glioblastomas) are treated with radiotherapy and concomitant and adjuvant temozolomide. In grade III gliomas, the modality and type of chemotherapy depend on the genomic profile: IDH wild-type grade III gliomas are managed as GBM (see above); IDH-mutated, co-deleted gliomas are treated with radiotherapy and adjuvant nitrosourea-based chemotherapy (PCV); and IDH-mutated, non-co-deleted gliomas are treated with radiotherapy and adjuvant chemotherapy (PCV or TMZ). Management of grade II gliomas is based on surgical resection, which may be iterative, with wait-and-see periods, chemotherapy, and radiotherapy combined with adjuvant chemotherapy in the case of “high-risk” grade II glioma.
There is no standard of treatment at recurrence. Targeted therapies, antiangiogenic therapies and immunotherapies have been disappointing so far. While targeting EGFR initially appeared to be an attractive therapeutic strategy in GBM, clinical effectiveness has so far been limited by both upfront and acquired drug resistance. A vaccine targeting the most common IDH1 alteration (p.Arg132His) has recently been demonstrated to induce anti-tumour immunity and has been proposed as a viable future therapy for tumours with this mutation.
Genetic architecture of susceptibility to cancer
Genetic susceptibility, also called genetic predisposition or genetic risk, refers to the increased risk of developing a particular disease based on a person’s germline DNA. The two- to three-fold familial risks associated with glioma and other cancers are compatible with a range of effect sizes and frequencies of predisposition alleles observed in the population. The composition of risk alleles for a given disease is typically described as the genomic architecture of disease susceptibility (Figure 1.6). More than 40 years ago, Anderson stated that the magnitude of the familial risks seen for almost all cancers was not indicative of strong genetic effects but instead suggested a mechanism involving many genes with smaller effects acting in concert with environmental or non-genetic factors with larger and more important effects.
In terms of evidence to validate these models, a number of rare high-penetrance cancer susceptibility genes were successfully identified by linkage studies of highly selected families across the 1980s-2000s, hence validating the “multi-locus/multi-allele” model. These include most of the currently known high-penetrance susceptibility genes, for example BRCA1 and BRCA2 in breast cancer, MLH1 in colorectal cancer and CDKN2A in melanoma (Figure 1.6). In recent years the search for additional rare high-penetrance mutations has continued using High-Throughput Sequencing (HTS) techniques, which offer greater resolution than genetic linkage. The increasing cost effectiveness, quality, throughput and bioinformatics resources supporting HTS are enabling comprehensive studies of the entire exome or genome in large patient cohorts.
Figure 1.6 Genetic architecture of cancer risk. Taken from . This graph depicts the low relative risks (RRs) associated with common, low-penetrance genetic variants (such as single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS)); moderate RRs associated with uncommon, moderate-penetrance genetic variants (such as ataxia telangiectasia mutated (ATM) and checkpoint kinase 2 (CHEK2)); and higher RRs associated with rare, high-penetrance genetic variants (such as pathogenic mutations in BRCA1 and BRCA2 associated with hereditary breast and ovarian cancer). BRIP1, BRCA1 interacting protein C-terminal helicase 1; MLH1, mutL homologue 1; MSH2, mutS homologue 2; PALB2, partner and localizer of BRCA2.
Increased risk of glioma is now recognised to be associated with a number of these Mendelian cancer predisposition syndromes (Table 6), notably neurofibromatosis (NF1 and NF2), Li-Fraumeni syndrome and Turcot’s syndrome. Additionally, germline mutation of CDKN2A has been reported to be a cause of the astrocytoma-melanoma syndrome. Inherited mutations in these genes are typically very rare at a population level and are consistent with Knudson’s “two-hit” hypothesis of cancer development. Collectively, however, these syndromes are rare and account for little of the two-fold familial risk of glioma in the population.
More recent models of genetic susceptibility to glioma
The identification of glioma susceptibility genes through linkage analysis has been limited. In a segregation study of four Finnish families with two or more gliomas, non-significant linkage was observed at 15q23-q26.3. In 2011, linkage analysis by Shete et al. using high-density SNP arrays in 46 US families provided suggestive linkage at 17q12-q21.32. However, replication genotyping of an independent series of 29 families failed to provide evidence for a causal basis of the linkage signal.
Linkage studies are not powered to detect moderate- and low-penetrance alleles conferring more modest risks of disease, which are unlikely to cause multiple cases in families. Statistical modelling of glioma has suggested that much of the heritable risk is polygenic and attributable to common risk variants, involving the co-inheritance of multiple genetic factors (Figure 1.7).
Figure 1.7 Polygenic model of disease susceptibility. The distribution of risk alleles in both cases and controls follows a normal distribution. However, cases have a shift towards a higher number of risk alleles.
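The shift depicted in Figure 1.7 can be illustrated with a small calculation. The sketch below is not taken from the analyses in this thesis; it assumes independent risk loci in Hardy-Weinberg equilibrium, a multiplicative per-allele relative risk, and a disease rare enough that controls resemble the general population. The function name and all numbers (100 loci, risk-allele frequency 0.3, relative risk 1.1) are hypothetical.

```python
import math

def risk_allele_count_distribution(freqs, per_allele_rr=1.0):
    """Mean and SD of the total number of risk alleles carried per person.

    freqs: population risk-allele frequency at each (assumed independent) locus.
    per_allele_rr: assumed multiplicative relative risk per risk allele.
        1.0 describes controls (population frequencies); values above 1.0
        describe cases, whose risk-allele frequency at each locus becomes
        p*rr / (p*rr + 1 - p) under the multiplicative model.
    """
    mean = 0.0
    variance = 0.0
    for p in freqs:
        q = p * per_allele_rr / (p * per_allele_rr + (1.0 - p))
        mean += 2.0 * q                # two alleles per locus
        variance += 2.0 * q * (1.0 - q)
    return mean, math.sqrt(variance)

# Hypothetical example: 100 loci, each with risk-allele frequency 0.3
freqs = [0.3] * 100
control_mean, control_sd = risk_allele_count_distribution(freqs)
case_mean, case_sd = risk_allele_count_distribution(freqs, per_allele_rr=1.1)
print(f"controls: mean {control_mean:.1f}, SD {control_sd:.2f}")
print(f"cases:    mean {case_mean:.1f}, SD {case_sd:.2f}")
```

With many loci the count is a sum of many small independent contributions, so both distributions are approximately normal; under these assumptions the case distribution has a similar spread but a higher mean.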
Rare, moderately-penetrant disease-causing variants
The “rare variant” hypothesis suggests that a proportion of the remaining heritability of glioma could be due to the combined effect of rare, moderately-penetrant risk alleles. This hypothesis suggests that such variants act independently and confer modest but detectable increases in risk. Studies of rare variants through sequencing of candidate genes in glioma cases and controls have failed to identify genes associated with glioma. A recent study of 1,662 cases and 1,301 controls failed to replicate 52 variants previously identified by candidate gene studies.
Thus in summary, both models of genetic susceptibility have proven to be correct and across all tumour types wide continuums of differing genomic architectures have been observed. For example prostate cancer has a genetic susceptibility predominantly based on common low risk alleles, whereas in ovarian cancer a very substantial proportion is accounted for by rare high penetrance mutations, with the majority of other cancers somewhere in between.
Identification of common low-penetrance alleles
The “common disease, common variant” hypothesis posits that a substantial proportion of the genetic risk of common diseases can be accounted for by the action of multiple low-penetrance alleles that have a relatively high population frequency. While each variant may individually cause a very modest increase in risk, collectively they could underlie a substantial proportion of the genetic risk of disease. These alleles are highly unlikely to cause multiple cases in families and would therefore have eluded prior detection through linkage studies.
Genome-wide association studies
Genome-wide association studies (GWAS) emerged in 2005 as a powerful tool for the identification of common genetic markers associated with disease risk. A marker is associated with disease if one of its alleles is found significantly more frequently in cases than in disease-free controls. Single nucleotide polymorphisms (SNPs), the marker variants generally used for association studies, are common in the human genome and account for over 90% of all sequence variation. Adjacent SNPs in the genome are not inherited independently; they are strongly correlated and likely to co-segregate in a haplotype. The strong correlation between nearby SNPs is termed linkage disequilibrium (LD), the strength of which decreases rapidly with increasing genomic distance. This haplotype structure allows certain SNPs across the genome to be selected as “tagging SNPs”, which are expected to capture the majority of sequence variation across a given region (Figure 1.8).
Figure 1.8 Tagging SNPs. It is possible to identify genetic variation without genotyping every SNP in a chromosomal region. For example, through genotyping SNP 2 it is possible to infer the genotypes of SNP 1, SNP 4 and SNP 7.
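The correlation that makes tagging possible is commonly summarised by the LD statistic r². The following is a minimal sketch of that calculation from phased haplotypes; the function name and the toy haplotypes are hypothetical and are not data from this thesis. A SNP pair with r² close to 1 carries nearly the same information, so genotyping one of the pair is enough to infer the other.

```python
def ld_r_squared(haplotypes, i, j):
    """Pairwise LD (r^2) between SNP i and SNP j.

    haplotypes: list of equal-length strings of '0'/'1', one per phased
        chromosome, where '1' marks the alternative allele at that SNP.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n    # allele frequency, SNP i
    p_b = sum(h[j] == "1" for h in haplotypes) / n    # allele frequency, SNP j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom

# Toy example: six phased chromosomes typed at three SNPs
haplotypes = ["111", "111", "000", "000", "110", "001"]
print(ld_r_squared(haplotypes, 0, 1))  # SNPs 0 and 1 always co-occur
print(ld_r_squared(haplotypes, 0, 2))  # SNPs 0 and 2 are only weakly correlated
```

In this toy data SNP 0 would be a perfect tag for SNP 1 but a poor tag for SNP 2.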
GWAS arrays typically genotype 300,000-1,000,000 tagging SNPs (tag SNPs) across the genome simultaneously. They allow identification of regions associated with a disease or trait (termed “risk loci”) without prior knowledge of genomic location or function. The power of an association study is the likelihood of detecting a true genetic association. The sample size required to yield sufficient power depends on the frequency of the disease allele under study, the effect size of the variant on the trait of interest and the significance threshold required to declare a true association. The main advantage of the association design over linkage studies is that single cases are much more readily available than large extended pedigrees. This allows for much larger sample sizes and therefore greater power to detect variants with small effects. Additionally, multiple studies can be combined in a meta-analysis, resulting in further increases in power. An alternative approach is to select cases that are genetically enriched for disease, such as those with a family history or early age of disease onset. Since 2005 GWAS have been successfully applied across a broad range of disease types, and the NHGRI-EBI catalogue of published GWAS currently lists over 13,000 published disease-associated SNPs. GWAS have also been extensively applied to cancer, with disease-associated SNPs identified for the majority of tumour types.
Risk SNPs identified through GWAS represent proxies for the association signal but are not themselves necessarily the functional or causative variant at the risk locus. The causative SNP is likely to be correlated with the sentinel tag SNP at the GWAS association peak while not being directly genotyped on a GWAS array. Such SNPs can be recovered, and the disease risk locus fine-mapped, through imputation, a computational method that aims to predict the likely genotypes at un-genotyped loci across the genome. This method makes use of the information provided by haplotypes in a reference panel of sequenced samples such as the 1000 Genomes project and UK10K project (Figure 1.10). Additionally, a genome-wide approach to imputation can be used to identify new regions of association at variants that are incompletely tagged by GWAS tag SNPs or at insertion/deletions (indels) that are not fully captured by GWAS arrays. This genome-wide imputation approach has been successfully implemented in a recent study which identified rare variants in BRCA2 and CHEK2 with a large effect on lung cancer risk (OR>2.4). Imputation is limited by the choice of reference panel, the quality and size of which can affect imputation fidelity. Robust methodological practices are therefore required to avoid erroneous associations; when conducted correctly, however, imputation can be a valuable tool in risk locus discovery.
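The basic comparison underlying a case-control association test can be sketched as follows. This is a simplified allelic test on a 2x2 table of allele counts, given purely for illustration; it is not the specific test used in the studies described here, and the function name and counts are hypothetical. The P-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def allelic_association(case_risk, case_other, control_risk, control_other):
    """Odds ratio, chi-square statistic (1 d.f.) and P-value for a 2x2
    table of allele counts (risk allele vs other allele, cases vs controls)."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value

# Hypothetical SNP: 1,000 cases and 1,000 controls (2,000 alleles each)
odds_ratio, chi2, p_value = allelic_association(700, 1300, 600, 1400)
print(f"OR = {odds_ratio:.3f}, chi2 = {chi2:.2f}, P = {p_value:.2e}")
```

Because the chi-square statistic scales with the total number of alleles for a fixed odds ratio, the same modest effect yields stronger evidence in a larger series, which is why sample size is central to the power considerations above.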
Genetic susceptibility to glioma
Association studies in glioma
Outside of the work detailed in this thesis, fourteen glioma susceptibility loci have been identified in European populations (Table 1.7). In 2009 Shete et al. carried out the first glioma GWAS, which comprised a discovery case-control series of UK and European-American individuals (totalling 1,878 cases and 3,670 controls) and a replication series of French, German and Swedish individuals (totalling 2,545 cases and 2,953 controls). This study identified five susceptibility loci at 5p15.33, 8q24.21, 9p21.3, 11q23.3 and 20q13.33. The loci at 9p21.3 and 20q13.33 were independently confirmed by Wrensch et al. in a contemporaneous study of European-American individuals comprising a discovery phase of 692 high-grade glioma cases and 3,992 controls as well as a replication phase of 176 high-grade glioma cases and 174 controls. In 2011, a GWAS was carried out by Sanson et al., making use of data from the UK and European-American studies previously reported by Shete et al. as well as two additional case-control series from France and Germany (totalling 4,147 cases and 7,435 controls). This study identified 7p11.2 as a susceptibility locus for glioma, which contained two statistically independent SNP associations with glioma risk. In 2014 a GWAS was carried out by Walsh et al., comprising a UK and European-American discovery series of 1,013 high-grade glioma cases and 6,595 controls (in part overlapping with the study of Wrensch et al.), as well as a European-American replication series of 631 GBM cases and 1,141 controls. This study reported a novel glioma risk locus at 3q26.2 (near TERC).
Most recently, Kinnersley et al. performed a meta-analysis of GWAS data previously generated on four non-overlapping case–control series of Northern European ancestry, totalling 4,147 cases and 7,435 controls (comprising the UK-GWAS, the French-GWAS, the German-GWAS and the US-GWAS). The study led to the identification of additional susceptibility loci at 12q23.33, 10q25.2, 11q23.2, 12q21.2 and 15q24.2, taking the total count of risk loci to 12. Intriguingly, across all four GWAS data sets the authors did not replicate the association between rs1920116 (near TERC) at 3q26.2 and risk of high-grade glioma recently reported by Walsh et al.
In addition, a sequence-based association study in the Icelandic population led to the discovery of 17p13.1 (TP53) as a risk locus for several cancers including glioma. The association with glioma was confirmed in an independent European study. To refine the association signal at 8q24.21 in glioma, the region was fine-mapped by sequencing as well as statistical imputation of pre-existing GWAS datasets. This led to the identification of rs55705857 as being responsible for the 8q24.21 glioma association, with the SNP exhibiting a much larger effect size than the initial GWAS tag SNPs and being highly restricted to low-grade IDH-mutated glioma.
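The way in which combining series increases power can be illustrated with a fixed-effects, inverse-variance-weighted meta-analysis of per-study log odds ratios. This is a generic sketch under stated assumptions and is not presented as the exact procedure used by Kinnersley et al.; the function name and the four per-study estimates are hypothetical.

```python
import math

def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    log_ors: per-study log odds ratios for the same SNP and risk allele.
    std_errs: corresponding standard errors.
    Returns the pooled log OR, its standard error and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled, pooled_se, p_value

# Hypothetical estimates for one SNP in four non-overlapping series
log_ors = [0.20, 0.15, 0.25, 0.18]
std_errs = [0.10, 0.10, 0.10, 0.10]
pooled, pooled_se, p_value = fixed_effect_meta(log_ors, std_errs)
print(f"pooled OR = {math.exp(pooled):.3f}, SE = {pooled_se:.3f}, P = {p_value:.1e}")
```

In this example the pooled standard error is half that of any single series, so an effect that is only nominally significant in each study individually becomes considerably more significant when the four are combined.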
Table of contents
List of abbreviations
List of figures
List of tables
1 CHAPTER 1
1.1 Overview of central nervous system (CNS) tumours
1.1.1 Histological classification of glioma
1.1.2 Epidemiology of glioma
1.2 Molecular classification of glioma
1.2.1 Molecular model of glioma development
1.3 Clinical and biological aspects of glioma
1.3.1 Glioma origins
1.3.2 Prognosis of glioma
1.3.3 Treatments of glioma
1.4 Genetic architecture of susceptibility to cancer
1.4.2 Multi-locus/multi-allele hypothesis
1.4.3 More recent models of genetic susceptibility to glioma
1.5 Identification of common low-penetrance alleles
1.5.1 Genome-wide association studies
1.6 Genetic susceptibility to glioma
1.6.1 Association studies in glioma
1.6.2 Perspectives from glioma GWAS
1.7 Strategies to identify novel glioma susceptibility alleles
1.7.1 GWAS, Imputation and meta-analysis
1.7.2 Next-generation arrays
1.7.3 Functional annotation of risk SNPs
1.8 Study aims and scope of enquiry
2 CHAPTER 2
2.1 Subjects and samples
2.1.2 Germline glioma case-control samples
2.1.3 Anaplastic oligodendroglioma matched tumour/normal samples
2.2 Molecular methods
2.2.1 Illumina whole-exome sequencing
2.2.2 Illumina transcriptome sequencing (RNA-seq)
2.3 Statistical and bioinformatics methods
2.3.1 General statistical methods
2.3.2 General Bioinformatics techniques
2.3.3 Methods for genome-wide association studies
2.3.4 Methods for functional analysis of genomic data
2.3.5 Annotation of regulatory elements
2.3.6 Methods for somatic genomic analysis
2.3.7 Plotting tools
2.3.8 Survival analysis
3 CHAPTER 3
4 CHAPTER 4
4.1 Overview and rationale
4.2.1 Patients, samples and datasets
4.2.2 Statistical analysis
5 CHAPTER 5
5.1 Overview and rationale
5.2.1 Patients, samples and datasets
5.2.2 Statistical and bioinformatics analysis
6 CHAPTER 6
6.1 Glioma inherited predisposition
6.2 Somatic genetic studies of Anaplastic Oligodendroglioma OA
6.3 Overall conclusion