Assembly and analysis of the SPAST-HSP patient cohort
As a National Reference Centre for Rare Hereditary Neurological Disorders (CRMR Neurogénétique), the Pitié-Salpêtrière University Hospital and the diagnostic laboratory of its Genetic Department provided a rich source of both clinical and genetic data, gathered since 1993 and further expanded through the SPATAX network, established in 2000 (https://spatax.wordpress.com). A cohort of 842 SPAST-mutated patients was therefore assembled and analysed, laying the foundations for the subsequent phases of the study.
To maximise the chances of identifying genetic modifiers, different approaches were carried out in parallel. Although multiple strategies were adopted and performed at different times over the years, a common extreme-based study design was maintained throughout. This made it possible to dissect different aspects of the subject by addressing multiple questions with the same end goal. To build the most complete picture, data resulting from the different approaches (genome-wide genotyping, RNA sequencing and Whole Exome Sequencing) were first analysed separately and then, whenever possible, combined.
Functional validation of potential candidate variants/genes was then carried out using diverse tools, such as patients’ Primary Blood Lymphoblasts (PBLs), Drosophila Dspastin RNAi lines and K467R mutants.
Sequencing, quality control and analysis settings
DNA samples from the 448 selected patients were genotyped using the Infinium® Omni2.5Exome-8 v1.3 and v1.4 BeadChip arrays (Illumina, San Diego, CA, USA). Quality control of the resulting genotype data was performed using PLINK version 1.9b6 (Chang et al., 2015), after merging the markers common to the two array versions. Samples presenting discordant sex information or elevated missing data rates were excluded from the analysis, as were uninformative SNPs and variants characterized by low call rates or failing Hardy-Weinberg equilibrium. To verify that the selected patients belonged to the same ethnic group, ancestry was predicted by principal component analysis (PCA) using Peddy (Pedersen and Quinlan, 2017). A total of 1,701,047 SNPs and 434 samples passed quality control.
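The marker-level filters described above can be illustrated with a minimal Python sketch. It is not the PLINK procedure itself: the Hardy-Weinberg check uses a simple chi-square approximation, and the thresholds (`min_call_rate`, `min_hwe_p`) are hypothetical values chosen for illustration, since the cut-offs actually applied are not stated here.

```python
import math

def snp_passes_qc(genotypes, min_call_rate=0.95, min_hwe_p=1e-6):
    """Return True if a SNP passes call-rate, informativeness and HWE filters.

    genotypes: alternate-allele counts (0, 1 or 2) per sample, None if missing.
    min_call_rate and min_hwe_p are illustrative assumptions, not study values.
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False  # low call rate
    n = len(called)
    obs = [called.count(k) for k in (0, 1, 2)]
    q = (obs[1] + 2 * obs[2]) / (2 * n)  # alternate allele frequency
    if q in (0.0, 1.0):
        return False  # monomorphic, hence uninformative
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return hwe_p >= min_hwe_p

print(snp_passes_qc([0] * 25 + [1] * 50 + [2] * 25))  # genotypes in HWE proportions
print(snp_passes_qc([1] * 100))                       # heterozygote excess
```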
As stated before, the genome-wide genotyping data were used for both linkage and GWAS analyses, which involved different numbers of patients and SNPs. For genome-wide linkage analysis, only patients and unaffected relatives belonging to the 37 families were taken into consideration (n = 259). To account for SNPs in linkage disequilibrium, the data were pruned, leaving a final set of 41,444 SNPs for analysis.
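The idea behind pruning can be sketched as follows. This is a simplified greedy illustration, not the tool or settings used in the study: it compares each SNP with every SNP already kept rather than within a sliding window, and the `max_r2` threshold and SNP names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)

def prune(snps, max_r2=0.2):
    """Keep a SNP only if its r^2 with every SNP kept so far is <= max_r2.

    snps: dict mapping SNP name -> list of genotypes (0/1/2), in genomic order.
    max_r2 is an illustrative assumption.
    """
    kept = []
    for name, genotypes in snps.items():
        if all(r_squared(genotypes, snps[k]) <= max_r2 for k in kept):
            kept.append(name)
    return kept

example = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],  # in complete LD with snp_a
    "snp_c": [1, 1, 0, 2, 0, 2],
}
print(prune(example))
```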
When used for gene discovery, linkage analysis follows the transmission of a “disease trait” common to all affected patients and absent in all unaffected relatives. To identify a locus potentially harbouring a gene acting as an age at onset modifier, linkage analysis was set up by defining all early onset patients (age at onset ≤ 15 years, n = 69) as “affected” and all late onset patients (age at onset ≥ 45 years, n = 40) as “unaffected”. Patients with an age at onset between 15 and 45 years, as well as unaffected relatives, were coded as “unknown”, so as to exclude them from the analysis while still using them to build the family structures. Linkage analysis was then performed with MERLIN v1.1.2 (Abecasis et al., 2002), using both a non-parametric and a parametric model. In the parametric setting, a dominant model was preferred and multiple disorder frequencies were tested (f = 0.0046, the MAF of the p.(Ser44Leu) variant, and f = 0.05).
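The recoding of age at onset into a linkage phenotype can be written as a short Python sketch. The function name and arguments are hypothetical; the numeric codes follow the usual linkage pedigree-file convention (2 = affected, 1 = unaffected, 0 = unknown), which is assumed here rather than taken from the study files.

```python
def linkage_status(age_at_onset, is_patient=True, early_max=15, late_min=45):
    """Recode age at onset into an affection status for linkage analysis.

    Returns 2 (affected) for early onset, 1 (unaffected) for late onset,
    and 0 (unknown) for intermediate onset or for unaffected relatives.
    """
    if not is_patient or age_at_onset is None:
        return 0  # relatives stay in the pedigree but are not scored
    if age_at_onset <= early_max:
        return 2
    if age_at_onset >= late_min:
        return 1
    return 0

for age in (8, 15, 30, 45, 62):
    print(age, linkage_status(age))
```

Coding intermediate patients and relatives as unknown keeps them available for reconstructing inheritance within each family without letting them contribute to the phenotype contrast.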
Whole Exome Sequencing (WES)
Patients’ DNA was sequenced using the Nextera® Rapid Capture Expanded Exome kit (Illumina, San Diego, CA, USA), which covers up to 62 Mb of sequence comprising coding exons, UTRs and miRNAs. FastQC software (Wingett and Andrews, 2018) was used to check the quality of the BAM files, yielding an average of 80% of covered bases. Given the structure of the exome-sequenced population, an extreme-based analysis was performed. The GEMINI framework (Paila et al., 2013) was used to filter WES results in order to select variants carried exclusively by early onset patients and expected to be absent in late onset patients. Since the WES cohort was mostly composed of discordant parent-offspring pairs, no specific age at onset thresholds were fixed; each patient was assigned to the “early” or “late onset” category by comparison with their discordant partner.
Since no a priori assumption was made about the frequency of modifier variants, no MAF filters were applied during WES analysis. Only variants shared by at least two patients and located in genes expressed in the central nervous system, according to the Allen Brain Atlas portal (http://portal.brain-map.org/), were selected using an in-house R script. Finally, candidate variants were annotated using ANNOVAR in order to select variants with a potentially pathogenic effect (SIFT < 0.5, Polyphen2_HVAR > 0.9, CADD_pred > 12) that could explain the phenotype.
Two strategies were used to select the most interesting candidate variants. The first relied on gene prioritization, performed with ENDEAVOUR (Tranchevent et al., 2008, 2016) and based on similarities (e.g. in coding sequence, gene expression, functional annotation, literature, regulatory information) between the candidate gene and a training set of genes already known to be involved in HSP onset or predicted to interact with spastin by the BioGRID (https://thebiogrid.org/) and STRING (https://string-db.org/) databases. The second approach consisted of selecting the variants most frequently carried among the selected patients. Sanger sequencing was then used to assess variant segregation in carrier families.
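The logic of the extreme-based filter (variants present in early onset patients and absent from late onset patients) can be illustrated in plain Python. This is only a sketch of the set logic, not a GEMINI query; the variant and patient identifiers are hypothetical.

```python
def early_only_variants(carriers_by_variant, early_ids, late_ids):
    """Select variants carried by at least one early onset patient
    and by no late onset patient.

    carriers_by_variant: dict mapping variant ID -> iterable of carrier IDs.
    Returns a dict mapping each selected variant to its early onset carriers.
    """
    early_ids, late_ids = set(early_ids), set(late_ids)
    selected = {}
    for variant, carriers in carriers_by_variant.items():
        carriers = set(carriers)
        if carriers & early_ids and not carriers & late_ids:
            selected[variant] = sorted(carriers & early_ids)
    return selected

carriers = {
    "var1": ["P1", "P2"],  # early onset carriers only
    "var2": ["P1", "P3"],  # also carried by a late onset patient
    "var3": ["P3"],        # late onset carrier only
}
print(early_only_variants(carriers, early_ids=["P1", "P2"], late_ids=["P3"]))
```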
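The sharing and pathogenicity criteria listed above can be combined as in the following sketch. Several points are assumptions made for illustration: the three prediction thresholds are required to hold simultaneously, the CADD criterion is treated as a numeric score, variants with a missing score are discarded, and the dictionary keys are hypothetical rather than actual ANNOVAR column names.

```python
def is_predicted_damaging(annotation, sift_max=0.5, polyphen_min=0.9, cadd_min=12):
    """Apply the SIFT, PolyPhen-2 HVAR and CADD thresholds to one variant.

    Assumes all three criteria must be met; a missing score fails the filter.
    """
    try:
        return (annotation["sift"] < sift_max
                and annotation["polyphen2_hvar"] > polyphen_min
                and annotation["cadd"] > cadd_min)
    except (KeyError, TypeError):
        return False  # missing key or None score

def select_candidates(variants, min_carriers=2):
    """Keep variants shared by at least min_carriers patients and predicted damaging."""
    return [v["id"] for v in variants
            if len(v["carriers"]) >= min_carriers and is_predicted_damaging(v)]

variants = [
    {"id": "var1", "carriers": ["P1", "P2"], "sift": 0.01, "polyphen2_hvar": 0.98, "cadd": 25.0},
    {"id": "var2", "carriers": ["P1"], "sift": 0.01, "polyphen2_hvar": 0.98, "cadd": 25.0},
    {"id": "var3", "carriers": ["P1", "P4"], "sift": 0.80, "polyphen2_hvar": 0.10, "cadd": 3.0},
    {"id": "var4", "carriers": ["P2", "P4"], "sift": None, "polyphen2_hvar": 0.95, "cadd": 20.0},
]
print(select_candidates(variants))
```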
Total RNA was extracted from patients’ Primary Blood Lymphoblasts (PBLs) using the Maxwell® RSC simplyRNA Cells kit on a Maxwell® RSC extractor (Promega). After tissue homogenization, the RNeasy Lipid Tissue kit (Qiagen) was used to extract total RNA from the two samples derived from brain cortex. RNA quality was checked on an Agilent 4200 TapeStation System. RNAs were then sequenced using the Illumina NextSeq® 500 High Output Kit v.2 (150 cycles, 400 million reads).
Quality control and gene expression analysis among patients were performed using GenoSplice Technology (www.genosplice.com). Sequencing and data quality checks, read distribution (e.g. for potential ribosomal contamination) and insert size estimation were performed using FastQC (Wingett and Andrews, 2018), Picard tools (http://broadinstitute.github.io/picard/), Samtools (http://samtools.sourceforge.net/) and RSeQC (Wang et al., 2012). Reads were mapped using STAR v2.4.0 (Dobin et al., 2013) on the hg19 human genome assembly. The gene expression regulation study was performed as previously described (Noli et al., 2015). For each gene annotated in the Human FAST DB v2016, reads aligning to constitutive regions (which are not prone to alternative splicing) were counted. Based on the read counts, normalization and differential gene expression analysis were performed using the DESeq2 R package (Love et al., 2014). Genes were considered expressed if their RPKM (Reads Per Kilobase per Million mapped reads) value was greater than 97% of the background RPKM value, based on intergenic regions. When comparing patients with controls, samples were paired according to sampling date, age at sampling and sex to account for patients’ heterogeneity. Results were considered statistically significant when P-values were ≤ 0.05 and fold-changes ≥ 1.5.
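For reference, the RPKM normalization and the significance criteria can be expressed as a short Python sketch. It does not reproduce the DESeq2 workflow; the function names are hypothetical, and the fold-change criterion is assumed to apply in both directions (up- or down-regulation by a factor of at least 1.5).

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one gene or region."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)

def is_significant(p_value, fold_change, max_p=0.05, min_fold_change=1.5):
    """Apply the P-value and fold-change thresholds.

    fold_change is a strictly positive ratio (patients / controls); a ratio
    below 1 is inverted so that down-regulation is assessed symmetrically.
    """
    magnitude = max(fold_change, 1 / fold_change)
    return p_value <= max_p and magnitude >= min_fold_change

# 500 reads on 2 kb of constitutive sequence, 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))
print(is_significant(0.01, 2.0), is_significant(0.01, 1.2), is_significant(0.2, 3.0))
```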
Factors influencing the disorder severity and progression
We assessed whether the severity of the disorder was influenced by the age at onset by dividing the cohort into early onset cases (below the median age at onset of 30 years) and late onset cases (above the median age at onset) (n = 546) and comparing the mean stage of disability at the latest examination. The disability for late onset cases was more severe than that for early onset cases. This was especially true when the duration of the disorder was between 11 and 30 years (3.2 ± 1.16, n = 86 versus 3.8 ± 0.9, n = 88; Mann-Whitney test, P < 0.0001) (Supplementary Fig. 2).
We performed a more comprehensive analysis of the progression of SPG4-HSP for 116 patients for whom several follow-up examinations were available. Patients with a slow course had a less severe outcome (4.2 ± 1.3, n = 61 versus 3.3 ± 1.5, n = 54; Mann-Whitney test, P = 0.001), which was not explained by their age at onset (slow course group 25.8 ± 17.4, n = 50 versus fast course group 27.4 ± 16.4, n = 60; Mann-Whitney test, P = 0.6) or genotypes (32% missense versus 68% truncating, P = 0.7). Patients with a more rapidly evolving disorder had a higher frequency of urinary incontinence and lower limb proximal weakness (82% versus 48%, P = 0.01 and 64.7% versus 34.5%, P = 0.01), and were more severely affected, as reflected by their disability scores.
The c.131C>T/p.(Ser44Leu) polymorphism
Eleven patients carried the SPAST exon 1 variant c.131C>T/p.(Ser44Leu) associated with a major pathogenic mutation. The S44L variant was associated with a significantly lower age at onset (11 ± 16.9, n = 11 versus 29.3 ± 18.6, n = 547; Mann-Whitney test, P = 0.004) and thus a lower age at the first examination (32 ± 22.2, n = 11 versus 47.1 ± 17.9, n = 632; Mann-Whitney test, P = 0.02). There was no difference in the severity of the disorder between patients with the S44L variant and those without (2.9 ± 1.6, n = 481 versus 3.3 ± 0.94, n = 10; Mann-Whitney test, P = 0.46). Patients with p.(Ser44Leu) and another SPAST mutation showed pure spastic paraplegia with increased tendon reflexes, a reduced sense of vibration at the ankles, a bilateral extension plantar response, and urinary urgency or incontinence. The phenotype was more complex in only two cases and accompanied by delayed psychomotor development and moderate or severe sphincter disturbance, probably due to the presence of a missense mutation and early onset (see above genotype-phenotype correlation). In one case, cerebral MRI revealed mild cerebellar atrophy.
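The comparison described above (median split on age at onset, followed by a Mann-Whitney test on disability stage) can be sketched in Python using only the standard library. The implementation uses the normal approximation without tie or continuity correction, so its P-values are approximate and may differ from those of a dedicated statistics package; the field names and toy data are hypothetical.

```python
import math
from statistics import median

def split_by_median_onset(patients):
    """Split disability scores into early (below median onset) and late (above).

    Patients whose age at onset equals the median are left out of both groups.
    """
    m = median(p["onset"] for p in patients)
    early = [p["disability"] for p in patients if p["onset"] < m]
    late = [p["disability"] for p in patients if p["onset"] > m]
    return early, late

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U for sample x: pairs where x exceeds y, ties counting one half
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

toy = [{"onset": o, "disability": d}
       for o, d in [(10, 2), (20, 3), (30, 3), (40, 4), (50, 5)]]
early, late = split_by_median_onset(toy)
print(early, late, mann_whitney_u(early, late))
```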
Table of contents:
Hereditary Spastic Paraplegias (HSPs)
Physiopathology and neuropathology
Diagnosis and treatment
Spastic Paraplegia type 4
Clinical and genetic background
Spastin, a microtubule severing protein
The pursuit of genetic modifiers
Learning from other disorders…
And what about SPAST-HSP modifiers?
Patients and methods – Part 1: Modifiers identification
SPAST-HSP patient cohort
Sequencing, quality control and analysis settings
Whole Exome Sequencing (WES)
Material and methods – Part 2: Modifiers validation
Drosophila mutant lines
Results – part 1: better alone than in bad company?
SPAST-HSP cohort analysis
Genome-wide linkage analysis
Genome-wide association analysis
Whole Exome Sequencing (WES) analysis
Results – part 2: unity is strength!
Conclusions and perspectives
Parodi L, Fenu S, Stevanin G and Durr A. Hereditary spastic paraplegia: More than an upper neuron disease. Rev Neurol (Paris). 2017 May;173(5):352-360.
Parodi L, Coarelli G, Stevanin G, Brice A and Durr A. Hereditary ataxias and paraplegias: genetic and clinical update. Curr Opin Neurol. 2018 Aug;31(4):462-471.
Parodi L, Rydning SL, Tallaksen C and Durr A. Spastic Paraplegia 4. GeneReviews® [Internet]. Seattle WA: University of Washington, Seattle; 1993-2019.