Normalization of DNA copy number profiles


Methods for the analysis of DNA copy number profiles

TNBC have a very high rate of chromosomal gains and losses. These genomic alterations can be measured using various technologies such as CGH arrays, SNP arrays and next-generation sequencing. Correctly detecting these numerous alterations is important, as we hope to identify tumor suppressor genes in frequently lost regions and oncogenes in frequently gained regions. The biostatistical analysis and biological interpretation of this kind of data are difficult for several reasons. As for all microarray technologies, measurements are influenced by various non-relevant factors (for example the GC content of the probes) and there is a need for efficient normalization methods. In collaboration with Philippe Hupe (Ph.D.), I worked on the normalization of Affymetrix SNP arrays and proposed a new method: ITALICS (Rigaill et al. (2008, 2007)). We have shown that, at the time of the study, ITALICS outperformed existing methods in terms of signal-to-noise ratio and enabled a better classification of true recurrences and new primary tumors on a breast cancer data set (Bollet et al. (2008)). Moreover, because TNBC carry many genomic rearrangements, recovering the ploidy of the tumors is an important and difficult issue, which we addressed in collaboration with Tatiana Popova (Ph.D., Popova et al. (2009)) from the group of Marc-Henri Stern (MD/Ph.D., IC).
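To make the idea of removing a non-relevant factor concrete, here is a minimal sketch that subtracts a linear GC-content trend from probe log-ratios. It is not ITALICS: the function name `gc_correct`, the linear model and the toy numbers are assumptions made for illustration only, and a real tumor profile would also require separating this trend from the biological copy number signal.

```python
def gc_correct(log_ratios, gc_content):
    """Remove a linear GC-content trend from probe log-ratios.

    Hypothetical helper: fits an ordinary least squares line of
    log-ratio against GC fraction and subtracts the fitted slope,
    keeping the overall mean of the profile unchanged.
    """
    n = len(log_ratios)
    mean_gc = sum(gc_content) / n
    mean_lr = sum(log_ratios) / n
    # Sums of squares and cross-products for the OLS slope.
    sxx = sum((g - mean_gc) ** 2 for g in gc_content)
    sxy = sum((g - mean_gc) * (y - mean_lr)
              for g, y in zip(gc_content, log_ratios))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Subtract only the GC-dependent part of the signal.
    return [y - slope * (g - mean_gc)
            for g, y in zip(gc_content, log_ratios)]


# Toy example: a flat profile at 0.1 distorted by a GC-dependent bias.
gc = [0.3, 0.4, 0.5, 0.6, 0.7]
observed = [0.1 + 2.0 * (g - 0.5) for g in gc]
corrected = gc_correct(observed, gc)
```

On this toy input the bias is exactly linear, so the corrected values all return to the flat level of the profile.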
Both CGH and SNP profiles are modeled as a succession of regions sharing the same copy number or LOH status. These regions are delimited by change-points, or breakpoints, corresponding to chromosome rearrangements. These profiles are usually analyzed using multiple change-point and segmentation methods. Most segmentation methods return a single segmentation, characterized by a set of breakpoints, and its quality is rarely questioned. However, for an n-point profile there are $2^{n-1}$ possible segmentations (each of the n-1 gaps between consecutive points either is a breakpoint or is not), so picking one segmentation out of so many is a difficult task. To make a valid biological interpretation we would like to be sure that the best segmentation is by far the best fit to the data. If it is not, we would like to check that the second best, third best and more generally other good segmentations do not have a completely different set of change-points. I have been working on this problem with Emilie Lebarbier (Ph.D.) and Stephane Robin (Ph.D.) and proposed new algorithms and statistical tools (Rigaill et al. (2010c,d)) to assess and take into account the uncertainty of change-point estimation. From these algorithms and statistical tools we derive exact formulations of model selection criteria (to select the number of breakpoints) that previously were only approximated asymptotically.
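The size of the segmentation space, and the comparison between the best and the next-best segmentations, can be sketched on a toy profile. The code below is a brute-force illustration under a piecewise-constant least-squares model. It is not the algorithm of Rigaill et al., the function names are hypothetical, and full enumeration is only feasible for very small n.

```python
from itertools import combinations


def segment_cost(values):
    """Residual sum of squares of the values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def all_segmentations(n):
    """Yield every segmentation of n points as a tuple of breakpoints.

    A breakpoint at position i (1 <= i <= n-1) means that a new
    segment starts at index i, so there are 2**(n-1) segmentations.
    """
    for k in range(n):
        for breaks in combinations(range(1, n), k):
            yield breaks


def cost(profile, breaks):
    """Total least-squares cost of a profile for a given breakpoint set."""
    bounds = [0, *breaks, len(profile)]
    return sum(segment_cost(profile[a:b]) for a, b in zip(bounds, bounds[1:]))


def rank_segmentations(profile, n_breaks):
    """Rank all segmentations with exactly n_breaks breakpoints by cost."""
    candidates = [b for b in all_segmentations(len(profile))
                  if len(b) == n_breaks]
    return sorted(candidates, key=lambda b: cost(profile, b))


# Toy 6-point profile with one clear jump between index 2 and index 3.
profile = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9]
n_segmentations = sum(1 for _ in all_segmentations(len(profile)))
ranked = rank_segmentations(profile, n_breaks=1)
best, runner_up = ranked[0], ranked[1]
gap = cost(profile, runner_up) - cost(profile, best)
```

A large `gap` relative to the best cost suggests that the best segmentation is well separated from its competitors; a small one is a warning that other breakpoint sets explain the data almost as well.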

Triple Negative and Basal-like breast cancers

Triple Negative Breast Cancers (TNBC) are immunohistochemically characterized by the absence of estrogen receptors (ER) and progesterone receptors (PR) and the lack of HER2 overexpression. Due to their aggressiveness, poor prognosis and lack of targeted therapy, these tumors are the focus of many research studies. Although the match is not perfect, there is a good correspondence between TNBC and basal-like tumors. Basal-like tumors were identified based on the hierarchical clustering of IDC-NST gene expression profiles, while TNBC can be either IDC-NST or one of the special histological types. Overall, the exact definition of Basal-like tumors in comparison to TNBC and the use of the term "basal" are still subject to debate (Gusterson et al., 2005; Gusterson, 2009; Moinfar, 2008). Indeed, no consensus has been reached on how to identify this group using immunohistochemistry (Rakha et al., 2008; Reis-Filho and Tutt, 2008). In the Curie-Servier dataset, Basal-like tumors were identified as IDC-NST tumors that are ER-negative and PR-negative, lack HER2 overexpression, and express cytokeratin 5/6 and/or cytokeratin 14 and/or Epidermal Growth Factor Receptor (EGFR). In the following, I will use both the "TNBC" and "Basal-like" names, even though they are not strictly equivalent, to describe IDC-NST breast tumors that have a basal or TNBC related pattern. Overall, TNBC have high histological grades with a high mitotic index and they frequently harbor central tumor necrosis. These tumors are characterized by an impaired DNA repair process and harbor complex genomic rearrangements and more gains and losses than the luminal subtypes (Chin et al., 2006; Vincent-Salomon et al., 2007). It has also been shown that 85% of the tumors of patients with BRCA1 mutations have a TNBC immunophenotype (Foulkes et al., 2003). Moreover, TNBC are associated with high levels of various proliferation genes such as Ki-67, and with very frequent p53 mutations (Manie et al., 2009).


Breast tumors of the Curie-Servier cohort

For the Curie-Servier project, breast tumors of the Luminal A, Luminal B, ER- / HER2+ and TNBC subtypes were selected and characterized by a pathologist (Anne Vincent Salomon, M.D./Ph.D.) of the IC using immunohistochemistry (IHC, Anne Vincent Salomon and Marion Richardson, M.Sc.). These tumors were obtained from patients treated at the IC (Biological Resource Center) and contain between 50% and 90% tumor cells. Many features of these tumors were collected, such as the size of the tumor and the overall survival of the patients. Additionally, normal tissues from mammoplastic surgery were collected by Anne Vincent Salomon and Fabien Reyal (M.D./Ph.D.). Finally, cell lines characterized as TNBC in Neve et al. (2006) were obtained: 184B5, MDA-MB-436, HCC1143, HCC1187, BT20, HCC1937, MCF-12A, HCC38, Hs 578T, MCF-10A, MDA-MB-468, BT-549, HCC70, MDA-MB-157, MDA-MB-231. All the information on the samples is summarized in Figure 2.3. This means that the different subtypes are known before any of our analyses. This information can be used to confirm the groups we find, and also earlier, for experimental design: in particular, it can be taken into account to determine batches and to make sure that batch effects are not responsible for the differences we observe between subtypes.
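One simple way to use known subtypes when determining batches is to spread each subtype as evenly as possible across batches. The sketch below illustrates that idea only. It is not the design actually used for the Curie-Servier experiments, and the function name `assign_batches`, the sample identifiers and the batch count are hypothetical.

```python
import random
from collections import Counter, defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Assign samples to batches, balancing subtypes across batches.

    samples: list of (sample_id, subtype) pairs.
    Returns a dict mapping sample_id to a batch index in [0, n_batches).
    Samples are shuffled within each subtype, then dealt round-robin;
    the counter carries over between subtypes so that batch sizes
    also stay balanced.
    """
    rng = random.Random(seed)
    by_subtype = defaultdict(list)
    for sample_id, subtype in samples:
        by_subtype[subtype].append(sample_id)

    assignment = {}
    slot = 0
    for subtype in sorted(by_subtype):
        ids = by_subtype[subtype]
        rng.shuffle(ids)  # random order within a subtype
        for sample_id in ids:
            assignment[sample_id] = slot % n_batches
            slot += 1
    return assignment


# Hypothetical cohort: 8 TNBC, 5 Luminal A and 4 Luminal B tumors.
samples = ([(f"T{i}", "TNBC") for i in range(8)]
           + [(f"A{i}", "LumA") for i in range(5)]
           + [(f"B{i}", "LumB") for i in range(4)])
batches = assign_batches(samples, n_batches=3)

# Count how many samples of each subtype fall in each batch.
table = Counter((subtype, batches[sid]) for sid, subtype in samples)
```

With this scheme the number of samples of a given subtype differs by at most one between any two batches, so a batch effect cannot coincide with a subtype difference.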

Table of contents:

I Introduction
1 Overview
1.1 Introduction
1.2 Methods for the analysis of DNA copy number profiles
1.3 Biostatistical analysis of the transcriptomic Curie-Servier dataset
1.4 Conclusion
2 A small introduction to Triple Negative Breast Cancers
2.1 Breast cancers
2.2 Triple Negative and Basal-like breast cancers
2.3 Breast tumors of the Curie-Servier cohort
II Genomic Analysis
3 Chromosome aberrations
3.1 Some technologies to study genomic rearrangements
3.2 DNA copy number profiles of SNP and CGH arrays
3.3 An overview of CGH data analysis
4 Normalization of DNA copy number profiles
4.1 Short overview of microarray normalization
4.2 Specificities of tumor DNA copy number profile normalization
4.3 Normalization of Affymetrix Genechip 50K and 250K SNP arrays
4.4 Paper: ITALICS
5 Segmentation of DNA copy number profiles
5.1 A piecewise constant model for the analysis of DNA copy number profiles
5.2 The CGHseg methodology
5.3 Assessing the quality of a given segmentation
5.4 Paper: Exploration of the segmentation space
5.5 Optimal computational scheme for large DNA copy number profiles
5.6 Paper: Pruned dynamic programming for segmentation
6 Analysis of the Curie-Servier Genomic dataset
6.1 Genomic alterations in breast cancers and in TNBC
6.2 Analysis of the genomic Curie-Servier dataset
III Transcriptomic Analysis
7 Introduction
8 Experimental Design
8.1 A small introduction to experimental design
8.2 Design of the transcriptomic experiment
9 Pre-processing
9.1 Probe annotation
9.2 Normalization
10 Exploratory Analysis
10.1 Validation of the pre-processing step
10.2 A robust classification of breast tumors, but no intrinsic gene list?
11 Comparison of TNBC with other tumor types
11.1 Gene by gene differential analysis
11.1.1 Statistical testing
11.1.2 Other filters
11.1.3 Paper: Frequent PTEN genomic alterations
11.1.4 Paper: Formins regulate tumor cell invasion
11.2 Pathway by pathway differential analysis
11.2.1 Paper: Reactive oxygen species (ROS) control myofibroblast and metastases
11.2.2 An overview of the Wnt pathway in breast cancers
11.2.3 Transcriptomic statistical analysis of the Wnt pathway
IV Conclusion