The machine learning for personalized medicine initial training network

Personalized medicine

Personalized medicine is an emerging paradigm that consists in administering the best treatment to each patient according to their overall clinical status, lifestyle, environment, and genetic background. In other words, it consists in grouping patients who are expected to respond similarly into subgroups, and providing the treatment best suited to each subgroup.
The term personalized medicine has been in use for several years, but part of the scientific community has recently argued that precision medicine should be used instead [63]. The term “personalized medicine” can indeed be misleading, since the objective is not to create a treatment for each person but to increase the precision of each patient’s diagnosis, so that the best possible treatment can be given at the most appropriate dose. Precision medicine has gained increasing attention in recent years, not only from the scientific community but also from politicians and the general population. A clear example is the 2015 State of the Union speech, in which the President of the United States, Barack Obama, announced the Precision Medicine Initiative. The US government has allocated US$215 million to the initiative in fiscal year 2016, and is seeking to recruit a cohort of 1 million volunteers during the first year of the project. The objectives of the initiative range from improving cancer treatments to modernizing regulation to match the needs of this new research and care model. The French government has also announced a plan for the development of precision medicine, called France Médecine Génomique 2025, and is planning to invest 670 million euros over the coming years.
Personalized medicine is rooted in the observation that different patients respond differently to the same medication. Importantly, this difference between patients is greater than that observed for the same individual over their lifetime, or even between monozygotic twins [39]. This implies that genetic factors influence a patient’s response. Unlike nongenetic factors such as age or organ function, genetic factors remain stable throughout the patient’s life.

Machine learning approaches for personalized medicine

Machine learning is a field of study at the intersection of statistics and computer science that aims to build mathematical models of datasets. These models can be used to extract knowledge from a dataset (i.e., to learn) and to make predictions on novel data points.
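As a minimal illustration of the two steps described above (learning a model from a dataset, then predicting on a novel data point), the sketch below fits a one-variable linear model by ordinary least squares. The dose and response values are invented toy data, and the function names `fit_line` and `predict` are hypothetical helpers chosen for this example. It does not reproduce any method from the thesis.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a learned (slope, intercept) model to a novel data point."""
    slope, intercept = model
    return slope * x + intercept


# Toy training set (assumed values, for illustration only).
doses = [1.0, 2.0, 3.0, 4.0]
responses = [2.0, 4.0, 6.0, 8.0]

model = fit_line(doses, responses)      # "learning" step
new_prediction = predict(model, 5.0)    # prediction on an unseen dose
print(model, new_prediction)
```

The same fit-then-predict pattern applies to richer models, which differ mainly in the form of the model and in how its parameters are estimated.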
Machine learning has attracted growing attention in recent years thanks to its successful application in many fields. It is well known for its success in tasks such as face recognition, text translation, and text-to-speech.
Machine learning is also used in bioinformatics to address many different problems, such as gene expression analysis, gene function prediction, protein structure prediction, or the prediction of interactions between genes, proteins and molecules. More recently, multiple research teams have started focusing their efforts on developing and applying machine learning methods specifically to personalized medicine problems, such as biomarker discovery, survival time prediction, or the identification of drug-targetable disease drivers.

The machine learning for personalized medicine initial training network

This PhD thesis was conducted within the framework of the Marie Curie Initial Training Network (ITN) Machine Learning for Personalized Medicine (MLPM). The objective of the ITN is “to educate interdisciplinary experts who will develop and employ the computational and statistical tools that are necessary to enable personalized medical treatment of patients according to their genetic and molecular properties and who are aware of the scientific, clinical and industrial implications of this research”. In the context of the MLPM ITN, each trainee attended three summer schools and completed two different internships. As a trainee, I worked for three months in the Statistical Genetics Group of the Max Planck Institute of Psychiatry in Munich. During this period, I participated in a meta-analysis study aimed at discovering SNP markers that predict rapid weight gain in patients under antidepressant treatment. A second project consisted in a study on the association of a functional microsatellite in TLR2 with Inflammatory Bowel Disease, which has been submitted for publication. During this period I also started the work presented in Chapter 4.

Table of contents:

List of Figures
List of Tables
Contents
1 Introduction
1.1 Context
1.1.1 Adverse effects prediction
1.1.2 Personalized medicine
1.1.3 Machine learning approaches for personalized medicine
1.1.4 The machine learning for personalized medicine initial training network
1.2 State of the art
1.2.1 Adverse effect prediction
1.2.2 Personalized drug effect prediction
1.2.3 Genome-based personalized drug effect prediction
1.3 Supervised machine learning
1.3.1 Linear models
1.3.2 Kernel approaches
1.3.2.1 Support Vector Regression
1.3.2.2 Gaussian Processes
1.3.3 Artificial neural networks
1.4 Multitask Learning
1.4.1 Artificial neural networks for multitask learning
1.4.2 Linear models for multitask learning
1.4.2.1 Multitask Lasso and Sparse Multitask Lasso
1.4.2.2 Multi-level Multitask Lasso
1.4.3 Kernel approaches for multitask learning
1.5 Contributions of this thesis
2 The Toxicogenetic Dream Challenge
2.1 Data
2.2 Methods
2.2.1 Kernels for chemical compounds
2.2.2 Kernels for cell lines
2.2.3 Kernels for chemicals and cell lines pairs
2.3 Results
2.4 Discussion
2.5 Conclusions
3 The Rheumatoid Arthritis Responder Challenge
3.1 Introduction
3.2 Data
3.3 SNPs Selection
3.4 Results
3.4.1 First phase
3.4.2 Second phase
3.5 Conclusions
4 The Multiplicative Multitask Lasso with Task Descriptors
4.1 Introduction
4.2 Multiplicative Multitask Lasso with Task Descriptors
4.2.1 Theoretical guarantees
4.2.2 Algorithm
4.3 Experiments on simulated data
4.3.1 Simulated data
4.3.2 Feature selection and stability
4.3.3 Prediction error
4.3.4 Results for scarcer simulated data (p=n = 400)
4.4 Peptide-MHC-I binding prediction
4.4.1 Data
4.4.2 Experiments
4.5 Conclusion
4.6 Code
5 The Random Multiplicative Multitask Lasso with Task Descriptors
5.1 Introduction
5.2 Approaches in the single task framework
5.3 Random MMLD and Randomized MMLD
5.4 Experiments on synthetic data
5.5 Arabidopsis thaliana experiments
5.6 Conclusions
6 Conclusion
Bibliography