Gene networks as graphs and a priori-based optimization


Upcoming communications, submitted and in progress

⋆ A. Pirayre, C. Couprie, F. Bidard, L. Duval and J.-C. Pesquet. BRANE Cut : optimisation de graphes avec a priori pour la sélection de gènes dans des réseaux de régulation génétique [BRANE Cut: graph optimization with a priori knowledge for gene selection in gene regulatory networks]. Submitted (April 2017) to colloque GRETSI, Juan-les-Pins, France, 5-8 September 2017.
⋆ A. Pirayre, D. Ivanoff, L. Duval, C. Blugeon, C. Firmo, S. Perrin, E. Jourdier, A. Margeot and F. Bidard. Growing Trichoderma reesei on a mix of carbon sources suggests links between development and cellulase production. Submitted (May 2017) to BMC Genomics.
⋆ Y. Zheng, A. Pirayre, L. Duval and J.-C. Pesquet. Joint image and graph recovery and segmentation with variational Bayes and higher-order graphical models (HOGMep). Submitted (May 2017) to IEEE Transactions on Computational Imaging.

Biological prerequisites

A cell phenotype is an observable characteristic driven by the production of specific proteins, which is itself driven by the expression of the related genes. While some genes are expressed constitutively, others depend on external and internal stimuli. This adaptation points to the existence of mechanisms that regulate gene expression. Before studying the protein production mechanisms related to a specific phenotype, it is necessary to understand where proteins originate in cells.
In molecular biology, the central dogma, itself a recurring subject of controversy (Crick, 1970; Schreiber, 2005; Stearns, 2010), can be formulated as: one gene, one protein. In the genome, a gene is defined, sensu stricto, as a DNA fragment carrying the instructions for making a protein.
This information is encoded by a specific order of the nucleotide bases A, T, C and G, which forms the coding sequence to be transcribed. In addition, a gene comprises a promoter, which contains an initiation sequence as well as regulatory sequences (enhancers and silencers). The promoter is located upstream of the coding sequence. Finally, a terminator is found at the end of the coding sequence.
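As a toy illustration of the gene layout just described (promoter upstream, coding sequence, terminator downstream), the following minimal Python sketch stores a hypothetical gene as three DNA strings read on the coding strand and transcribes only its coding sequence. It assumes a deliberately simplified model in which the mRNA equals the coding (sense) strand with thymine (T) replaced by uracil (U); introns, splicing and untranslated regions are ignored. The sequences and the `transcribe` helper are hypothetical.

```python
# Simplified gene model (assumption): each part is a DNA string on the
# coding (sense) strand, read 5' -> 3'. Introns and UTRs are ignored.
VALID_BASES = set("ATCG")


def transcribe(coding_strand):
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    seq = coding_strand.upper()
    if not set(seq) <= VALID_BASES:
        raise ValueError("sequence must contain only A, T, C and G")
    return seq.replace("T", "U")


# Hypothetical toy gene: promoter upstream, coding sequence, terminator downstream.
gene = {
    "promoter": "TATAAT",
    "coding_sequence": "ATGGCTTGA",
    "terminator": "AAAAAA",
}

# Only the coding sequence is passed to transcribe, mirroring the text above.
mrna = transcribe(gene["coding_sequence"])
print(mrna)  # AUGGCUUGA
```

The validation step simply rejects characters outside the four bases named in the text; a real annotation pipeline would also handle ambiguity codes and strand orientation.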

Data acquisition and collections

The transcriptome refers to the set of all mRNAs expressed in a single cell or a population of cells under a given experimental condition. Transcriptomic studies require prior knowledge of where genes are located in the genome. In addition to qualitative information (which genes are expressed?), a transcriptomic study provides quantitative information (at what levels?). In transcriptomics, the main postulate is that the amount of mRNA reflects the gene activation level, and thus the amount of the corresponding protein in the studied condition. Hence, conducting a set of transcriptomic studies under different experimental conditions yields information on condition-dependent gene expression. Due to methodological limitations in transcriptomic data acquisition, the expression levels of different genes cannot be compared within a given condition. However, the expression of a given gene may be compared across conditions. For instance, it is possible to detect that gene X is more expressed in condition 1 than in condition 2. This is called a differential expression analysis. From transcriptomic data and differential expressions, it may thus be possible to infer gene-gene relationships reflecting regulatory mechanisms. Two main approaches produce transcriptomic data: DNA microarrays and, more recently with the advent of high-throughput sequencing, RNA-seq experiments.
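The per-gene comparison across conditions can be illustrated with a log2 fold change, a common summary of differential expression. The sketch below is a minimal Python example under stated assumptions: expression values are already normalized, a pseudocount of 1 avoids division by zero, and the cutoff of 1 (a two-fold change) is an arbitrary, hypothetical threshold. Gene names and values are invented for illustration, and actual differential expression analyses additionally rely on replicates and statistical testing, which this sketch omits.

```python
import math


def log2_fold_change(expr_cond1, expr_cond2, pseudocount=1.0):
    """Return log2((x1 + c) / (x2 + c)) for each gene.

    Positive values: the gene is more expressed in condition 1.
    Negative values: the gene is more expressed in condition 2.
    Each gene is compared only with itself across the two conditions.
    """
    return {
        gene: math.log2((expr_cond1[gene] + pseudocount)
                        / (expr_cond2[gene] + pseudocount))
        for gene in expr_cond1
    }


def select_differential(lfc, threshold=1.0):
    """Keep genes whose absolute log2 fold change reaches the (hypothetical) threshold."""
    return sorted(gene for gene, value in lfc.items() if abs(value) >= threshold)


# Hypothetical normalized expression values for three genes in two conditions.
cond1 = {"geneX": 255.0, "geneY": 40.0, "geneZ": 9.0}
cond2 = {"geneX": 63.0, "geneY": 39.0, "geneZ": 31.0}

lfc = log2_fold_change(cond1, cond2)
print(lfc)
print(select_differential(lfc))  # ['geneX', 'geneZ']
```

Here geneX is four times more expressed in condition 1 (log2 fold change of 2), geneZ is more expressed in condition 2 (negative value), and geneY barely changes, so only geneX and geneZ pass the cutoff.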


Table of contents :

Abstract
Résumé
Acronyms
Glossary
1 Introduction
1.1 Context and motivations
1.2 Contributions
1.3 Publications, communications and codes
1.4 Outline
2 Methodology
2.1 Biological prerequisites
2.2 Data acquisition and collections
2.2.1 DNA microarray principles and data
2.2.2 RNA-seq principles and data
2.2.3 Benchmark data: simulated and real compendium
2.3 Gene expression pre-processing
2.3.1 Biases and normalization
2.3.2 Differential expression and gene selection
2.4 Gene Regulatory Network (GRN) inference
3 An overview of related works in GRN inference
3.1 GRN inference methods
3.1.1 Metric-based inference
3.1.2 Model-based inference
3.1.3 Ancillary inference methods
3.2 Evaluation methodology
3.2.1 Datasets and methods
3.2.2 Inference metrics and databases
3.2.3 Clustering metrics and databases
3.3 Graph optimization and algorithmic frameworks
3.3.1 Optimization viewpoint for edge selection
3.3.2 Maximal flow for discrete optimization
3.3.3 Random walker for multi-class and relaxed optimization
3.3.4 Proximal methods for continuous optimization
3.3.5 Majorize-Minimize (MM) method
4 Edge selection refinement using gene co-regulation a priori (BRANE Cut)
4.1 BRANE Cut: gene co-regulation a priori
4.1.1 Biological a priori and problem formulation
4.1.2 Optimization via a maximal flow framework
4.1.3 Objective results and biological interpretation
4.2 BRANE Cut: application on Trichoderma reesei
4.2.1 Current knowledge on the T. reesei cellulase production system
4.2.2 Dataset and preludes
4.2.3 New insights on cellulase production
4.3 Conclusions on BRANE Cut
5 Edge selection refinement using gene connectivity a priori (BRANE Relax)
5.1 BRANE Relax problem formulation
5.1.1 Gene connectivity a priori
5.1.2 Initial formulation and relaxation
5.2 BRANE Relax: optimization via a proximal framework
5.2.1 Preconditioning
5.2.2 Block-coordinate descent strategy
5.3 BRANE Relax: objective results on benchmark datasets
5.3.1 Numerical performance on DREAM4
5.3.2 Impact of the function
5.3.3 Numerical performance on DREAM5
5.3.4 Speed-up performance
5.4 Conclusions on BRANE Relax
6 Edge selection refinement using node clustering (BRANE Clust)
6.1 Complementary works on joint clustering and inference
6.2 BRANE Clust with hard-clustering
6.2.1 Problem formulation
6.2.2 Optimization framework
6.2.3 Objective results
6.3 BRANE Clust with soft-clustering
6.3.1 Problem formulation
6.3.2 Optimization framework: alternating clustering and inference
6.3.3 Objective results and biological interpretation
6.4 Conclusions on BRANE Clust
7 Joint segmentation and restoration with higher-order graphical models (HOGMep)
7.1 Background on inverse problems
7.1.1 Importance of inverse problems
7.1.2 Methodologies for solving inverse problems
7.1.3 Variational Bayesian Approximation theory
7.2 HOGMep: multi-component signal segmentation and restoration
7.2.1 Brief review on image segmentation and/or restoration
7.2.2 Inverse problem formulation and priors
7.2.3 Variational Bayesian Approximation and algorithm
7.3 HOGMep: application to image processing and biological data
7.3.1 Joint multi-spectral image segmentation and deconvolution
7.3.2 Biological application
7.4 Conclusions on HOGMep
8 Conclusions and perspectives
8.1 Conclusions
8.1.1 BRANE strategy: gene networks as graphs and a priori-based optimization
8.1.2 HOGMep for a wide graph-based processing
8.2 Perspectives
8.2.1 Biological-related perspectives
8.2.2 Signal/image-related perspectives
List of figures
List of tables
Bibliography