Statistical tests for conditional independence
The conditional independence tests performed during the pruning procedure of constraint-based methods, and in particular during the iteration step of the PC algorithm (Algorithm 1, line ), usually rely on 2 or G2 independence tests.
The decision of removing the edge between two variables Xi and Xj conditioned on a set of variables Xk is made through the confrontation of the following hypothesis: H0 The variables Xi and Xj are independent conditioned on Xk Ha The variables Xi and Xj are not independent conditioned on Xk.
Chi-square conditional independence test
A 2 independence test can be used to accept or reject the null hypothesis, i.e. the independence between two discrete variables, Xi and Xj , conditioned on a set of discrete variables, Xk (where Xi has ri levels, Xj has rj levels, and there exists qk combinations of levels for the set Xk).
The test confronts the following models:
the independence model: pi = P(XijXk)P(Xj jXk.
the observed model: po = P(Xi;Xj jXk).
The model po is represented by Nijk, the number of data points with fXi = xi,Xj = xj and Xk = xkg (with xk the kth state of the set Xk). The model pi is represented by NikNjk Nk , where Nik corresponds to the number of data points with fXi = xi and Xk = xkg, Njk corresponds to the number of data points with fXj = xj and Xk = xkg, and Nk corresponds to the number of data points with fXk = xkg.
Variations on the PC algorithm
When an exact list of conditional independence relationships is given as input, the PC or the IC algorithms are guaranteed to learn the CPDAG of the underlying graphical model. However in practice, constraint-based methods should rely on statistical independence tests (subsection 3.1.4), to ascertain the conditional independencies from observational data.
As previously stated [Colombo and Maathuis, 2014, Dash and Druzdzel, 1999], the use of a statistical test on nite datasets makes the constraint-based methods not robust to sampling noise in nite datasets. Indeed, early errors in removing edges from the complete graph typically trigger the accumulation of compensatory errors later on in the prunning process and this cascading eect makes the constraintbased approaches sensitive to the adjustable signicance level required for the conditional independence tests. In addition, traditional constraint-based methods are not robust to the order in which the conditional independence tests are processed, possibly leading to a wide range of dierent results, in particular in highdimensional settings. This prompted recent algorithmic improvements intending to achieve order-independence, as the PC-stable algorithm proposed by [Colombo and Maathuis, 2014], and tested on experimental observational data from yeast gene expression.
Besides, the PC algorithm, as well as other structure discovery approaches, requires specic assumptions to guarantee that the output would be the representative of the markov equivalent class of the underlying graphical model. Among these assumptions, the causal suciency plays an important role as it assumes the absence of latent and selection variables. Yet in practice, the system under observation typically contains unmeasured common causes and selection variables (section 2.2.1).
The FCI (`Fast Causal Inference’) algorithm [Spirtes et al., 1999, 2000], and the more ecient RFCI (`Really Fast Causal Inference’) algorithm [Colombo et al., 2012], have been proposed to enable the discovery of the Partial Ancestral Graph (PAG) (section 18.104.22.168), representative of the Markov equivalence class of DAGs when accounting for hidden and selection variables. The following subsections detail and discuss the PC-stable, the FCI and the RFCI algorithms.
PC-stable: an order-independent constraint-based approach
The dependence on the order in which the variables are given impedes the skeleton learning and the edge orientation steps, in particular with high dimensional settings. [Colombo et al., 2012] proposed a modication of the PC algorithm that removes the order-dependence on the input variables while discovering the skeleton. [Ramsey et al., 2006] and [Colombo et al., 2012] also proposed complementary orientation procedures to reach order-independence during the orientation/propagation steps.
Order-independent skeleton learning
The order-dependence of the PC algorithm is related to the removal of the non- essential edge (Algorithm 1, lines ) within one level ` (Algorithm 1, lines [8 16]), where ` corresponds to the size of the separation set. Indeed, removing an edge at a given level ` possibly modies the adjacency sets of the remaining edges not yet tested for conditional independence, and thus in uences the outcome of the following statistical tests at this level. In practice, mistakes can occur when checking for conditional independence using a statistical test and lead to erroneous skeleton and separation sets. As the order in which the input variables are considered can imply dierent errors, a wide range of skeletons can be expected, in particular for high dimensional settings as shown in [Colombo and Maathuis, 2014]. [Colombo and Maathuis, 2014] proposed an order-independent version of the skeleton discovery step for the PC algorithm, called PC-stable, where the edge removal is not done within a specic level `. Instead, the PC-stable algorithm stores all the possible separation sets found for a given level ` and removes the edges corresponding to the conditional independence only before reaching the next level (or eventually before ending the algorithm). This postponed edge removal between successive levels, rather than within one level, allows for an order-independent discovery of the skeleton, as the edges under statistical test for a level ` do not in uence each others adjacency sets when removed.
As detailed in section 6.3.2, the 3o2 inference approach proceeds on a list of triples ordered by a rank (Eq. 6.32) that optimizes the likelihood that a variable Z is contributing to the relationship between two other variables, X and Y . Asthe 3o2 inference procedure relies on this rank, the skeleton produced is not dependent on the order in which the input variables are processed.
Order-independent orientation steps
As exemplied in [Colombo et al., 2012], the information contained in the separation sets are also order-dependent, as some separating sets may hold by mistake at some level . When used for the orientation of the v-structures, these spurious separation sets may lead to incorrect orientations depending on the order in which the given variables are processed.
[Ramsey et al., 2006] proposed a conservative rule to remove the order-dependence of this orientationstep. [Colombo and Maathuis, 2014] proposed a similar but less conservative approach. Both methods rely on the identication of unambiguous triples, and dier in the denition of the unambiguity. These approaches rst collect, for each unshielded triples hXi;Xk;XjiXi 6 Xj in the reconstructed skeleton C, all the separation sets fYg among the variables belonging to adjC(Xi) and adjC(Xj), such that Xi ?? Xj jY. For the conservative approach [Ramsey et al., 2006], hXi;Xk;XjiXi 6 Xj is said unambiguous if at least one separation set Y can be found, and Xk belongs to all or none of the separation sets fYg. When the triple is unambiguous, it is oriented as a v-structure if and only if Xk is in none of the separation sets fYg. By contrast, [Colombo and Maathuis, 2014] rely on a majority rule to decide for unambiguity. hXi;Xk;XjiXi 6 Xj is said unambiguous if at least one separation set Y can be found, and Xk is not in exactly 50% of the fYg. When the triple is unambiguous, it is oriented as a v-structure if and only if Xk is in less than half of the separation sets fYg. For both methods, only the unambiguous triples are considered in the orientation propagation step. The algorithms resulting from the association of the PC-stable step and the conservative or the majority rule orientation step are called respectively CPC-stable and MPC-stable.
Causal Inference with latent and selection variables
As introduced in the section 2.2.1, unmeasured variables such as latent or selection variables, cause complications when learning a causal graphical model from the observed variables. The FCI (`Fast Causal Inference’) algorithm [Spirtes et al., 1999, 2000] allows the discovery of the representative of the Markov equivalence class of DAGs, from observed conditional independence information, while allowing for the presence of latent and selection variables.
Algorithm 2 gives the details of the FCI algorithm. First, the discovery of the skeleton with the PC algorithm (Algorithm 2, step 1.) and the identication of the v-structures (Algorithm 2, step 2.) are performed. Then, complementary conditional independence tests, for which the conditioning set of variables is search only in the Possible-D-SEP (see section 22.214.171.124), are processed (Algorithm 2, step 3.). Before applying the nal orientation rules (Algorithm 2, step 4.), the edges of the skeleton are all set to . Finally, the propagation step follows the rules R1 to R10 established in [Zhang, 2008a]. The returned graphical model C, akin to a PAG [Zhang, 2008b], is said maximally informative.
Table of contents :
1.1 Expansion of `dangerous’ gene families in vertebrates
1.2 Learning causal graphical models
1.2.1 The state-of-the-art approaches
126.96.36.199 Constraint-based methods
188.8.131.52 Search-and-score methods
1.2.2 3o2: a novel network reconstruction method
2 Graphical Notation and Terminology
2.1 Directed Acyclic Graphs
2.1.1 Graphical model terminology
2.1.2 Interpretation of a DAG
184.108.40.206 The d-separation criterion
220.127.116.11 Markov equivalence
2.2 Ancestral Graphs
2.2.1 Complications from latent and selection variables
18.104.22.168 Spurious correlations
22.214.171.124 Spurious causal eects
2.2.2 Graphical model terminology
2.2.3 Interpretation of a MAG
126.96.36.199 The m-separation criterion
188.8.131.52 Markov equivalence
184.108.40.206 Finding adjacencies in a PAG
3 Constraint-based Methods
3.1 The PC algorithm
3.1.1 Learning the skeleton and sepsets
3.1.2 Orientating the skeleton
3.1.3 Propagating the orientations
3.1.4 Statistical tests for conditional independence
220.127.116.11 Chi-square conditional independence test
18.104.22.168 G-square conditional independence test
3.2 Variations on the PC algorithm
3.2.1 PC-stable: an order-independent constraint-based approach
22.214.171.124 Order-independent skeleton learning
126.96.36.199 Order-independent orientation steps
3.2.2 Causal Inference with latent and selection variables
188.8.131.52 The FCI algorithm
184.108.40.206 Improvements of the FCI algorithm
4 Bayesian methods
4.1 Learning a Bayesian Network
4.2 Scoring funtions
4.2.1 Bayesian scores
4.2.2 Information-theoretic scores
220.127.116.11 The LL scoring function
18.104.22.168 The MDL/BIC criterion
22.214.171.124 The NML criterion
4.3 The search-and-score paradigm
4.3.1 The exact discovery
4.3.2 The heuristic search approaches
4.4 Hybrid approaches
5 Information-theoretic methods
5.1 Information-theoretic measures
5.1.1 The Shannon entropy
5.1.2 The data processing inequality
5.1.3 The multivariate information
5.1.4 The cross-entropy of a Bayesian network
5.2 State-of-the-art of information-theoretic methods
5.2.1 The Chow & Liu approach
5.2.2 Relevance Network
5.2.4 Minimum Redundancy Networks
5.2.5 The MI-CMI algorithm
5.3 Hybrid approaches using Mutual Information Tests
6 3o2: a hybrid method based on information statistics
6.1 Information theoretic framework
6.1.1 Inferring isolated v-structures vs non-v-structures
6.1.2 Inferring embedded v-structures vs non-v-structures
6.1.3 Uncovering causality from a stable/faithful distribution
126.96.36.199 Negative 3-point information as evidence of causality
188.8.131.52 Propagating the orientations
6.2 Robust reconstruction of causal graphs from nite datasets
6.2.1 Finite size corrections of maximum likelihood
6.2.2 Complexity of graphical models
6.2.3 Probability estimate of indirect contributions
6.3 The 3o2 scheme
6.3.1 Robust inference of conditional independencies
6.3.2 The 3o2 algorithm
184.108.40.206 Reconstruction of network skeleton
220.127.116.11 Orientation of network skeleton
6.4 Extension of the 3o2 algorithm
6.4.1 Probabilistic estimation of the edge endpoints
6.4.2 Allowing for latent variables
6.4.3 Allowing for missing data
6.5 Implemented pipelines
6.5.1 The 3o2 pipeline
6.5.2 The discoNet pipeline
7 Evaluation on Benchmark Networks
7.1 Using simulated datasets from real-life networks
7.2 Using simulated datasets from simulated networks
7.3 Using simulated datasets from undirected networks
7.3.1 Generating datasets from undirected networks
7.3.2 Reconstructing simulated undirected networks
8 Interplay between genomic properties on the fate of gene dupli- cates with the 3o2 method
8.1 Enhanced retention of ohnologs
8.1.1 Enhanced retention of `dangerous’ ohnologs
8.1.2 Retained ohnologs are more `dangerous’ than dosage balanced
8.2 Going beyond simple correlations in genomic data
8.2.1 The Mediation framework in the context of causal inference
8.2.2 The Mediation analysis applied to genomic data
8.3 Retention of `dangerous’ ohnologs through a counterintuitive mechanism
8.4 Reconstructing the interplay of genomic properties on the retention of ohnologs
8.4.1 Genomic properties of interest
8.4.2 Non-adaptive retention mechanism supported by the 3o2 causal models
9 Reconstruction of zebrash larvae neural networks from brain- wide in-vivo functional imaging
9.1 The zebrash calcium functional imaging
9.1.1 The calcium functional imaging
9.1.2 The zebrash as vertebrate model
9.1.3 The SPIM technique
9.1.4 The experimental setup
9.1.5 From uorescent signal to neuron activity
9.2 Neural network reconstructions with the 3o2 method
9.2.1 Preprocessing of SPIM images by neuron clustering
9.2.2 Reconstruction of temporal activity patterns
18.104.22.168 Alternating neural activity in optokinetic reponse
22.214.171.124 Neural activity related to hunting behaviour .
10 Reconstruction of the hematopoiesis regulation network with the 3o2 method
10.1 The hematopoiesis regulation network
10.2 Regulation network reconstruction with the 3o2 method
10.2.1 Dataset and interactions of interest
10.2.2 The interactions recovered by 3o2 and alternative methods
11 Reconstruction of mutational pathways involved in tumours pro- gression with the 3o2 method
11.1 The breast cancer dataset
11.1.1 A brief overview of breast cancer
11.1.2 The issue of missing values
11.2 3o2 mutational pathways in breast cancer
11.2.1 Information content of complete vs. incomplete datasets .
11.2.2 3o2 cascades of mutations in breast cancer
11.2.3 Temporal cascade patterns of advantageous mutations .
A Complementary evaluations on real-life networks
A.1 Evaluation of the PC method by signicance level
A.2 Evaluation of the Aracne reconstruction method
A.3 Evaluation of the Bayesian methods by score
A.4 Evaluation of 3o2 by score
A.5 Evaluation of 3o2 against Bayesian and MMHC methods
A.6 Execution time comparisons
B Complementary evaluations on simulated networks
B.1 Description of the benchmark networks
B.2 Evaluation of 3o2 by score
B.3 Evaluation of the PC method by signicance level
B.4 Evaluation of the MMHC method by signicance level
B.5 Comparison between 3o2, PC and Bayesian hill-climbing
B.6 Evaluation of the Bayesian methods by score