# Hybrid approaches using Mutual Information Tests

Get Complete Project Material File(s) Now! »

## Interpretation of a DAG

The set of conditional independence relationships among the variables V, en-tailed by a DAG G, can be deduced by applying a graphical criterion called the d-separation [Pearl, 2000] (see section 2.1.2.1). If two sets of vertices (Xi; Xj) are not adjacent in a DAG G, they are d-separated by a subset S of the remain-ing variables in G. It follows that Xi and Xj are independent conditional on S, i.e., Xi ? XjjS, in any distribution P over V that factorizes1 according to G. In fact, when assuming that the DAG G has the local Markov property2, we assume that the conditional independence relationships obtained via the d-separation cri-terion at least holds in the distribution P over V. Conversely, if the conditional independencies in P correspond exactly3 to the ones that can be read o from G via the d-separation criterion, P is said to be faithful to G or stable.
1P factorizes according to a DAG G = (V; E) if the joint density distribution of V can be written as p(X1; X2; :::; Xq) = Qq p(Xijpa(G; Xi)), where pa(G; Xi) are the parents of Xi according to G.
i=1.

### The d-separation criterion

This d-separation criterion is tied to the notion of blocking a path in a causal graph-ical model, i.e. stopping the ow of information between two related variables. In particular, if three nodes (Xi; Xk; Xj) are related through a non-v-structure (either as a chain Xi ! Xk ! Xj or a fork Xi Xk ! Xj ), the two extreme nodes are marginally dependent. However, when one knows the value of Xk, learning something about Xi has no e ect on the probability of Xj. Thus, once the value of Xk is known, Xi and Xj become independent. In other words, Xk blocks the path between Xi and Xj.
On the other hand, if (Xi; Xk; Xj) are related through a v-structure (a collider Xi ! Xk Xj), the two extreme variables are, a priori, marginally independent. Yet, Xi and Xj become dependent conditioned on the value of Xk. As an example, we could consider that Xi represents the possibility of an earthquake, Xj denotes the possibility of being burgled and Xk stands for the possible activation of an house alarm (due to an earthquake or a burglary). A priori, no relation exists between the occurrence of an earthquake (Xi) and the fact of being robbed (Xj) (assuming that the integrity of the house is preserved). However, let’s suppose that the activation of the house alarm is acknowledged (i.e., the value of Xk is known). If the occurrence of an earthquake is then ascertained, one can draw the conclusion that the house is (likely) safe and, rather than a burglar, the earthquake has induced the activation of the alarm. This e ect is also known as the explaining away e ect4.
A de nition of the d-separation can be taken from [Pearl, 2000, De nition 1.2.3], De nition 2.1.1. d-Separation A path p is said to be d-separated (or blocked) by a set of nodes Z if and only if 1. p contains a chain i ! m ! j or a fork i m ! j such that the middle node m is in Z, or 2. p contains an inverted fork (or collider) i ! m j such that the middle node m is not in Z and such that no descendant of m is in Z.

#### Complications from latent and selection variables

Two impediments can lead to the inference of spurious relationships when recon-structing a causal structure: the hidden (or latent) variables and the selection variables. The rst ones correspond to unmeasured variables that contribute to observed associations. The second ones correspond to unobserved variables that characterize the subpopulation related to the samples drawn from the population of interest. As exempli ed in section 2.2.1.1 by the variable Sel, the selection bias corresponds to a by design issue. Thus, any observed independence or association should always be regarded as conditional on the selection variables5.

Learning the skeleton and sepsets

In the rst step of the PC algorithm (Algorithm 1, (lines [7 17])), the complete undirected graphical model taken as input is progressively thinned out based on the discovery of conditional independence relationships.
The e ciency of the PC algorithm lies in the organization of the conditional independence queries, Indep(X; Y jfUig) (Algorithm 1, (line [10])). Instead of checking all the conditional independencies, as the SGS algorithm does [Spirtes et al., 2000], the PC algorithm is searching for the separation sets among the remaining adjacent vertices during the skeleton thinning out process, which can lead to a polynomial time execution in graphs of nite degree. First, starting with an empty separation set ‘ = 0, all pairs of variables, (X; Y ), are tested for marginal independence. Whenever such independence is found, X ? Y , the corresponding edge is removed, X 6 Y , and the empty set is stored as separation set, SepXY = SepY X = ;. The value of ‘ is then incremented, and each remaining edge having vertices with an adjacency set of size at least ‘ is tested for conditional independence based on subset fUig adjG(X) n fY g (or fUig adjG(Y ) n fXg). The PC algorithm iterates until all adjacency sets are smaller than ‘. This rst step returns the skeleton of the underlying graphical model and a separation set (possibly empty) for each missing edge.
As will be shown later, a major shortcoming of the PC algorithm is that it is not robust to sampling noise in nite dataset, which leads to the accumulation of errors during the edges pruning procedure.
By contrast, as will be shown in Chapter 6, the 3o 2 learning method uncovers the conditional independencies among the observed variables following a robust iterative approach in order to build the skeleton of the underlying graphical model (section 6.3). In a nutshell, the 3o 2 process iteratively builds the sepa-ration sets guided by a score based on 2-point and 3-point information statistics (Eq. 6.32) until no variable belonging to the separation sets can be found.

Orientating the skeleton

Once the skeleton and the separation sets are known, the v-structures can be identi ed (Algorithm 1, lines [19 21]). As introduced in the subsection 2.1.2.1, if the unshielded triple hXi; Xk; XjiXi 6 Xj is a v-structure, knowing the value of Xk creates a dependence between Xi and Xj, while if hXi; Xk; XjiXi 6 Xj is a non-v-structure, knowing the value of Xk induces a conditional independence between Xi and Xj. Thus, if a separation set between Xi and Xj has been found, and if Xk does not belong to it, then under the faithfulness and the causal su ciency assumptions, one can conclude that hXi; Xk; XjiXi 6 Xj is a v-structure and orient the two edges accordingly.
By contrast, as will be shown in section section 5.1.3, the 3o 2 approach relies on the sign and magnitude of the 3-point information in order to decide whether a unshielded triple could be oriented as a collider (section 6.3.2.2).

PC-stable: an order-independent constraint-based approach

The dependence on the order in which the variables are given impedes the skele-ton learning and the edge orientation steps, in particular with high dimensional settings. [Colombo et al., 2012] proposed a modi cation of the PC algorithm that removes the order-dependence on the input variables while discovering the skele-ton. [Ramsey et al., 2006] and [Colombo et al., 2012] also proposed complementary orientation procedures to reach order-independence during the orientation/prop-agation steps.

Order-independent skeleton learning

The order-dependence of the PC algorithm is related to the removal of the non-essential edge (Algorithm 1, lines [11]) within one level ‘ (Algorithm 1, lines [8 16]), where ‘ corresponds to the size of the separation set. Indeed, removing an edge at a given level ‘ possibly modi es the adjacency sets of the remaining edges not yet tested for conditional independence, and thus in uences the outcome of the following statistical tests at this level. In practice, mistakes can occur when checking for conditional independence using a statistical test and lead to erroneous skeleton and separation sets. As the order in which the input variables are considered can imply di erent errors, a wide range of skeletons can be expected, in particular for high dimensional settings as shown in [Colombo and Maathuis, 2014].
[Colombo and Maathuis, 2014] proposed an order-independent version of the skele-ton discovery step for the PC algorithm, called PC-stable, where the edge removal is not done within a speci c level ‘. Instead, the PC-stable algorithm stores all the possible separation sets found for a given level ‘ and removes the edges cor-responding to the conditional independence only before reaching the next level (or eventually before ending the algorithm). This postponed edge removal between successive levels, rather than within one level, allows for an order-independent discovery of the skeleton, as the edges under statistical test for a level ‘ do not in uence each others adjacency sets when removed.
As detailed in section 6.3.2, the 3o 2 inference approach proceeds on a list of triples ordered by a rank (Eq. 6.32) that optimizes the likelihood that a variable Z is contributing to the relationship between two other variables, X and Y . As the 3o 2 inference procedure relies on this rank, the skeleton produced is not dependent on the order in which the input variables are processed.

Order-independent orientation steps

As exempli ed in [Colombo et al., 2012], the information contained in the separa-tion sets are also order-dependent, as some separating sets may hold by mistake at some level . When used for the orientation of the v-structures, these spurious separation sets may lead to incorrect orientations depending on the order in which the given variables are processed.
[Ramsey et al., 2006] proposed a conservative rule to remove the order-dependence of this orientationstep. [Colombo and Maathuis, 2014] proposed a similar but less conservative approach. Both methods rely on the identi cation of unambiguous triples, and di er in the de nition of the unambiguity. These approaches rst collect, for each unshielded triples hXi; Xk; XjiXi 6 Xj in the reconstructed skeleton C, all the separation sets fYg among the variables belonging to adjC(Xi) and adjC(Xj), such that Xi ? XjjY. For the conservative approach [Ramsey et al., 2006], hXi; Xk; XjiXi 6 Xj is said unambiguous if at least one separation set Y can be found, and Xk belongs to all or none of the separation sets fYg. When the triple is unambiguous, it is oriented as a v-structure if and only if Xk is in none of the separation sets fYg. By contrast, [Colombo and Maathuis, 2014] rely on a majority rule to decide for unambiguity. hXi; Xk; XjiXi 6 Xj is said unambiguous if at least one separation set Y can be found, and Xk is not in exactly 50% of the fYg. When the triple is unambiguous, it is oriented as a v-structure if and only if Xk is in less than half of the separation sets fYg. For both methods, only the unambiguous triples are considered in the orientation propagation step. The algorithms resulting from the association of the PC-stable step and the conservative or the majority rule orientation step are called respectively CPC-stable and MPC-stable.

Abstract
Acknowledgements
1 Introduction
1.1 Expansion of `dangerous’ gene families in vertebrates
1.2 Learning causal graphical models
1.2.1 The state-of-the-art approaches
1.2.1.1 Constraint-based methods
1.2.1.2 Search-and-score methods
1.2.2 3o2: a novel network reconstruction method
2 Graphical Notation and Terminology
2.1 Directed Acyclic Graphs
2.1.1 Graphical model terminology
2.1.2 Interpretation of a DAG
2.1.2.1 The d-separation criterion
2.1.2.2 Markov equivalence
2.2 Ancestral Graphs
2.2.1 Complications from latent and selection variables
2.2.1.1 Spurious correlations
2.2.1.2 Spurious causal eects
2.2.2 Graphical model terminology
2.2.3 Interpretation of a MAG
2.2.3.1 The m-separation criterion
2.2.3.2 Markov equivalence
2.2.3.3 Finding adjacencies in a PAG
3 Constraint-based Methods
3.1 The PC algorithm
3.1.1 Learning the skeleton and sepsets
3.1.2 Orientating the skeleton
3.1.3 Propagating the orientations
3.1.4 Statistical tests for conditional independence
3.1.4.1 Chi-square conditional independence test
3.1.4.2 G-square conditional independence test
3.2 Variations on the PC algorithm
3.2.1 PC-stable: an order-independent constraint-based approach
3.2.1.1 Order-independent skeleton learning
3.2.1.2 Order-independent orientation steps
3.2.2 Causal Inference with latent and selection variables
3.2.2.1 The FCI algorithm
3.2.2.2 Improvements of the FCI algorithm
4 Bayesian methods
4.1 Learning a Bayesian Network
4.2 Scoring funtions
4.2.1 Bayesian scores
4.2.2 Information-theoretic scores
4.2.2.1 The LL scoring function
4.2.2.2 The MDL/BIC criterion
4.2.2.3 The NML criterion
4.3.1 The exact discovery
4.3.2 The heuristic search approaches
4.4 Hybrid approaches
5 Information-theoretic methods
5.1 Information-theoretic measures
5.1.1 The Shannon entropy
5.1.2 The data processing inequality
5.1.3 The multivariate information
5.1.4 The cross-entropy of a Bayesian network
5.2 State-of-the-art of information-theoretic methods
5.2.1 The Chow & Liu approach
5.2.2 Relevance Network
5.2.3 ARACNe
5.2.4 Minimum Redundancy Networks
5.2.5 The MI-CMI algorithm
5.3 Hybrid approaches using Mutual Information Tests
6 3o2: a hybrid method based on information statistics
6.1 Information theoretic framework
6.1.1 Inferring isolated v-structures vs non-v-structures
6.1.2 Inferring embedded v-structures vs non-v-structures
6.1.3 Uncovering causality from a stable/faithful distribution
6.1.3.1 Negative 3-point information as evidence of causality
6.1.3.2 Propagating the orientations
6.2 Robust reconstruction of causal graphs from nite datasets
6.2.1 Finite size corrections of maximum likelihood
6.2.2 Complexity of graphical models
6.2.3 Probability estimate of indirect contributions
6.3 The 3o2 scheme
6.3.1 Robust inference of conditional independencies
6.3.2 The 3o2 algorithm
6.3.2.1 Reconstruction of network skeleton
6.3.2.2 Orientation of network skeleton
6.4 Extension of the 3o2 algorithm
6.4.1 Probabilistic estimation of the edge endpoints
6.4.2 Allowing for latent variables
6.4.3 Allowing for missing data
6.5 Implemented pipelines
6.5.1 The 3o2 pipeline
6.5.2 The discoNet pipeline
7 Evaluation on Benchmark Networks
7.1 Using simulated datasets from real-life networks
7.2 Using simulated datasets from simulated networks
7.3 Using simulated datasets from undirected networks
7.3.1 Generating datasets from undirected networks
7.3.2 Reconstructing simulated undirected networks
8 Interplay between genomic properties on the fate of gene dupli- cates with the 3o2 method
8.1 Enhanced retention of ohnologs
8.1.1 Enhanced retention of `dangerous’ ohnologs
8.1.2 Retained ohnologs are more `dangerous’ than dosage balanced
8.2 Going beyond simple correlations in genomic data
8.2.1 The Mediation framework in the context of causal inference
8.2.2 The Mediation analysis applied to genomic data
8.3 Retention of `dangerous’ ohnologs through a counterintuitive mechanism
8.4 Reconstructing the interplay of genomic properties on the retention of ohnologs
8.4.1 Genomic properties of interest
8.4.2 Non-adaptive retention mechanism supported by the 3o2 causal models
9 Reconstruction of zebrash larvae neural networks from brain- wide in-vivo functional imaging
9.1 The zebrash calcium functional imaging
9.1.1 The calcium functional imaging
9.1.2 The zebrash as vertebrate model
9.1.3 The SPIM technique
9.1.4 The experimental setup
9.1.5 From uorescent signal to neuron activity
9.2 Neural network reconstructions with the 3o2 method
9.2.1 Preprocessing of SPIM images by neuron clustering
9.2.2 Reconstruction of temporal activity patterns
9.2.2.1 Alternating neural activity in optokinetic reponse
9.2.2.2 Neural activity related to hunting behaviour
10 Reconstruction of the hematopoiesis regulation network with the 3o2 method
10.1 The hematopoiesis regulation network
10.2 Regulation network reconstruction with the 3o2 method
10.2.1 Dataset and interactions of interest
10.2.2 The interactions recovered by 3o2 and alternative methods
11 Reconstruction of mutational pathways involved in tumours pro- gression with the 3o2 method
11.1 The breast cancer dataset
11.1.1 A brief overview of breast cancer
11.1.2 The issue of missing values
11.2 3o2 mutational pathways in breast cancer
11.2.1 Information content of complete vs. incomplete datasets
11.2.2 3o2 cascades of mutations in breast cancer
12 Conclusions
A Complementary evaluations on real-life networks
A.1 Evaluation of the PC method by signicance level
A.2 Evaluation of the Aracne reconstruction method
A.3 Evaluation of the Bayesian methods by score
A.4 Evaluation of 3o2 by score
A.5 Evaluation of 3o2 against Bayesian and MMHC methods
A.6 Execution time comparisons
B Complementary evaluations on simulated networks
B.1 Description of the benchmark networks
B.2 Evaluation of 3o2 by score
B.3 Evaluation of the PC method by signicance level
B.4 Evaluation of the MMHC method by signicance level
B.5 Comparison between 3o2, PC and Bayesian hill-climbing
B.6 Evaluation of the Bayesian methods by score
Bibliography

GET THE COMPLETE PROJECT