Normalization of DNA copy number microarrays


Tools for an asymptotic study

A widely used approach in asymptotic statistics (advocated by Pollard [71] for example) is to write a statistic as a functional on the sample paths of a stochastic process in order to break the analysis into two parts: the study of regularity of the functional; the study of the stochastic process as a random element of a space of functions. This idea will be illustrated in Chapter 2, in which we establish Central Limit Theorems for the FDP achieved by a class of FDR controlling procedures.
The classical tool to establish such theorems in Euclidean spaces is the Delta method [105]; we recall its formulation for real-valued random variables.
Theorem 1.4.1 (Delta method [105]). Let $(X_m)$ be a sequence of real-valued random variables, $\theta \in \mathbb{R}$, and $(r_m)$ a sequence growing to $+\infty$ as $m \to +\infty$. Assume that:
(i) $r_m (X_m - \theta) \rightsquigarrow X$, where $X$ is a real-valued random variable;
(ii) $\varphi : \mathbb{R} \to \mathbb{R}$ is differentiable at $\theta$, with derivative $\varphi'(\theta)$.
Then we have $r_m (\varphi(X_m) - \varphi(\theta)) \rightsquigarrow \varphi'(\theta) X$.

We are interested in more general situations in which $X_m$ lives in the functional space $D[0, 1]$ of càdlàg functions on $[0, 1]$, that is, right-continuous functions on $[0, 1]$ with left limits, and $\varphi$ maps $D[0, 1]$ to $\mathbb{R}$. The extension of the usual definition of convergence in distribution to the non-separable metric space $(D[0, 1], \|\cdot\|_\infty)$ turns out to raise measurability issues that we discuss in section 1.6.3.
In the remainder of the present section, we begin by recalling a version of Donsker’s Theorem that extends Assumption (i) of Theorem 1.4.1 to stochastic processes in D[0, 1] (section 1.4.1). Then we define Hadamard differentiability, which provides an extension of Assumption (ii) of Theorem 1.4.1 to normed spaces (section 1.4.2). Finally we show how these tools may be combined to yield a functional Delta method [105] (section 1.4.3).
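The real-valued Delta method can be checked numerically. The following is a minimal sketch under assumptions that are not taken from the text: X_m is the mean of m independent Exp(1) variables (so θ = 1 and r_m = √m), and φ(x) = x², so that φ′(θ) = 2. The function name `delta_method_draws` and all parameter values are illustrative choices.

```python
import math
import random
import statistics


def delta_method_draws(m, reps, seed=0):
    """Draw `reps` copies of sqrt(m) * (phi(X_m) - phi(theta)).

    Assumed toy setting (not from the text): X_m is the mean of m
    independent Exp(1) variables, so theta = 1 and sqrt(m) * (X_m - theta)
    converges in distribution to N(0, 1); phi(x) = x**2, so phi'(theta) = 2.
    """
    rng = random.Random(seed)
    theta = 1.0
    draws = []
    for _ in range(reps):
        x_m = sum(rng.expovariate(1.0) for _ in range(m)) / m
        draws.append(math.sqrt(m) * (x_m ** 2 - theta ** 2))
    return draws


draws = delta_method_draws(m=400, reps=2000)
# The Delta method predicts a N(0, phi'(theta)^2) = N(0, 4) limit,
# so the empirical standard deviation should be close to 2.
print(statistics.mean(draws), statistics.stdev(draws))
```

For finite m the empirical mean is slightly positive (of order 1/√m in this toy setting), which is consistent with the limit being centered.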

A continuous time optional sampling theorem.

Proof of Lemma 1.3.6. As p-values under the null distribution are independent and uniformly distributed, item (i) results from a simple counting argument, and (ii) follows from (i). For (iii), recalling that $\hat\tau = \sup\{u \in [0, 1], \hat G_m(u) \geq u/\alpha\}$, we have $\hat\tau \geq t$ if and only if $\hat G_m(t) < t/\alpha$. As $\hat G_m(t)$ only depends on p-values under the alternative, and on p-values under the null which are smaller than $t$, we have $\{\hat\tau \geq t\} \in \mathcal{F}_t$, and (iii) is proved. By the definition of the threshold $\hat\tau$ of the BH95 procedure, we have $\hat G_m(\hat\tau) = \hat\tau/\alpha$, which proves (iv) because $\hat G_m(\hat\tau) = R_{\hat\tau}/m$.

Theorem 1.6.2 (Optional Sampling Theorem (adapted from [70])). Let $\{(X_t, \mathcal{F}_t) : 0 \leq t \leq 1\}$ be a martingale, and let $0 \leq \sigma \leq \tau \leq 1$ be stopping times for the filtration, such that almost surely, $X_t$ is right-continuous at $\sigma$ and $\tau$. Then, almost surely, $E[X_\tau \mid \mathcal{F}_\sigma] = X_\sigma$.
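The identity used for item (iv) can be checked on a small example. The sketch below assumes that the BH95 threshold is the supremum of the set of u in [0, 1] where the empirical distribution function of the p-values is at least u/α, as recalled in the proof; for distinct p-values this supremum equals α k/m, where k is the largest index such that the k-th smallest p-value is at most α k/m. The function names and the p-values are illustrative only.

```python
import math


def ecdf(pvalues, t):
    """Empirical distribution function of the p-values, evaluated at t."""
    return sum(p <= t for p in pvalues) / len(pvalues)


def bh95_threshold(pvalues, alpha):
    """Return (tau_hat, k_hat) for the BH95 procedure at level alpha.

    k_hat is the largest k such that the k-th smallest p-value is at most
    alpha * k / m (0 if there is none), and tau_hat = alpha * k_hat / m.
    """
    m = len(pvalues)
    k_hat = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * k / m:
            k_hat = k
    return alpha * k_hat / m, k_hat


# Illustrative p-values (not taken from the text).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05
tau_hat, k_hat = bh95_threshold(p_values, alpha)

# Item (iv): the empirical distribution function at tau_hat equals
# tau_hat / alpha, and both equal the proportion of rejections k_hat / m.
print(k_hat, tau_hat, ecdf(p_values, tau_hat), tau_hat / alpha)
assert math.isclose(ecdf(p_values, tau_hat), tau_hat / alpha)
```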

Asymptotic properties of threshold procedures

This section provides general results about multiple testing procedures with threshold functions satisfying the following regularity condition:

Condition C.1 (Hadamard-differentiability). The threshold function $T$ satisfies $T(G) > 0$, and is Hadamard-differentiable at $G$, tangentially to $C[0, 1]$, where $C[0, 1]$ is the set of continuous functions on $[0, 1]$. The threshold function derivative is denoted by $\dot{T}_G$.

We begin by deriving the asymptotic distribution of the FDP of any multiple testing procedure satisfying Condition C.1 (section 2.3.1). We then define and characterize asymptotic equivalence between multiple testing procedures in terms of Condition C.1 (section 2.3.2). Finally we interpret this Condition in terms of crossing points between the distribution function $G$ of the p-values and the rejection curve (section 2.3.3).

2.3.1. Asymptotic False Discovery Proportion. Condition C.1 makes it possible to use the functional Delta method [105] to derive the asymptotic distribution of the False Discovery Proportion $\mathrm{FDP}_m(T(\hat G_m))$ actually achieved by procedure $T$ from the convergence in distribution of the centered empirical processes associated with $\hat G_{0,m}$ and $\hat G_{1,m}$, which is a consequence of Donsker’s theorem [105].
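The False Discovery Proportion achieved at a data-driven threshold can be simulated directly. The sketch below is illustrative and rests on assumptions that are not in the text: a fixed number m0 of null p-values drawn uniformly on [0, 1], alternative p-values drawn as U⁴ with U uniform, the BH95 threshold as the threshold function, and the convention FDP = 0 when nothing is rejected. All names and parameter values are hypothetical.

```python
import random
import statistics


def bh95_threshold(pvalues, alpha):
    """BH95 threshold alpha * k_hat / m (0.0 if no hypothesis is rejected)."""
    m = len(pvalues)
    k_hat = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * k / m:
            k_hat = k
    return alpha * k_hat / m


def fdp_at(threshold, null_p, alt_p):
    """Proportion of true nulls among rejections at the given threshold."""
    false_rejections = sum(p <= threshold for p in null_p)
    rejections = false_rejections + sum(p <= threshold for p in alt_p)
    return false_rejections / rejections if rejections > 0 else 0.0


def simulate_fdp(m=200, pi0=0.8, alpha=0.2, reps=500, seed=0):
    """Simulate the FDP of BH95 under an assumed two-group model."""
    rng = random.Random(seed)
    m0 = int(round(pi0 * m))
    fdps = []
    for _ in range(reps):
        null_p = [rng.random() for _ in range(m0)]
        # Assumed alternative: p = U**4, concentrated near 0.
        alt_p = [rng.random() ** 4 for _ in range(m - m0)]
        tau_hat = bh95_threshold(null_p + alt_p, alpha)
        fdps.append(fdp_at(tau_hat, null_p, alt_p))
    return fdps


fdps = simulate_fdp()
# Under independence the FDR of BH95 is (m0 / m) * alpha = 0.16 here,
# so the average simulated FDP should be close to that value.
print(statistics.mean(fdps), statistics.stdev(fdps))
```

The spread of the simulated FDP values around their mean is the quantity that a Central Limit Theorem for the FDP describes as m grows.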

Connection between one- and two-stage adaptive procedures

We have introduced two types of FDR controlling procedures generalizing the BH95 procedure: two-stage adaptive (plug-in) procedures explicitly incorporate an estimate of π0 into the standard BH95 procedure, whereas one-stage adaptive procedures do not explicitly use such an estimate, but still yield tighter FDR control than the BH95 procedure.
We will now investigate connections between one-stage and two-stage adaptive procedures, which naturally appear when using the formalism of threshold functions: with a striking symmetry, the threshold of procedure BR08(λ) may be interpreted as a fixed point of an iterated BKY06(λ) procedure, whereas the threshold of procedure FDR08 may be interpreted as a fixed point of an iterated Sto02(λ) procedure. We provide heuristic reasons for these connections in section 2.5.1; in section 2.5.2 we present general results for the connection between one-stage and two-stage adaptive procedures, and derive consequences for the connection between procedures Sto02(λ) and FDR08 on the one hand, and between procedures BKY06(λ) and BR08(λ) on the other hand.
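The fixed-point interpretation can be sketched in a few lines for the Sto02(λ) and FDR08 pair. The derivation below is a heuristic under assumptions that are not restated in this excerpt: the estimator of π0 is taken to be the usual Storey-type estimator, the plug-in threshold is taken to be the crossing point of the empirical distribution function with the line of slope π̂0(λ)/α, and the iteration is assumed to replace λ by the current threshold u.

```latex
% Assumed definitions (standard forms, not restated in this excerpt):
%   Storey-type estimator:  \hat\pi_0(\lambda) = (1 - \hat G_m(\lambda)) / (1 - \lambda)
%   plug-in threshold u:    \hat G_m(u) = \hat\pi_0(\lambda) u / \alpha
% Fixed point of the iteration: take \lambda = u in the plug-in equation.
\begin{align*}
\hat G_m(u) &= \frac{\hat\pi_0(u)\, u}{\alpha}
             = \frac{\bigl(1 - \hat G_m(u)\bigr)\, u}{\alpha (1 - u)}, \\
\alpha (1 - u)\, \hat G_m(u) &= u - u\, \hat G_m(u), \\
\hat G_m(u) &= \frac{u}{\alpha + (1 - \alpha)\, u}.
\end{align*}
```

Under these assumptions, the fixed point is a crossing point between the empirical distribution function and the curve u ↦ u/(α + (1 − α)u), which has the form of the rejection curve usually associated with the FDR08 procedure.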

Table of contents:

Résumé
Abstract
Remerciements
Résumé en français
1. Tests multiples
2. Analyse statistique de données de puces à ADN
General introduction
Publications and documents
Part 1. Multiple testing
Chapter 1. Large-scale multiple testing
1.1. Multiple testing situations: historical perspective
1.2. From single testing to multiple testing
1.3. FDR control for multiple testing procedures
1.4. Tools for an asymptotic study
1.5. Contributions
1.6. Proofs
Chapter 2. Asymptotic Properties of FDR controlling procedures
2.1. Introduction
2.2. Background and notation
2.3. Asymptotic properties of threshold procedures
2.4. Results for procedures of interest
2.5. Connection between one- and two-stage adaptive procedures
2.6. Concluding remarks
2.7. Proof of main results
Chapter 3. Intrinsic Bounds to Multiple Testing Procedures
3.1. Introduction
3.2. Background and notation
3.3. Criticality, distribution tails and identifiability
3.4. Estimation of π0
3.5. FDR control in a sparse setting
3.6. Proofs of main results
Part 2. Application to microarray data analysis
Chapter 4. Microarray analysis for cancer research
4.1. Cancer and genes
4.2. Microarray data in cancer research
4.3. Statistical issues in microarray data analysis
4.4. Contributions
Chapter 5. Normalization of DNA copy number microarrays
Chapter 6. Learning cooperative regulation networks
Chapter 7. Defining true recurrences among ipsilateral breast cancers
Bibliography