Nonparametric density estimation of the RWRE



Introduction

Random walks in random environment (RWRE) provide simple models to describe various problems such as the propagation of heat, the diffusion of matter through a physical medium, or DNA-unzipping experiments. In these situations, the medium is very irregular and its irregularities can be modelled by a random environment. The definition of a RWRE involves two ingredients: (i) the environment, which is an i.i.d. sample from some unknown distribution, and (ii) the random walk, whose transition probabilities are determined by the environment. This paper considers the problem of recovering the density $f$ of the distribution of the environment of a RWRE on $\mathbb{Z}$ based on the observation of a single trajectory of the RWRE.
Since their introduction in [Che67], RWRE have been widely studied in the probability literature; see for example [Zei12] for a recent overview of probabilistic results on RWRE in $\mathbb{Z}$ and more generally in $\mathbb{Z}^d$. On the other hand, statistical inference for RWRE has emerged only recently, with the appearance of RWRE in several statistical models such as DNA-unzipping experiments or the DNA-polymerase phenomenon in [AMJR12, HFR09, BBC+07, BBC+06, KSJW02]. In these applications, the main task is usually to recover the environment itself and, when this is the case, an estimator of the environment distribution $\nu$ can be considered as a preliminary step in the construction of an empirical Bayes estimator.
The problem of recovering the environment distribution $\nu$ was originally considered in [AE04]. The authors considered random walks in general state spaces and studied an estimator of the moments of the distribution $\nu$, but this estimator had poor statistical performance. In the last few years, [CFLL16, FLM14, FGL14] considered random walks on $\mathbb{Z}$ and studied the asymptotic behaviour of the maximum likelihood estimator of the environment distribution in a parametric setting. They proved its consistency and asymptotic normality in the ballistic and sub-ballistic regimes, as well as its efficiency in the ballistic regime (see Section 2.2 for details regarding these notions). Consistency results have also been proved in the recurrent regime in [CFLL16] for a slightly different estimator, and Markovian environments have been considered in [ALM15].
It is shown in [DL18] that the beta-moments of the distribution $\nu$ can be estimated consistently from a single trajectory of the RWRE. These moment estimators are also used there to obtain an estimator of the cumulative distribution function (c.d.f.) of the random environment. In the present paper, we use these moment estimators to construct estimators of the probability density.
Recovering the probability density of a distribution from its moments is a classical problem in statistics called moment reconstruction (see [Akh65]). Density estimators of a distribution based on its beta-moments (also referred to as beta-kernel estimators) have already been studied when an i.i.d. sample $(Z_1, \dots, Z_n)$ of the unknown distribution is observed (see [Che99, BR03, Mna08] and the references therein). In this direct observation setting, [Che99] estimated $f$ at $x \in [0, 1]$ by a single properly chosen beta-moment
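$$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} B_{\frac{x}{h}+1,\, \frac{1-x}{h}+1}(Z_i),$$
where $h$ is the smoothing bandwidth and, for any $a$ and $b$ in $\mathbb{R}_+$, $B_{a,b}$ is the beta density function
$$B_{a,b}(u) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, u^{a-1} (1-u)^{b-1}\, \mathbf{1}_{0 \le u \le 1}. \tag{2.1}$$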
[Che99] showed in particular that this estimator does not suffer from the boundary bias of standard kernel estimators. This beta-kernel density estimator was later shown in [BK10] to be minimax under regularity assumptions on the density $f$.
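
To make the direct-observation estimator concrete, here is a minimal Python sketch of the beta-kernel estimate $\hat{f}_h(x)$ built from the beta density (2.1). The function names, the Beta(2, 3) test sample, the bandwidth $h = 0.05$ and the evaluation grid are illustrative assumptions, not choices made in [Che99] or in this paper.

```python
import math
import random

def beta_pdf(u, a, b):
    """Beta(a, b) density at u; boundary points are treated as zero for simplicity."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(u) + (b - 1.0) * math.log(1.0 - u))

def beta_kernel_estimate(x, sample, h):
    """Average of the Beta(x/h + 1, (1 - x)/h + 1) density over the sample."""
    a = x / h + 1.0
    b = (1.0 - x) / h + 1.0
    return sum(beta_pdf(z, a, b) for z in sample) / len(sample)

# Hypothetical i.i.d. sample from Beta(2, 3), whose density is 12 x (1 - x)^2.
rng = random.Random(0)
sample = [rng.betavariate(2.0, 3.0) for _ in range(5000)]
h = 0.05  # assumed bandwidth, chosen by hand for the illustration
grid = [i / 10 for i in range(11)]
estimates = [beta_kernel_estimate(x, sample, h) for x in grid]
```

The kernel shape changes with $x$ while its support stays inside $[0, 1]$, which is why no mass leaks outside the interval near the endpoints.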
The first contribution of the paper is the construction of the first non-parametric density estimator of the random environment of a RWRE. The density $f$ of $\nu$ is approximated at every point $x \in (0, 1)$ by a sequence of properly chosen beta kernels, following [Che99, Mna08]. These beta-moments are estimated using the estimators introduced in [DL18]. The resulting density estimator depends on a regularization parameter. We propose to select this parameter automatically using the so-called Goldenshluger-Lepski method of [GL08]. This selection step makes it possible to optimize the risk bounds of the preliminary estimators without prior knowledge of the regime of the walk or of the regularity of the unknown density (see Section 2.4 for details). The Goldenshluger-Lepski method was used in [BK14] to build adaptive estimators in the i.i.d. setting from the minimax estimators of [BK10].
The second contribution is the derivation of the first non-parametric risk bounds for any density estimator of $f$ based on the observation of a single trajectory of a RWRE. These bounds are based on preliminary results obtained in [DL18] for the stochastic part of the risk and on results on beta-kernel estimators presented in [BR03, Mna08, BK10] for the deterministic part of the risk. Our rates do not match the minimax rates of density estimation in the i.i.d. setting. Nevertheless, our results outperform those obtained in [CFLL16] in the recurrent regime, where only consistency results for very particular densities were derived. Contrary to those of [CFLL16], our rates hold without prior knowledge of the regime of the walk.
Finally, we investigate the numerical behaviour of our estimators in a Monte Carlo experiment. Our estimator behaves reasonably in ballistic regimes, where the chain has linear drift. This emphasizes the difference between the estimation of population parameters and that of individual parameters: recovering the environment itself is clearly impossible in the ballistic regime, yet estimating the law of the random environment is still possible. Related results have already been reported in a parametric setting, for example in [FLM14], but it comes perhaps as a surprise that the same behaviour is observed in the non-parametric setting. Both the theoretical results and these first simulations consider the case where the chain is observed until it reaches a fixed site $n$. Depending on the regime, the number of observed steps of the random walk can therefore be very different: it is typically $O(n)$ in the ballistic regime and $e^{O(\sqrt{n})}$ in the recurrent regime. Our simulations also illustrate how the quality of estimation deteriorates when going from a nearly recurrent regime to a nearly ballistic regime. Lastly, we implement the Goldenshluger-Lepski algorithm and observe graphically how it performs as the regime or the regularity varies.
The paper is organised as follows. Section 2.2 gathers basic results on RWRE on $\mathbb{Z}$ and on the likelihood. Section 2.3 presents the construction of the basic estimators. Section 2.4 states the main results: convergence rates for the basic estimators and for the estimator selected by the Goldenshluger-Lepski method. Section 2.5 presents a Monte Carlo experiment supporting our theoretical claims. Proofs of the main results are gathered in Section 2.6.

Random walks in random environment (RWRE)

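Let $E = (0, 1)^{\mathbb{Z}}$ denote the set of environments. $E$ is endowed with the $\sigma$-field $\mathcal{E} = \mathcal{B}((0, 1))^{\otimes \mathbb{Z}}$, generated by the cylinders $\prod_{k \in \mathbb{Z}} B_k$, where every $B_k$ belongs to $\mathcal{B}((0, 1))$ and $B_k \neq (0, 1)$ only for a finite number of $k$. Let $(X_t)_{t \in \mathbb{N}}$ denote the canonical process of the space $\mathbb{Z}^{\mathbb{Z}_+}$ endowed with the $\sigma$-field generated by cylinders. For any $\omega$ in $E$, the random walk in environment $\omega$ is the time-homogeneous Markov chain with transition probabilities given, for all $x$ and $y$ in $\mathbb{Z}$, by
$$p_\omega(x, y) = \begin{cases} \omega_x & \text{if } y = x + 1, \\ 1 - \omega_x & \text{if } y = x - 1, \\ 0 & \text{otherwise.} \end{cases}$$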
That is to say, given the environment $\omega$, the random walk currently at point $x$ in $\mathbb{Z}$ makes a one-unit step to the right with probability $\omega_x$ or to the left with probability $1 - \omega_x$. Hereafter, the Markov chain is started from 0. The distribution $P_\omega$ of the random walk in the environment $\omega$ is called the quenched distribution.
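
The quenched dynamics can be sketched in a few lines of Python. The sketch below is only an illustration: it assumes a Beta(a, b) law for the $\omega_x$ (the text leaves the environment distribution unspecified), the function name and the `max_steps` safeguard are hypothetical, and the environment is drawn lazily, site by site, which is equivalent to drawing it in advance because the $\omega_x$ are i.i.d. and each value is kept once drawn.

```python
import random

def walk_until_hit(n, a, b, rng, max_steps=10**6):
    """Run the walk from 0 until it first reaches site n (or max_steps is hit).

    Returns the visited path and the part of the environment that was revealed.
    The Beta(a, b) environment law is an assumption made for this illustration.
    """
    env = {}           # omega_x for every site x visited so far
    x, path = 0, [0]
    while x != n and len(path) <= max_steps:
        if x not in env:
            env[x] = rng.betavariate(a, b)   # draw omega_x once, then keep it fixed
        # step right with probability omega_x, left with probability 1 - omega_x
        x += 1 if rng.random() < env[x] else -1
        path.append(x)
    return path, env

rng = random.Random(42)
path, env = walk_until_hit(50, 5.0, 1.0, rng)
```

Conditionally on `env`, the path is a Markov chain; the randomness of `env` itself is what the annealed point of view averages over.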
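Throughout the paper, the environment $\omega = (\omega_x)_{x \in \mathbb{Z}}$ is a sequence of i.i.d. random variables with common distribution $\nu$, so that $\omega$ is distributed according to $P^\nu = \nu^{\otimes \mathbb{Z}}$. Averaging the quenched probability with respect to the environment distribution, we obtain the annealed distribution $\mathbf{P}^\nu$ on the path space $\mathbb{Z}^{\mathbb{N}}$:
$$\mathbf{P}^\nu(\cdot) = \int_E P_\omega(\cdot)\, \mathrm{d}P^\nu(\omega).$$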
Expectations with respect to the annealed probability measure $\mathbf{P}^\nu$ are denoted $\mathbf{E}^\nu$. The random walk $(X_t)_{t \in \mathbb{N}}$ is a Markov chain only conditionally on the environment (i.e. with respect to the quenched distribution $P_\omega$); the Markov property fails under the annealed probability measure $\mathbf{P}^\nu$, because the past history provides information on the environment and cannot be neglected. The random walk in i.i.d. random environment on $\mathbb{Z}$ is the random sequence $(X_t)_{t \in \mathbb{N}}$ considered under the annealed distribution $\mathbf{P}^\nu$.
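The asymptotic behaviour of the walk $(X_t)_{t \in \mathbb{N}}$ depends on the distribution of the ratio
$$\rho_0 = \frac{1 - \omega_0}{\omega_0}. \tag{2.2}$$
If $\mathbf{E}^\nu[|\log \rho_0|]$ is finite, [Sol75] proved the following classification:
(i) if $\mathbf{E}^\nu[\log \rho_0] \neq 0$, then $(X_t)_{t \in \mathbb{N}}$ is transient ($\mathbf{P}^\nu$-a.s.).
Moreover, if $\mathbf{E}^\nu[\log \rho_0] < 0$, then $\lim_{t \to +\infty} X_t = +\infty$, $\mathbf{P}^\nu$-a.s.: the process $(X_t)_{t \in \mathbb{N}}$ is transient to the right;
(ii) if $\mathbf{E}^\nu[\log \rho_0] = 0$, then $(X_t)_{t \in \mathbb{N}}$ is recurrent.
Moreover, $\limsup_{t \to +\infty} X_t = +\infty$ and $\liminf_{t \to +\infty} X_t = -\infty$, $\mathbf{P}^\nu$-a.s.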
When the walk is transient to the right, $X_t \to +\infty$ at asymptotic rates established in [Sol75, KKS75]. Let $T_n$ denote the first hitting time of $n \in \mathbb{N}$,
$$T_n = \inf\{t \in \mathbb{N} : X_t = n\}. \tag{2.3}$$
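Assuming that the RWRE is transient to the right, that is $\mathbf{E}^\nu[\log \rho_0] < 0$, the following classification holds.
1. If $\mathbf{E}^\nu[\rho_0] < 1$, then, $\mathbf{P}^\nu$-a.s.,
$$\frac{T_n}{n} \to \frac{1 + \mathbf{E}^\nu[\rho_0]}{1 - \mathbf{E}^\nu[\rho_0]}.$$
The RWRE is called ballistic.
2. If $\mathbf{E}^\nu[\rho_0] \ge 1$, then, $\mathbf{P}^\nu$-a.s., $T_n / n \to +\infty$ and the RWRE is called sub-ballistic.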
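
The classification above can be checked numerically on a toy case. The following Python sketch assumes a Beta(a, b) environment with $a > 1$, for which $\mathbf{E}^\nu[\rho_0] = b / (a - 1)$; the parameter values, the function names and the site $n = 2000$ are illustrative assumptions. The "transient to the left" branch is not stated in the text and follows by symmetry.

```python
import random

def classify_regime(mean_log_rho, mean_rho):
    """Regime of the walk from E[log rho_0] and E[rho_0]."""
    if mean_log_rho == 0:
        return "recurrent"
    if mean_log_rho > 0:
        return "transient to the left"   # symmetric case, added for completeness
    return "ballistic" if mean_rho < 1 else "sub-ballistic"

def hitting_time(n, a, b, rng):
    """Number of steps T_n needed to reach site n, with omega_x i.i.d. Beta(a, b)."""
    env, x, steps = {}, 0, 0
    while x != n:
        if x not in env:
            env[x] = rng.betavariate(a, b)   # environment revealed on first visit
        x += 1 if rng.random() < env[x] else -1
        steps += 1
    return steps

a, b = 5.0, 1.0                    # hypothetical environment law, ballistic case
mean_rho = b / (a - 1.0)           # E[rho_0] for Beta(a, b), valid when a > 1
limit = (1.0 + mean_rho) / (1.0 - mean_rho)   # almost sure limit of T_n / n
rng = random.Random(1)
n = 2000
ratio = hitting_time(n, a, b, rng) / n        # should be close to `limit`
```

With these parameters `limit` equals 5/3, so a single trajectory observed up to site $n$ contains on the order of $n$ steps, as expected in the ballistic regime.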
The fluctuations of $T_n$ may be characterized more precisely. Suppose that the distribution of $\log \rho_0$ is non-arithmetic (that is, the group generated by the support of $\log \rho_0$ is dense in $\mathbb{R}$) and that there exists $\kappa$ in $(0, \infty)$ such that
$$\mathbf{E}^\nu[\rho_0^\kappa] = 1 \quad \text{and} \quad \mathbf{E}^\nu[\rho_0^\kappa \log^+(\rho_0)] < \infty, \tag{2.4}$$
where $\log^+(x) = \log(x \vee 1)$. A simple convexity argument shows that if $\kappa$ exists then it is unique. The value of $\kappa$ determines the asymptotic behaviour of $(X_t)_{t \in \mathbb{N}}$. It follows from [KKS75] that
1. if $\kappa < 1$, $T_n / n^{1/\kappa}$ and $X_t / t^{\kappa}$ converge in $\mathbf{P}^\nu$-distribution to a non-trivial distribution,
2. if $\kappa = 1$, $T_n / (n \log n)$ and $(\log t / t) X_t$ converge in $\mathbf{P}^\nu$-probability to positive constants,
3. if $\kappa > 1$, $T_n / n$ and $X_t / t$ converge in $\mathbf{P}^\nu$-probability to positive constants.
In cases 1 and 2, the random walk is sub-ballistic; in case 3, $(X_t)_{t \in \mathbb{N}}$ is ballistic, and both $T_n$ and $X_t$ grow linearly asymptotically.
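
As a worked illustration of the exponent $\kappa$ solving $\mathbf{E}^\nu[\rho_0^\kappa] = 1$, the Python sketch below again assumes a Beta(a, b) environment, which is an assumption of this example and not of the paper. In that case $\mathbf{E}^\nu[\rho_0^\kappa] = \Gamma(a - \kappa)\Gamma(b + \kappa) / (\Gamma(a)\Gamma(b))$ for $0 \le \kappa < a$, and the root can be found by bisection because $\kappa \mapsto \log \mathbf{E}^\nu[\rho_0^\kappa]$ is convex, vanishes at 0 and has a negative slope there when $a > b$. For this family the root is $\kappa = a - b$, which provides a check on the numerical answer.

```python
import math

def log_moment(kappa, a, b):
    """log E[rho_0^kappa] when omega_0 ~ Beta(a, b); finite for 0 <= kappa < a."""
    return (math.lgamma(a - kappa) + math.lgamma(b + kappa)
            - math.lgamma(a) - math.lgamma(b))

def find_kappa(a, b, tol=1e-10):
    """Positive root of kappa -> log E[rho_0^kappa], found by bisection."""
    if a <= b:
        raise ValueError("need a > b so that E[log rho_0] < 0")
    lo, hi = 1e-6, a - 1e-9   # log_moment is negative at lo and positive near a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_moment(mid, a, b) < 0.0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies to the left of mid
    return 0.5 * (lo + hi)

kappa_ballistic = find_kappa(3.0, 1.0)       # kappa > 1: ballistic
kappa_sub_ballistic = find_kappa(1.5, 1.0)   # kappa < 1: sub-ballistic
```

The condition $\kappa > 1$ then reads $a - b > 1$, which agrees with the criterion $\mathbf{E}^\nu[\rho_0] = b / (a - 1) < 1$ for ballisticity in the same Beta family.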
When the RWRE is recurrent, the fluctuations of $(X_t)_{t \in \mathbb{N}}$ have been evaluated by [Sin82]: suppose that $\mathbf{E}^\nu[\log \rho_0] = 0$, $\mathbf{E}^\nu[\log^2 \rho_0] > 0$ and that the support of $\rho_0$ is included in $(0, \infty)$; then $X_t / (\log t)^2$ converges in $\mathbf{P}^\nu$-distribution to a non-trivial limit (see also [Zei12] for some extensions under relaxed versions of this assumption).


Table of contents:

1 Introduction
1.1 The RWRE model in i.i.d. environments on $\mathbb{Z}$
1.1.1 The simple random walk on $\mathbb{Z}$
1.1.2 An intermediate model
1.1.3 The RWRE model in i.i.d. environments
1.2 Some background on RWRE
1.2.1 A more formal definition of the RWRE
1.2.2 Some probabilistic results
1.2.3 The left-jump process
1.2.4 Branching processes in random environment
1.2.5 The link between random walks in random environment and branching processes in random environment
1.2.6 A moment estimator by [AE04]
1.2.7 A maximum-likelihood-type estimator by [CFLL16, CFL+14, FLM14, FGL14]
1.2.8 A first nonparametric estimator by [DL18]
1.3 Contributions
1.3.1 Estimation of the density of the environment distribution: Chapter 2
1.3.2 Brief introduction to Bayesian statistics
1.3.3 Posterior consistency of the Bayesian estimator: Chapter 5
1.3.4 Concentration inequality for transformations of Markov chains with the bounded differences property: Chapter 4
1.3.5 Uniform control of the tail of the distribution of the return time to 0 of the branching process in random environment: Chapter 3
1.4 Conclusion and perspectives
2 Nonparametric density estimation of the RWRE
2.1 Introduction
2.2 Random walks in random environment (RWRE)
2.3 Estimator construction
2.4 Main results
2.5 Simulation Study
2.5.1 Influence of the regularity
2.5.2 Influence of the regime
2.5.3 Goldenshluger-Lepski estimator
2.6 Proof
2.6.1 Proof of Proposition 1
2.6.2 Bounding $\hat{f}^M_n$ and $f^M$ in sup-norm
2.6.3 Proof of Theorem 2
2.6.4 Proof of Theorem 5
2.6.5 Proof of Proposition 6
3 First return time of a branching process in random environment
3.1 Setting and main result
3.1.1 Assumptions
3.1.2 The case of RWRE
3.1.3 Main result
3.2 Proofs
3.2.1 Theorem 16
3.2.2 Sketch of proof of Theorem 16
3.2.3 Detailed proof of Theorem 16
3.2.4 A different interpretation of the BPIRE
3.2.5 Theorem 22
3.2.6 Sketch of proof of Theorem 22
3.2.7 Detailed proof of Theorem 22
3.2.8 Proof of Theorem 14 through Theorems 16 and 22
3.2.9 Proof of the existence of exponential moments for BPIREG(,0)
4 Concentration inequality for geometrically ergodic Markov chains
4.1 Framework
4.1.1 Markovian setting
4.1.2 Assumptions in Markovian framework
4.2 Main results: Theorems 39 and 40
4.3 Proof of Theorem 39
4.3.1 Sketch of proof of Theorem 39
4.3.2 Intermediate results for proof of Theorem 39
4.4 Proof of Theorem 40
4.4.1 V -geometric ergodicity
4.4.2 Satisfaction of Assumptions M1, M2 and M3
5 Posterior consistency of Bayes estimator of the environment
5.1 RWRE, BPIRE and the Bayesian setting
5.1.1 RWRE
5.1.2 BPIRE
5.1.3 Towards posterior consistency
5.2 Bayesian setting and main results
5.2.1 Assumptions on the prior
5.2.2 Bayesian framework for RWRE
5.2.3 Bayesian framework for BPIRE
5.2.4 Main results : posterior consistency for RWRE and BPIREG(,0)
5.3 Proof of the main results
5.3.1 Analysis: a look into Bayesian techniques
5.3.2 Sketch of proof of Theorem 64
5.3.3 Lower bound on the denominator $D_n$: proof of Proposition 70
5.3.4 Some properties of d and dn
5.3.5 Link between d and dn: proof of Proposition 66
5.3.6 Proof of Proposition 67
5.3.7 Proof of Proposition 68
5.3.8 Proof of Proposition 69
5.3.9 Control of the first and second kind risks
5.3.10 Final step of the proof of Theorem 64
5.3.11 Proof of Theorem 63
Appendices
A Reminder on transition kernels and Markov chains
B Reminder on covering numbers
C Reminder on stopping times
Bibliography