Uncover Hawkes causality without parametrization


Part II: Uncover Hawkes causality without parametrization

In Chapters III and IV, we study two methods to uncover causal relationships from a multivariate point process. We focus on one approach per chapter.

Hawkes processes

In order to model the joint dynamics of several point processes (for example, the timestamps of messages sent by different users of a social network), we consider the multidimensional Hawkes model, introduced in 1971 in [Haw71a] and [Haw71b], with cross-influences between the different processes. By definition, a family of $d$ point processes is a multidimensional Hawkes process if the intensities of all its components can be written as linear regressions over the past of the $d$ processes:
$$\lambda^i_t = \mu^i + \sum_{j=1}^{d} \int_0^{t} \phi^{ij}(t-s)\, dN^j_s.$$
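To make the definition concrete, here is a minimal sketch that evaluates the intensity vector at a given time. It assumes exponential kernels $\phi^{ij}(u) = \alpha^{ij} \beta e^{-\beta u}$, a parametric choice made only for illustration (the methods of this part do not rely on it); the function `hawkes_intensity` and the parameters `alpha` and `beta` are hypothetical names.

```python
import numpy as np

def hawkes_intensity(t, mu, alpha, beta, timestamps):
    """Intensity vector lambda_t of a d-dimensional Hawkes process.

    Assumes exponential kernels phi^{ij}(u) = alpha[i, j] * beta * exp(-beta * u),
    so that the integral of phi^{ij} equals alpha[i, j].
    timestamps[j] holds the event times of component j.
    """
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    lam = mu.copy()
    for j, times_j in enumerate(timestamps):
        past = np.asarray(times_j, dtype=float)
        past = past[past < t]  # only events strictly before t contribute
        decay = beta * np.exp(-beta * (t - past)).sum()
        lam += alpha[:, j] * decay  # column j: influence of component j on every i
    return lam
```

For example, with a single past event of component 0 at time 1, `hawkes_intensity(2.0, [0.5, 0.2], alpha, 2.0, [[1.0], []])` adds `alpha[i, 0] * 2 * exp(-2)` to each baseline `mu[i]`.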
Another way to construct Hawkes processes is to consider the following population representation, see [HO74]: individuals of type $i$, $1 \leq i \leq d$, arrive as a Poisson process of intensity $\mu^i$. Every individual can have children of all types, and the children of type $i$ of an individual of type $j$ who was born or migrated at time $t$ arrive as an inhomogeneous Poisson process of intensity $\phi^{ij}(\cdot - t)$.
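The population representation translates directly into a simulation procedure. The sketch below is an illustration under the assumption of exponential kernels $\phi^{ij}(u) = g^{ij} \beta e^{-\beta u}$, so that an individual of type $j$ has a Poisson number of children of type $i$ with mean $g^{ij}$, each born after an exponential delay. The function name and parameters are hypothetical, and the cascade of descendants only stays finite when $G$ is stable (see Assumption 1 below).

```python
import numpy as np

def simulate_hawkes_branching(mu, G, beta, T, seed=0):
    """Simulate a d-dimensional Hawkes process on [0, T] via its population representation.

    Assumes exponential kernels phi^{ij}(u) = G[i, j] * beta * exp(-beta * u):
    an individual of type j born at time s has Poisson(G[i, j]) children of type i,
    each born at s + Exponential(beta). Returns a time-sorted list of
    (time, type, parent_type) with parent_type = -1 for immigrants.
    """
    rng = np.random.default_rng(seed)
    mu, G = np.asarray(mu, float), np.asarray(G, float)
    d = len(mu)
    queue = []
    for i in range(d):  # immigrants of type i: homogeneous Poisson process of rate mu[i]
        n = rng.poisson(mu[i] * T)
        queue += [(t, i, -1) for t in rng.uniform(0.0, T, size=n)]
    events = []
    while queue:
        s, j, parent = queue.pop()
        events.append((s, j, parent))
        for i in range(d):  # children of type i of this type-j individual
            n_children = rng.poisson(G[i, j])
            for u in rng.exponential(1.0 / beta, size=n_children):
                if s + u <= T:  # offspring born after T are not observed
                    queue.append((s + u, i, j))
    events.sort()
    return events
```

Keeping the parent type of each event is what makes the counting function $N^{i \leftarrow j}_t$ observable in simulation, although it never is on real data.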
This construction is convenient because it yields a natural way to define and measure causality between events in the Hawkes model: the integrals

$$g^{ij} = \int_0^{+\infty} \phi^{ij}(u)\, du \geq 0, \qquad 1 \leq i, j \leq d,$$

weight the directed relationships between individuals. Namely, introducing the counting function $N^{i \leftarrow j}_t$ that counts the number of events of $i$ whose direct ancestor is an event of $j$, we know from [BMM15] that

$$\mathbb{E}[dN^{i \leftarrow j}_t] = g^{ij}\, \mathbb{E}[dN^j_t] = g^{ij} \Lambda^j\, dt, \qquad (3)$$

where $\Lambda^i$ denotes the expected intensity, satisfying $\mathbb{E}[dN^i_t] = \Lambda^i\, dt$. In practice, however, the Hawkes kernels cannot be measured directly from the data, so these measures of causality between the different kinds of events are not directly accessible.
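If the direct ancestor of every event were observed, as in a simulation of the population representation, equation (3) would suggest the plug-in estimate $N^{i \leftarrow j}_T / N^j_T$ of $g^{ij}$. The hypothetical helper below computes it from labelled events; on real data the ancestry is unobserved, which is precisely why the estimators of this part rely on integrated cumulants instead.

```python
import numpy as np

def empirical_branching_ratios(types, parent_types, d):
    """Estimate g^{ij} as N^{i<-j}_T / N^j_T from events with a known direct ancestor.

    types[k] is the component of event k and parent_types[k] the component of its
    direct ancestor (-1 for an immigrant).
    """
    counts = np.bincount(types, minlength=d).astype(float)  # N^j_T for each j
    pairs = np.zeros((d, d))                                # N^{i<-j}_T
    for i, j in zip(types, parent_types):
        if j >= 0:
            pairs[i, j] += 1
    # Column j is divided by N^j_T; components without events get a ratio of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(counts > 0, pairs / counts, 0.0)
```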
In the literature, there are two main classes of estimation procedures for Hawkes kernels: parametric and nonparametric. The first assumes a parametric form for the Hawkes kernels, most commonly decaying exponentials, and estimates the parameters by maximizing the Hawkes log-likelihood, see for example [BGM15] or [ZZS13]. The second is based either on the numerical resolution of the Wiener-Hopf equations, which link the Hawkes kernels to their correlation structure [BM14b] (or, equivalently, on approximating the Hawkes process by an autoregressive model and solving the Yule-Walker equations [EDD17]), or on a method of moments that minimizes the contrast function defined in [RBRGTM14].
In Chapters III and IV, we propose two new nonparametric estimation methods to infer the integrals of the kernels using only the integrated moments of the multivariate Hawkes process.
For all the estimation procedures mentioned above, including ours, we need the following stability condition, which ensures that the process admits a version with a stationary intensity:
Assumption 1. The spectral norm of $G = [g^{ij}]$ satisfies $\|G\| < 1$.
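Assumption 1 is easy to check numerically, and it also explains why $R = (I - G)^{-1}$ appears throughout: under the stability condition, $R = \sum_{k \geq 0} G^k$, where $(G^k)^{ij}$ is the expected number of generation-$k$ descendants of type $i$ of a single individual of type $j$. A minimal NumPy sketch, with hypothetical function names:

```python
import numpy as np

def is_stable(G):
    """Check Assumption 1: the spectral norm (largest singular value) of G is below 1."""
    return np.linalg.norm(G, ord=2) < 1

def total_offspring_matrix(G, n_terms=200):
    """Approximate R = (I - G)^{-1} by the truncated Neumann series sum_{k>=0} G^k.

    The k-th term holds expected numbers of generation-k descendants;
    the series converges when ||G|| < 1.
    """
    d = G.shape[0]
    R, term = np.eye(d), np.eye(d)
    for _ in range(n_terms):
        term = term @ G
        R = R + term
    return R
```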

Generalized Method of Moments approach

A recent work [JHR15] proved that the integrated cumulants of Hawkes processes can be expressed as functions of $G = [g^{ij}]$, and provided a constructive method to obtain these expressions. The first approach we developed in this part is a moment matching method that fits the second-order and third-order integrated cumulants of the process. To that end, we designed consistent estimators of the first, second and third integrated cumulants of the Hawkes process. Their theoretical counterparts are polynomials of $R = (I - G)^{-1}$; for instance, the first-order cumulant reads

$$\Lambda^i = \sum_{m=1}^{d} R^{im} \mu^m.$$
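As a sketch of how the theoretical cumulants depend on $G$, the function below computes the first two of them, using $\Lambda = R\mu$ together with the relation $C = R\, \operatorname{diag}(\Lambda)\, R^\top$ for the integrated covariance, which is the constraint used in the constrained optimization approach. The third-order cumulant, which also enters $L_T$, is omitted here; the function name is hypothetical.

```python
import numpy as np

def theoretical_cumulants(G, mu):
    """Integrated mean Lambda and integrated covariance C of a stationary Hawkes process.

    Uses R = (I - G)^{-1}, Lambda = R mu and C = R diag(Lambda) R^T.
    """
    G, mu = np.asarray(G, float), np.asarray(mu, float)
    R = np.linalg.inv(np.eye(len(mu)) - G)
    Lam = R @ mu
    C = R @ np.diag(Lam) @ R.T
    return Lam, C
```

With $G = 0$ (independent Poisson processes), this returns $\Lambda = \mu$ and $C = \operatorname{diag}(\mu)$, as expected for Poisson counts.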
Once we observe the process $N_t$ for $t \in [0, T]$, we compute the empirical integrated cumulants on windows $[-H_T, H_T]$ and minimize the squared difference $L_T$ between the theoretical cumulants and the empirical ones. We have proven the consistency of our estimator in the limit $T \to \infty$, provided the sequence $(H_T)$ satisfies some conditions. Our problem can be seen as a Generalized Method of Moments [Hal05].
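A minimal sketch of one natural estimator of the first two integrated cumulants: $\widehat{\Lambda}^i = N^i_T / T$, and $\widehat{C}^{ij}$ averages, over the events $\tau$ of component $i$, the number of events of $j$ in $[\tau - H, \tau + H]$ minus its expected value $2H\widehat{\Lambda}^j$. This is an illustration under simplifying assumptions (edge effects near $0$ and $T$ are ignored), not necessarily the exact estimator of Chapter III; the function name is hypothetical.

```python
import numpy as np

def empirical_cumulants(timestamps, T, H):
    """Empirical integrated mean and covariance from event times observed on [0, T].

    Lambda_hat[i] = N^i_T / T and
    C_hat[i, j] = (1/T) * sum over events tau of i of
                  (#events of j in [tau - H, tau + H] - 2 H Lambda_hat[j]).
    Edge effects near 0 and T are ignored.
    """
    ts = [np.sort(np.asarray(t, float)) for t in timestamps]
    d = len(ts)
    Lam = np.array([len(t) for t in ts]) / T
    C = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            # Count the events of j in [tau - H, tau + H] around each event tau of i.
            upper = np.searchsorted(ts[j], ts[i] + H, side="right")
            lower = np.searchsorted(ts[j], ts[i] - H, side="left")
            C[i, j] = np.sum(upper - lower - 2 * H * Lam[j]) / T
    return Lam, C
```

On the toy input `empirical_cumulants([[1.0, 2.0, 3.0], [2.5]], T=10.0, H=1.0)`, the estimated mean intensities are 0.3 and 0.1, and the covariance matrix is symmetric.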
To prove the consistency of the empirical integrated cumulants, we need the following assumption:
Assumption 2. The half-length $H_T$ of the integration domain satisfies $H_T \to \infty$ and $H_T^2 / T \to 0$.
We prove the following consistency theorem in Chapter III.
Result 1. Under Assumptions 1 and 2, the sequence of estimators defined by the minimization of $L_T(R)$ converges in probability to the true value $G$:

$$\widehat{G}_T = I - \Big(\operatorname*{arg\,min}_{R \in \Theta} L_T(R)\Big)^{-1} \xrightarrow[T \to \infty]{\mathbb{P}} G.$$
The numerical experiments, on both simulated and real-world datasets, give very satisfying results. We first simulated event data with the thinning algorithm of [Oga81], using very different kernel shapes (exponential, power law and rectangular), and recovered the true value of $G$ for each kind of kernel. To the best of our knowledge, our method is the most robust with respect to the shape of the kernels. We then ran our method on the 100 most cited websites of the MemeTracker database and on financial order book data: we outperformed state-of-the-art methods on MemeTracker and extracted interpretable features from the financial data. Our method is also significantly faster than previous methods (roughly 50 times faster), since they aim at estimating the kernel functions while we only focus on their integrals.
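For readers unfamiliar with thinning, the following sketch implements Ogata's idea in a simple setting, assuming exponential kernels $\phi^{ij}(u) = \alpha^{ij}\beta e^{-\beta u}$; other kernel shapes, such as those used in the experiments above, need a different intensity update and possibly a different upper bound. Because the intensity only decays between events, its current total value bounds it until the next accepted event: candidate points are drawn from a Poisson process with that rate, accepted with probability $\sum_i \lambda^i_t / \text{bound}$, and assigned to component $i$ with probability proportional to $\lambda^i_t$. Names are hypothetical.

```python
import numpy as np

def simulate_hawkes_thinning(mu, alpha, beta, T, seed=0):
    """Thinning simulation of a d-dimensional Hawkes process on [0, T].

    Assumes exponential kernels phi^{ij}(u) = alpha[i, j] * beta * exp(-beta * u)
    and mu > 0. Returns one sorted array of event times per component.
    """
    rng = np.random.default_rng(seed)
    mu, alpha = np.asarray(mu, float), np.asarray(alpha, float)
    d = len(mu)
    excitation = np.zeros(d)  # lambda_t - mu, which decays at rate beta between events
    t, events = 0.0, [[] for _ in range(d)]
    while True:
        bound = (mu + excitation).sum()  # valid upper bound until the next event
        w = rng.exponential(1.0 / bound)  # candidate waiting time
        t += w
        if t > T:
            return [np.array(e) for e in events]
        excitation *= np.exp(-beta * w)  # decay the excitation up to the candidate time
        lam = mu + excitation
        u = rng.uniform(0.0, bound)
        if u < lam.sum():  # accept with probability lam.sum() / bound
            # Pick component i with probability lam[i] / lam.sum().
            i = int(np.searchsorted(np.cumsum(lam), u, side="right"))
            events[i].append(t)
            excitation += alpha[:, i] * beta  # each lambda^k jumps by alpha[k, i] * beta
```

Over a long horizon, the empirical rates $N^i_T / T$ should be close to $\Lambda = (I - G)^{-1}\mu$ with $G = [\alpha^{ij}]$.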
The simplicity of the method, which maps a list of lists of timestamps to a causality map between the nodes, together with its statistical consistency, led us to design new point process models of the order book to capture its dynamics. The features extracted using our method have insightful economic interpretations. This is the main purpose of Part III.

Constrained optimization approach

The previous approach, based on the Generalized Method of Moments, needs the first three cumulants to extract enough information from the data to recover the $d^2$ entries of $G$. Assuming that the matrix $G$ has a certain structure, we can get rid of the third-order cumulant and design another estimation method using only the first two integrated cumulants. Moreover, unlike the minimization of $L_T$ above, the resulting optimization problem is convex, which makes it possible to reach a global minimum. The matrix we want to estimate minimizes a simple convex criterion $f$, typically a norm, while being consistent with the first two empirical integrated cumulants.
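The second-order cumulant alone cannot pin down $G$: $C$ is symmetric, so it has only $d(d+1)/2$ free entries against the $d^2$ entries of $G$. The sketch below checks numerically that, with $L = \operatorname{diag}(\Lambda)$, every matrix $G' = I - L^{1/2} M C^{-1/2}$ with $M$ orthogonal yields the same $C$ as the true $G$, which itself corresponds to one particular orthogonal $M$. The random test matrix and the helper names are hypothetical.

```python
import numpy as np

def sym_sqrt(A):
    """Square root of a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T

def second_order_cumulant(G, Lam):
    """C = R diag(Lambda) R^T with R = (I - G)^{-1}."""
    R = np.linalg.inv(np.eye(len(Lam)) - G)
    return R @ np.diag(Lam) @ R.T

d = 3
rng = np.random.default_rng(0)
G = rng.uniform(0.0, 0.25, size=(d, d))  # a stable matrix of kernel integrals
mu = np.ones(d)
Lam = np.linalg.solve(np.eye(d) - G, mu)  # first-order cumulant Lambda = R mu
C = second_order_cumulant(G, Lam)
C_inv_sqrt = np.linalg.inv(sym_sqrt(C))

# A random orthogonal M gives another candidate G' with exactly the same C.
M, _ = np.linalg.qr(rng.normal(size=(d, d)))
G_alt = np.eye(d) - np.diag(np.sqrt(Lam)) @ M @ C_inv_sqrt

# The true G corresponds to this particular orthogonal matrix.
M_true = np.diag(1 / np.sqrt(Lam)) @ (np.eye(d) - G) @ sym_sqrt(C)
```

The alternative `G_alt` generally has negative entries or a spectral norm above 1, which is why a structural criterion $f$ and additional constraints are needed to select a solution.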
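We formulate our problem as the following constrained optimization problem:

$$\min_{G} f(G) \quad \text{s.t.} \quad C = (I - G)^{-1} L (I - G^\top)^{-1},$$

where $f$ is a norm that promotes a particular structure in the solution and $L = \operatorname{diag}(\Lambda^1, \dots, \Lambda^d)$. Every matrix $G$ satisfying this constraint can be written $G = I - L^{1/2} M C^{-1/2}$ with $M$ an orthogonal matrix. Instead of the previous problem, we focus on its convex relaxation: we split the variables $G$ and $M$, and solve the problem with the Alternating Direction Method of Multipliers (ADMM) algorithm, see [GM75] and [GM76]:

$$\min_{G, M} \; f(G) + \delta_{\bar{\mathcal{B}}}(M) + \delta_{\mathcal{B}}(G) + \delta_{\mathbb{R}_+^{d \times d}}(G) \quad \text{s.t.} \quad G = I - L^{1/2} M C^{-1/2},$$

where $\delta_S$ denotes the indicator function of a set $S$ (equal to $0$ on $S$ and $+\infty$ outside), and $\mathcal{B}$ (resp. $\bar{\mathcal{B}}$) is the open (resp. closed) unit ball w.r.t. the spectral norm. Relaxing the orthogonality of $M$ to $M \in \bar{\mathcal{B}}$ is natural, since the closed unit ball w.r.t. the spectral norm is the convex hull of the orthogonal group.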
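ADMM alternates between simple proximal steps, and two building blocks are natural for this problem. The sketch below assumes, for illustration, that $f$ is the entrywise $\ell_1$ norm, one possible choice of norm. The projection onto the closed unit ball w.r.t. the spectral norm clips singular values at 1, so orthogonal matrices, whose singular values all equal 1, are left unchanged. The proximal operator of $\tau\|G\|_1 + \delta_{\mathbb{R}_+^{d\times d}}(G)$ shifts entries down by $\tau$ and clips them at 0. The ADMM updates of Chapter IV may be organized differently; function names are hypothetical.

```python
import numpy as np

def project_spectral_ball(M):
    """Frobenius projection onto the closed unit ball of the spectral norm.

    Clips the singular values of M at 1.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.minimum(s, 1.0)) @ Vt

def prox_nonneg_l1(G, step):
    """Proximal operator of step * ||G||_1 plus the indicator of nonnegative matrices.

    Entrywise: argmin_{x >= 0} 0.5 * (x - g)^2 + step * x = max(g - step, 0).
    """
    return np.maximum(G - step, 0.0)
```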
Unlike the optimization problem of the previous chapter, the problem just stated is convex. We test this procedure on numerical simulations with various Hawkes kernels and on real order book data, and we show how the criterion $f$ impacts the retrieved matrices.
3 Part III: Capture order book dynamics with Hawkes processes
Chapter V focuses on the estimation of the integrals of Hawkes kernels on financial data, using the estimation method introduced in Chapter III. This gives a very precise picture of the high-frequency order book dynamics. We used order book events associated with 4 very liquid assets from the EUREX exchange, namely the DAX, EURO STOXX, Bund and Bobl futures contracts.

A single asset 12-dimensional Hawkes order book model

As a first application of the procedure described in Chapter III, we consider the following 12-dimensional point process, a natural extension of the 8-dimensional point process introduced in [BJM16]:
$$N_t = (T^+_t, T^-_t, L^+_t, L^-_t, C^+_t, C^-_t, T^a_t, T^b_t, L^a_t, L^b_t, C^a_t, C^b_t),$$
where each dimension counts the number of events before t:
• $T^+$ ($T^-$): upward (downward) mid-price move triggered by a market order.
• $L^+$ ($L^-$): upward (downward) mid-price move triggered by a limit order.
• $C^+$ ($C^-$): upward (downward) mid-price move triggered by a cancel order.
• $T^a$ ($T^b$): market order at the ask (bid) that does not move the price.
• $L^a$ ($L^b$): limit order at the ask (bid) that does not move the price.
• $C^a$ ($C^b$): cancel order at the ask (bid) that does not move the price.
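To make the 12 dimensions concrete, the sketch below maps a single order book event to its index in $N_t$. It assumes a hypothetical event record made of an order type (`'T'` market, `'L'` limit, `'C'` cancel), a side (`'a'` ask, `'b'` bid) and the mid-price before and after the event; real order book feeds use their own schemas.

```python
# Dimension labels in the order of N_t; "+"/"-" encode the direction of the mid-price move.
DIMENSIONS = ["T+", "T-", "L+", "L-", "C+", "C-", "Ta", "Tb", "La", "Lb", "Ca", "Cb"]

def event_dimension(order_type, side, mid_before, mid_after):
    """Map an order book event to its index in the 12-dimensional process N_t.

    Events that change the mid-price are labelled by the direction of the move,
    the others by the side of the book where they occur.
    """
    if order_type not in ("T", "L", "C"):
        raise ValueError(f"unknown order type: {order_type!r}")
    if mid_after > mid_before:
        label = order_type + "+"
    elif mid_after < mid_before:
        label = order_type + "-"
    else:
        if side not in ("a", "b"):
            raise ValueError(f"unknown side: {side!r}")
        label = order_type + side
    return DIMENSIONS.index(label)
```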
We then use the causal interpretation of Hawkes processes to interpret our solution as a measure of the causality between events. Applying the method to this new model reveals the different interactions that lead to the high-frequency mean reversion of the price, as well as those between liquidity takers and liquidity makers.
For instance, Figure A.1 shows the effects of $T^+$ events on other events (first column on the left). The most relevant interactions are $T^+ \to L^+$ and $T^+ \to L^-$: the latter is more intense and related to the mean reversion of the price. Indeed, when a market order consumes the liquidity available at the best ask, two main scenarios can occur for the mid-price to change again: either the consumed liquidity is replaced, moving the price back (mean-reverting scenario, highly probable), or the price moves up again and a new best bid is created.

A multi-asset 16-dimensional Hawkes order book model

The nonparametric estimation method introduced in Chapter III is fast for a nonparametric methodology. We can therefore scale up the model to account for events on two assets simultaneously and unveil a precise structure of the high-frequency cross-asset dynamics. We consider a 16-dimensional model, made of two 8-dimensional models of the form

$$N_t = (P^+_t, P^-_t, T^a_t, T^b_t, L^a_t, L^b_t, C^a_t, C^b_t),$$

where the dimension $P^+$ ($P^-$) counts upward (downward) mid-price moves triggered by any order.
We compared two pairs of assets that share exposure to the same risk factors. The main empirical result of this study concerns the pair (DAX, EURO STOXX): price changes and liquidity changes on the DAX (small tick) mainly influence liquidity on the EURO STOXX (large tick), while price changes and liquidity changes on the EURO STOXX tend to trigger price moves on the DAX. We ran the estimation procedure on the 16-dimensional model and focus our discussion on the two off-diagonal $8 \times 8$ submatrices in Figure A.2, which correspond to the interactions between the assets; the subscript $D$ stands for DAX and $X$ for EURO STOXX.
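Reading the cross-asset submatrices requires fixing an index convention. Following $g^{ij}$ (events of $i$ whose direct ancestor is an event of $j$), rows index the affected events and columns their causes. The sketch below assumes, hypothetically, that the 16 dimensions are ordered as the 8 DAX dimensions followed by the 8 EURO STOXX dimensions, each in the order of $N_t$ above.

```python
import numpy as np

# Per-asset dimension labels, in the order of the 8-dimensional model.
LABELS = ["P+", "P-", "Ta", "Tb", "La", "Lb", "Ca", "Cb"]

def cross_asset_blocks(G):
    """Split a 16 x 16 matrix of kernel integrals into its two cross-asset 8 x 8 blocks.

    Assumes dimensions 0-7 belong to the DAX (D) and 8-15 to the EURO STOXX (X).
    G[i, j] weights events of i triggered by events of j (rows: effects, columns: causes).
    """
    G = np.asarray(G)
    if G.shape != (16, 16):
        raise ValueError("expected a 16 x 16 matrix")
    d_to_x = G[8:, :8]  # effect of DAX events (columns) on EURO STOXX events (rows)
    x_to_d = G[:8, 8:]  # effect of EURO STOXX events (columns) on DAX events (rows)
    return d_to_x, x_to_d
```

For instance, the $P^+_D \to L^b_X$ interaction would then be read as `d_to_x[LABELS.index("Lb"), LABELS.index("P+")]`.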
The most striking feature emerging from Figure A.2 is the very intense relation between same-sign price movements on the two assets. Another notable aspect is the different effects that price moves and liquidity changes on one asset have on events on the other asset. Price moves on the DAX also affect the flow of limit orders on the EURO STOXX ($P^+_D \to L^b_X$ and $P^+_D \to C^a_X$), whereas EURO STOXX price moves mainly trigger DAX price moves in the same direction ($P^+_X \to P^+_D$). The different perceived tick sizes on the two assets are key to understanding this result. Note that the effects observed above can be explained with the notion of latent price [RR10]; see Chapter V for further details.

Table of contents :

Introduction
Motivations
Outline
1 Part I: Large-scale Cox model
1.1 Background on SGD algorithms, Point Processes and Cox proportional hazards model
1.2 SVRG beyond Empirical Risk Minimization
2 Part II: Uncover Hawkes causality without parametrization
2.1 Hawkes processes
2.2 Generalized Method of Moments approach
2.3 Constrained optimization approach
3 Part III: Capture order book dynamics with Hawkes processes
3.1 A single asset 12-dimensional Hawkes order book model
3.2 A multi-asset 16-dimensional Hawkes order book model
I Background on SGD algorithms, Point Processes and Cox proportional hazards model
1 SGD algorithms
1.1 Definitions
1.2 SGD algorithms from a general distribution
1.3 SGD algorithms from a uniform distribution
1.4 SGD with Variance Reduction
2 Point Processes
2.1 Definitions
2.2 Temporal Point Processes
3 Cox proportional hazards model
3.1 Survival analysis
3.2 Existing methods
II Large-scale Cox model
1 Introduction
2 Comparison with previous work
3 A doubly stochastic proximal gradient descent algorithm
3.1 2SVRG: a meta-algorithm
3.2 Choice of ApproxMCMC
4 Theoretical guarantees
5 Numerical experiments
6 Conclusion
7 Proofs
7.1 Proof of Proposition 1
7.2 Preliminaries to the proofs of Theorems 1 and 2
7.3 Proof of Theorem 1
7.4 Proof of Theorem 2
8 Supplementary experiments
9 Simulation of data
10 Mini-batch sizing
Part II Uncover Hawkes causality without parametrization
III Generalized Method of Moments approach
1 Introduction
2 NPHC: The Non Parametric Hawkes Cumulant method
2.1 Branching structure and Granger causality
2.2 Integrated cumulants of the Hawkes process
2.3 Estimation of the integrated cumulants
2.4 The NPHC algorithm
2.5 Complexity of the algorithm
2.6 Theoretical guarantee: consistency
3 Numerical Experiments
4 Technical details
4.1 Proof of Equation (8)
4.2 Proof of Equation (9)
4.3 Integrated cumulants estimators
4.4 Choice of the scaling coefficient Σ
4.5 Proof of the Theorem
5 Conclusion
IV Constrained optimization approach
1 Introduction
2 Problem setting
3 ADMM
3.1 The ADMM algorithm
3.2 Convergence results
3.3 Examples
4 Numerical results
4.1 Simulated data
4.2 Order book data
5 Conclusion
6 Technical details
6.1 Convex hull of the orthogonal group
6.2 Updates of ADMM steps
V Order book dynamics
1 Introduction
2 Hawkes processes: definitions and properties
2.1 Multivariate Hawkes processes and the branching ratio matrix G
2.2 Integrated Cumulants of Hawkes Process
3 The NPHC method
3.1 Estimation of the integrated cumulants
3.2 The NPHC algorithm
3.3 Numerical experiments
4 Single-asset model
4.1 Data
4.2 Revising the 8-dimensional mono-asset model of [BJM16] : A sanity check
4.3 A 12-dimensional mono-asset model
5 Multi-asset model
5.1 The DAX – EURO STOXX model
5.2 Bobl – Bund
6 Conclusion and prospects
1 Origin of the scaling coefficient Σ
Bibliography