Discrete random fields
Putting ourselves in a discrete setup where space is regularly gridded, so that a value of the studied quantity can be measured at each location $x \in \mathbb{R}^D$, a random field is defined as the collection of random variables $\{\phi(x_i)\}_{i \in \{1, \ldots, n\}}$ and associates to each realisation a probability of occurring. Taking the simple example of an image, one can consider that an $\mathbb{R}$-valued random variable is associated to each pixel, so that the entire image is one realisation of the random field. In the finite case, the random field is described by the joint probability density function (pdf) of the $n$ discrete random variables, $p(\phi(x_1), \ldots, \phi(x_n))$.
The cosmological principle, stating that the distribution of matter in the Universe is homogeneous and isotropic at large scales (see Sect. 1.1), has a direct implication on the mathematical properties of the considered fields. Homogeneity is statistically expressed as the stationarity of the associated random process: the joint pdf is invariant under translations (i.e. it does not change when all points are shifted together, only their relative positions matter). Isotropy in turn makes the field statistically invariant under rotations, so that no direction is preferred. Such invariant fields, widely used in cosmology, are also a key element of many mathematical formulations in other fields of physics and applied mathematics. When equipped with additional local properties, they are for instance called Markovian and are at the basis of many developments in image processing such as texture recognition, classification and synthesis [see e.g. Efros & Leung, 1999; Varma & Zisserman, 2009].
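The joint pdf of a discrete random field, and the meaning of stationarity and isotropy, can be made concrete with a zero-mean Gaussian random field: its joint pdf is a multivariate normal whose covariance depends only on the separations $|x_i - x_j|$. The sketch below assumes a squared-exponential covariance (a hypothetical choice made for illustration, not one used in the text) and checks numerically that the joint log-pdf of one realisation is unchanged when all positions are translated or rotated together.

```python
import numpy as np

def isotropic_covariance(positions, amplitude=1.0, corr_length=1.0):
    """Covariance C_ij = C(|x_i - x_j|): depends only on separations (stationary + isotropic).

    The squared-exponential form is an illustrative assumption."""
    diff = positions[:, None, :] - positions[None, :, :]
    r2 = np.sum(diff**2, axis=-1)
    return amplitude * np.exp(-0.5 * r2 / corr_length**2)

def joint_log_pdf(values, positions, jitter=1e-9):
    """Log of the joint pdf of the field values at the n positions (zero-mean Gaussian field)."""
    n = len(values)
    cov = isotropic_covariance(positions) + jitter * np.eye(n)  # jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, values)          # L a = v, hence |a|^2 = v^T C^{-1} v
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (alpha @ alpha + logdet + n * np.log(2.0 * np.pi))

# A 4x4 grid of pixels in D = 2 dimensions and one realisation of the field on it.
rng = np.random.default_rng(0)
grid = np.stack(np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij"), axis=-1).reshape(-1, 2)
cov = isotropic_covariance(grid) + 1e-9 * np.eye(len(grid))
field = np.linalg.cholesky(cov) @ rng.standard_normal(len(grid))   # one realisation

shift = np.array([3.7, -1.2])
angle = 0.6
rotation = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])

logp = joint_log_pdf(field, grid)
logp_translated = joint_log_pdf(field, grid + shift)     # same values, shifted positions
logp_rotated = joint_log_pdf(field, grid @ rotation.T)   # same values, rotated positions
```

Any covariance of the form $C(|x_i - x_j|)$ gives the same invariance; a covariance depending on the positions themselves would break it.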
In statistics, a common one-point summary of a probability distribution is its mean, given by the ensemble average over many realisations. One thing that makes cosmology a special science is that its object of study, the Universe, is the only realisation we have access to. By invoking ergodicity, cosmologists are however able to equate ensemble and volume averages, and thus to infer statistical properties from the single observed Universe. As an example, the mean matter density of the Universe introduced in Sect. 1.2.1 can be seen as the average of the observed density over a sufficiently large volume, without requiring other samples of the Universe.
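Ergodicity can be illustrated on a toy stationary field in one dimension, built (as an assumption for the example) by smoothing white noise with a box kernel and adding a known mean. The average over one very large volume and the ensemble average at a fixed location over many independent realisations then recover the same mean.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_MEAN = 1.0   # hypothetical ensemble mean of the toy field

def realisation(n_cells, kernel_width=10):
    """One realisation of a stationary field: box-smoothed white noise plus a constant mean."""
    white = rng.standard_normal(n_cells + kernel_width - 1)
    kernel = np.ones(kernel_width) / kernel_width
    return TRUE_MEAN + np.convolve(white, kernel, mode="valid")   # length n_cells

# Volume average over one very large realisation (the "single Universe" situation).
volume_average = realisation(1_000_000).mean()

# Ensemble average of the value at one fixed cell over many small, independent realisations.
ensemble_average = np.mean([realisation(100)[50] for _ in range(10_000)])

print(f"volume average:   {volume_average:.4f}")
print(f"ensemble average: {ensemble_average:.4f}")
```

The agreement relies on the toy field having a finite correlation length (about ten cells here), much smaller than the averaging volume.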
Cosmological parameters and matter power spectrum
Note that the six cosmological parameters listed previously are the fundamental ones describing the ΛCDM model, but others are either fixed or derived from them, including the Hubble constant $H_0$, the density parameters $\Omega_m$ and $\Omega_b$, and the critical density $\rho_{\rm crit}$, already encountered in Sect. 1.1.2. Another common parameter of interest, used instead of the amplitude of the primordial spectrum $A_s$ to set the normalisation, is $\sigma_8$, the root-mean-square amplitude of the present-day linear matter fluctuations smoothed with a top-hat filter of radius 8 Mpc/h. Among the goals of observational cosmology are testing the validity of the ΛCDM model and providing accurate measurements of its parameters. To do so, cosmologists rely on statistical representations such as the two-point correlation function of cosmological observables like the galaxy distribution, as discussed in Sect. 1.3. The values of the cosmological parameters can then be estimated by fitting the model to the observed statistics. By combining the information provided by different observables, the community was for instance able to draw a picture of the 3D matter power spectrum that matches the linear prediction with high precision. This is shown in Fig. 1.2, where the solid line represents the best fit of ΛCDM linear theory and the coloured points are measurements provided by several probes: the CMB for the largest scales (low values of k), the galaxy distribution for intermediate scales [such as data from Reid et al., 2016] and Lyman-α clustering [from quasar surveys like Abolfathi et al., 2018] for the smallest scales. These data agree remarkably well with the linear matter power spectrum obtained from the Planck18 cosmology [Planck Collaboration VI et al., 2020], showing again the good agreement between ΛCDM and observations.
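The definition of $\sigma_8$ can be written explicitly: with the Fourier transform of a spherical top-hat, $W(x) = 3(\sin x - x\cos x)/x^3$, the variance of the smoothed linear field is $\sigma^2(R) = \frac{1}{2\pi^2}\int k^2 P(k)\,W^2(kR)\,dk$, and $\sigma_8 = \sigma(R = 8\,\mathrm{Mpc}/h)$. The sketch below evaluates this integral numerically and uses it to normalise a power spectrum. The spectrum shape and the target value 0.81 are hypothetical, chosen only for illustration; in practice $P(k)$ would come from a Boltzmann code.

```python
import numpy as np

def tophat_window(x):
    """Fourier transform of a spherical top-hat, W(x) = 3 (sin x - x cos x) / x^3."""
    x = np.asarray(x, dtype=float)
    small = x < 1e-3
    xs = np.where(small, 1.0, x)                 # placeholder to avoid 0/0, replaced below
    w = 3.0 * (np.sin(xs) - xs * np.cos(xs)) / xs**3
    return np.where(small, 1.0 - x**2 / 10.0, w) # series expansion for small x

def sigma_r(k, pk, radius):
    """sigma(R) = sqrt( 1/(2 pi^2) * int dk k^2 P(k) W^2(kR) ).

    k in h/Mpc, P(k) in (Mpc/h)^3, radius in Mpc/h; trapezoidal rule on the given k grid."""
    integrand = k**2 * pk * tophat_window(k * radius) ** 2 / (2.0 * np.pi**2)
    return np.sqrt(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(k)))

# Illustrative use: rescale a toy spectrum so that sigma_8 takes a chosen value.
k = np.logspace(-4, 3, 200_000)                  # h/Mpc
pk_toy = k / (1.0 + (k / 0.02) ** 2) ** 2        # hypothetical shape, arbitrary amplitude
target_sigma8 = 0.81                             # illustrative value
pk_toy *= (target_sigma8 / sigma_r(k, pk_toy, 8.0)) ** 2
```

Since $\sigma_8^2$ is linear in $P(k)$, fixing $\sigma_8$ amounts to fixing the overall amplitude of the spectrum, which is why it can replace $A_s$ as a normalisation parameter. A flat (white-noise) spectrum $P = A$ gives the sanity check $\sigma^2(R) = A / V_R$ with $V_R = 4\pi R^3/3$.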
Some parameters can however have similar impacts on two-point matter clustering, like $\sigma_8$ and $\Omega_m$, leading to strong degeneracies that call for combining several sources of information to break them. This can be done by using different probes like supernovae [see e.g. Abbott et al., 2019] or analyses of clusters detected through the Sunyaev-Zel'dovich effect [see e.g. Salvati et al., 2018]. In Chapter 6, we shall also see how cosmic web environments can be used to constrain the cosmological parameters of the ΛCDM model.
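Why combining probes breaks a degeneracy can be seen with Fisher matrices, which simply add for independent probes. The sketch below builds two hypothetical $2\times2$ Fisher matrices for the pair $(\sigma_8, \Omega_m)$, each tightly constraining one combination of parameters and leaving the orthogonal one almost free; the directions and amplitudes are made up for illustration and do not come from any real analysis.

```python
import numpy as np

def fisher_from_direction(direction, strong, weak):
    """2x2 Fisher matrix that constrains `direction` tightly (strong) and its orthogonal poorly (weak)."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    v = np.array([-u[1], u[0]])                     # orthogonal, nearly degenerate direction
    return strong * np.outer(u, u) + weak * np.outer(v, v)

def marginal_errors(fisher):
    """Marginalised 1-sigma errors: square roots of the diagonal of the inverse Fisher matrix."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

def correlation(fisher):
    """Correlation coefficient between the two parameters."""
    cov = np.linalg.inv(fisher)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Hypothetical Fisher matrices for (sigma_8, Omega_m).
f_clustering = fisher_from_direction([1.0, 1.0], strong=1e4, weak=10.0)
f_other_probe = fisher_from_direction([1.0, -0.5], strong=1e4, weak=10.0)
f_combined = f_clustering + f_other_probe       # independent probes: Fisher matrices add

print("clustering alone:", marginal_errors(f_clustering), "corr =", correlation(f_clustering))
print("other probe alone:", marginal_errors(f_other_probe))
print("combined:", marginal_errors(f_combined))
```

Each probe alone yields large, strongly correlated errors along its degenerate direction; because the two degeneracy directions differ, their sum is well conditioned and both marginalised errors shrink.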
Dark matter only and hydrodynamical simulations
Early simulations were based on the dark-matter-only evolution of the density field, including solely the effect of gravity (also called N-body simulations). Starting from a set of initial conditions at very high redshift, usually drawn from a Gaussian random field, the main idea is to sample the matter distribution with a set of N particles and to move them iteratively under their mutual gravitational attraction, thereby approximating the solution of the Vlasov-Poisson system of equations. Solving the equations of motion for large N is a computationally heavy task: a direct summation of the forces requires $N^2$ operations at each timestep, which motivated the emergence of several sophisticated techniques allowing a more efficient processing [Efstathiou et al., 1985; Springel, 2005]. The right panel of Fig. 2.1 shows the overdensity field computed in 2014 from a set of $N = 1820^3$ particles by the Illustris collaboration, illustrating the progress made in this domain over the past decades.
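The $N^2$ cost of direct summation is visible in a minimal sketch: every particle interacts with every other one at each step. The code below works in static Newtonian space with units where $G = 1$ and a Plummer softening length, all assumptions for illustration; cosmological codes such as those cited above integrate in comoving coordinates of an expanding background and replace the pairwise sum by faster force solvers.

```python
import numpy as np

def accelerations(pos, mass, softening=0.01, G=1.0):
    """Direct-summation (O(N^2)) Plummer-softened Newtonian accelerations.

    pos: (N, 3) positions, mass: (N,) masses; units with G = 1 are an assumption."""
    diff = pos[None, :, :] - pos[:, None, :]             # diff[i, j] = x_j - x_i, shape (N, N, 3)
    r2 = np.sum(diff**2, axis=-1) + softening**2
    np.fill_diagonal(r2, np.inf)                         # exclude self-interaction (inf**-1.5 = 0)
    inv_r3 = r2 ** -1.5
    # a_i = G * sum_j m_j (x_j - x_i) / |x_j - x_i|^3
    return G * np.einsum("ij,ijk,j->ik", inv_r3, diff, mass)

def leapfrog_step(pos, vel, mass, dt, **kwargs):
    """One kick-drift-kick leapfrog step, a common second-order symplectic integrator."""
    vel = vel + 0.5 * dt * accelerations(pos, mass, **kwargs)
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * accelerations(pos, mass, **kwargs)
    return pos, vel

rng = np.random.default_rng(3)
n_part = 200
pos = rng.uniform(0.0, 1.0, size=(n_part, 3))
vel = np.zeros((n_part, 3))
mass = np.full(n_part, 1.0 / n_part)
for _ in range(10):
    pos, vel = leapfrog_step(pos, vel, mass, dt=1e-3)
```

Because pairwise forces are antisymmetric, the total momentum stays at zero up to round-off, a simple check of the implementation.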
One of the main achievements of these N-body simulations is to allow the analysis of the density field at both large and small scales, accurately reproducing its statistics. Figure 2.2 shows, for instance, the power spectrum computed from one box of the Quijote simulations [Villaescusa-Navarro et al., 2020] together with the one predicted by linear theory. The two start to deviate around k ≈ 0.15 h/Mpc, showing that such a theoretical model cannot be used at smaller scales. In principle, elements can be added to the linear theory to describe smaller scales. Yet these extensions remain limited, reaching percent accuracy up to mildly non-linear scales of k ≈ 0.3 h/Mpc [Carrasco et al., 2012], which is already an achievement but is not sufficient to carry out precise cosmological analyses or to understand the physical processes occurring at smaller scales (galaxy evolution, baryonic physics, etc.).
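Measuring a power spectrum from a simulation box amounts to Fourier transforming the overdensity field on a grid and averaging $|\delta_k|^2$ in spherical shells of $|k|$. A minimal sketch, assuming the overdensity has already been assigned to a periodic cubic grid and using the normalisation $P(k) = V_{\rm cell}^2\,|\mathrm{FFT}[\delta](k)|^2 / V_{\rm box}$; real estimators also deconvolve the mass-assignment window and subtract shot noise, which is omitted here.

```python
import numpy as np

def power_spectrum(delta, box_size, n_bins=16):
    """Shell-averaged P(k) of an overdensity field on a periodic cubic grid.

    delta: (n, n, n) array; box_size in Mpc/h, so k is in h/Mpc and P in (Mpc/h)^3."""
    n = delta.shape[0]
    cell_volume = (box_size / n) ** 3
    delta_k = np.fft.rfftn(delta) * cell_volume           # discrete approximation of the continuous FT
    power = np.abs(delta_k) ** 2 / box_size**3

    k_fund = 2.0 * np.pi / box_size
    kx = np.fft.fftfreq(n, d=1.0 / n) * k_fund            # full axes (integer multiples of k_fund)
    kz = np.fft.rfftfreq(n, d=1.0 / n) * k_fund           # half axis used by rfftn
    kmag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2)

    mask = kmag > 0                                       # drop the k = 0 mode (the mean)
    edges = np.linspace(k_fund, kmag.max() * (1 + 1e-9), n_bins + 1)
    idx = np.digitize(kmag[mask], edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    p_sum = np.bincount(idx[valid], weights=power[mask][valid], minlength=n_bins)
    k_sum = np.bincount(idx[valid], weights=kmag[mask][valid], minlength=n_bins)
    filled = counts > 0
    return k_sum[filled] / counts[filled], p_sum[filled] / counts[filled], counts[filled]

# Sanity check on white noise of variance 4: the spectrum should be flat, P = 4 * V_cell.
rng = np.random.default_rng(0)
delta = 2.0 * rng.standard_normal((32, 32, 32))
k_bins, pk, n_modes = power_spectrum(delta, box_size=100.0)
```

The mode counts returned alongside $P(k)$ make the sample variance visible: low-$k$ shells contain few modes and are therefore noisier.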
Simulations are thus indispensable for the accurate assessment and analysis of gravitational dynamics at small scales, but also for the development of statistical tools used in the study of future large-volume surveys like Euclid [Laureijs et al., 2011] or the Vera Rubin Observatory [Collaboration et al., 2009]. For example, simulations are particularly useful to assess how a statistic derived from an observable varies with cosmological parameters, or to build accurate covariance matrices, two problems encountered in Chapter 6. For all these reasons, dark-matter-only simulations strive to push the scale limit further by using ever larger volumes and finer resolutions, as for instance Millennium [Angulo et al., 2012], MultiDark [Klypin et al., 2016] or Quijote [Villaescusa-Navarro et al., 2020]. Beyond the statistical analysis of the matter distribution, N-body simulations are also particularly useful to study collapsed objects like halos, identified by means of post-processing, in order to refine predictions of their number counts or density profiles [Tinker et al., 2008; More et al., 2011].
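Building a covariance matrix from simulations amounts to measuring the same statistic in many independent realisations and computing its sample covariance. When the inverse is needed (e.g. in a likelihood), the inverse of a noisy sample covariance is biased; a widely used correction multiplies it by the Hartlap factor $(n_{\rm sims} - n_{\rm data} - 2)/(n_{\rm sims} - 1)$. The sketch below uses mock Gaussian draws in place of simulations; the covariance values are illustrative only.

```python
import numpy as np

def covariance_from_sims(samples):
    """Sample covariance of a data vector measured in n_sims independent simulations.

    samples: (n_sims, n_data) array, e.g. P(k) in n_data bins for each simulation."""
    return np.cov(samples, rowvar=False)      # unbiased (n_sims - 1) normalisation

def precision_from_sims(samples):
    """Inverse covariance debiased with the Hartlap factor (n_sims - n_data - 2) / (n_sims - 1)."""
    n_sims, n_data = samples.shape
    if n_sims <= n_data + 2:
        raise ValueError("the Hartlap factor requires n_sims > n_data + 2")
    hartlap = (n_sims - n_data - 2) / (n_sims - 1)
    return hartlap * np.linalg.inv(covariance_from_sims(samples))

# Mock "simulations": Gaussian draws from a known covariance (illustrative only).
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]])
samples = rng.multivariate_normal(np.zeros(3), true_cov, size=5000)
cov_hat = covariance_from_sims(samples)
prec_hat = precision_from_sims(samples)
```

The correction matters most when the number of simulations is not much larger than the length of the data vector, which is one reason why large simulation suites are valuable.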
However, dark matter is not directly observable and, as interest grew in reaching ever smaller scales, so did the need to include baryonic matter in the simulations. Modelling the complex non-linear interactions between gas, stars, black holes and dark matter happening at all scales in cosmological volumes is one of the goals of hydrodynamical simulations. For that purpose, additional equations of motion are required, which further increases the complexity of the computation and calls for trade-offs between mass and volume resolution on the one hand and computational time on the other. Despite these difficulties, many large-scale hydrodynamical simulations have been developed to study the role of baryonic physics in the evolution of large-scale structures, like Horizon-AGN [Dubois et al., 2014], EAGLE [Schaye et al., 2015], Illustris [Vogelsberger et al., 2014] or IllustrisTNG [Nelson et al., 2019]. As such, hydrodynamical simulations improve our understanding of structure formation and of the evolution of individual objects like galaxies or stars, whose predicted properties can then be tested against astrophysical observations to support or refute the proposed models [Pearce et al., 2001; Dubois et al., 2014; Schaye et al., 2015; Crain et al., 2015; Nelson et al., 2019; Donnari et al., 2019].
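The volume/resolution trade-off can be quantified with simple arithmetic: if $N^3$ equal-mass particles sample the mean matter density $\Omega_m \rho_{\rm crit}$ in a cube of side $L$, each particle carries $m_p = \Omega_m \rho_{\rm crit} (L/N)^3$. The sketch below uses $\rho_{\rm crit} \simeq 2.775\times10^{11}\,h^2\,M_\odot\,\mathrm{Mpc}^{-3}$ and illustrative values for $L$, $N$ and $\Omega_m$; it only covers a single-species (dark-matter-only) setup, whereas hydrodynamical runs split the mass between dark matter and gas elements.

```python
RHO_CRIT_0 = 2.775e11   # present-day critical density in (M_sun/h) / (Mpc/h)^3

def particle_mass(box_size, n_per_side, omega_m):
    """Mass of one particle when n_per_side**3 equal-mass particles sample Omega_m * rho_crit.

    box_size in Mpc/h, returned mass in M_sun/h."""
    return omega_m * RHO_CRIT_0 * (box_size / n_per_side) ** 3

m_small_box = particle_mass(1000.0, 512, 0.3175)   # illustrative values
m_big_box = particle_mass(2000.0, 512, 0.3175)     # same N, twice the box side
print(f"{m_small_box:.3e} M_sun/h; doubling L at fixed N: x{m_big_box / m_small_box:.0f}")
```

At fixed particle number, doubling the box side makes each particle eight times heavier, so keeping the mass resolution requires eight times more particles and correspondingly more computation.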
The limitations of statistical analyses
Statistical representations of random fields based on the lowest orders of the poly-spectra decomposition are limited in their description of non-Gaussian patterns. In particular, most cosmological analyses of the matter distribution rely on the power spectrum, which is blind to much of the texture of the cosmic web: since P(k) only depends on the modulus of the Fourier modes of the field, it discards the information contained in their phases. This is illustrated in Fig. 2.5, where the two fields have exactly the same power spectrum yet show very different structural information, easily captured by eye.
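The phase argument can be reproduced in a few lines: starting from a toy image made of thin lines (a hypothetical stand-in for a filamentary field), keep the Fourier moduli and replace the phases by those of white noise. Both images then have exactly the same power spectrum, while the line-like structure of the first one is lost in the second. This follows the idea behind Fig. 2.5, not necessarily the way that figure was produced.

```python
import numpy as np

def line_image(n=64):
    """Toy non-Gaussian image: a few one-pixel-wide bright lines on an empty background (hypothetical)."""
    img = np.zeros((n, n))
    img[n // 4, :] = 1.0                 # horizontal line
    img[:, n // 3] = 1.0                 # vertical line
    diag = np.arange(n)
    img[diag, diag] = 1.0                # diagonal line
    return img

def randomise_phases(field, rng):
    """Keep the Fourier moduli of `field` (hence its power spectrum) but draw new random phases.

    Phases are taken from the FFT of a real white-noise field, so they have the Hermitian
    symmetry needed for the inverse transform to be real."""
    field_k = np.fft.rfft2(field)
    noise_phase = np.angle(np.fft.rfft2(rng.standard_normal(field.shape)))
    new_k = np.abs(field_k) * np.exp(1j * noise_phase)
    new_k[0, 0] = field_k[0, 0]          # keep the k = 0 mode, i.e. the mean of the field
    return np.fft.irfft2(new_k, s=field.shape)

rng = np.random.default_rng(7)
original = line_image()
shuffled = randomise_phases(original, rng)
```

Plotting `original` and `shuffled` side by side shows sharp lines on the one hand and a noise-like texture on the other, although any statistic built from $|\delta_k|^2$ alone cannot tell them apart.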
Making the analyses sensitive to this pattern requires the evaluation of higher-order statistics, which already become hardly tractable computationally at order three with the bispectrum, and whose theoretical uncertainty involves moments of even higher order. Measuring even the first of these higher-order statistics accurately also requires many data points.
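To give an idea of what evaluating the bispectrum involves, the sketch below implements a common FFT-based estimator on a 2D periodic grid (chosen here for illustration; the text does not specify an estimator). For one triplet of $|k|$-shells, it averages $\delta_{k_1}\delta_{k_2}\delta_{k_3}$ over all closed triangles $k_1 + k_2 + k_3 = 0$, using the same FFT trick to count the triangles. Each configuration costs several FFTs, and the number of shell triplets grows as the cube of the number of shells, one reason why higher orders quickly become expensive.

```python
import numpy as np

def shell_masks(n, edges):
    """Boolean masks of the modes with edges[i] <= |k| < edges[i+1] (k in units of the fundamental mode)."""
    k1d = np.fft.fftfreq(n, d=1.0 / n)               # integer wavenumbers 0, 1, ..., -1
    kmag = np.hypot(k1d[:, None], k1d[None, :])
    return [(kmag >= lo) & (kmag < hi) for lo, hi in zip(edges[:-1], edges[1:])]

def fft_bispectrum(delta, edges, shells):
    """Bispectrum of a 2D periodic field for one triplet of shell indices, in grid units (cell area = 1).

    sum_x prod_i IFFT[delta_k M_i](x) / sum_x prod_i IFFT[M_i](x) equals the mean of
    delta_k1 delta_k2 delta_k3 over closed triangles; dividing by the number of cells gives B."""
    delta_k = np.fft.fftn(delta)
    masks = shell_masks(delta.shape[0], edges)
    filtered = [np.fft.ifftn(delta_k * masks[s]).real for s in shells]
    counting = [np.fft.ifftn(masks[s].astype(float)).real for s in shells]
    numerator = np.sum(filtered[0] * filtered[1] * filtered[2])
    triangles = np.sum(counting[0] * counting[1] * counting[2])   # proportional to the triangle count
    return numerator / triangles / delta.size

# Toy non-Gaussian field: squared white noise minus its mean (chi-square with 1 dof, centred).
rng = np.random.default_rng(1)
g = rng.standard_normal((16, 16))
delta = g**2 - 1.0
edges = np.array([1.0, 3.0, 5.0, 7.0])
b_est = fft_bispectrum(delta, edges, shells=(0, 1, 1))
```

For this toy field, whose values are independent from cell to cell, the expected bispectrum is flat and equal to the third central moment of the cell values (8 in grid units), so estimates from independent draws scatter around that value. The estimator returns `nan` if the chosen shells cannot form any closed triangle.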
The cosmological sensitivity of environments
Since the resulting pattern of the cosmic web is mainly driven by gravitational dynamics, extracting quantitative information from the observed structures provides key insights into the underlying cosmological model and improves our understanding of dark matter and dark energy. The first extensive and quantitative analyses of the multi-scale cosmic web in simulations suggest that each environment spans a broad range of densities [see the right panel of Fig. 2.6, reproduced from Cautun et al., 2014], which in turn points to different cosmological histories. One can hence expect individual environments to carry different imprints and to respond differently to changes in cosmological models and parameters. As an example, voids are believed to be pristine environments, only mildly deformed by gravity and free from complex multi-streaming, thus providing an ideal playground for the study of dark energy [Lee & Park, 2009; Lavaux & Wandelt, 2012; Hamaus et al., 2014, 2015; Pisani et al., 2015] or for constraining the neutrino mass [Massara et al., 2015]. At the other extreme, clusters are highly non-linear objects with high overdensities, enclosing a large fraction of the mass in a small part of the volume. The statistics of these density peaks (number, distribution with redshift) have been shown to be particularly sensitive to cosmological parameters like the normalisation of the matter power spectrum or the matter density [Bahcall et al., 1997; Bahcall & Fan, 1998; Holder et al., 2001]. They are also unique laboratories to constrain the baryon gas fraction [White & Frenk, 1991; White et al., 1993] and to study the evolution of galaxies [Butcher & Oemler, A., 1984; Baldry et al., 2006]. This relationship between the different environments of the cosmic web and the cosmological parameters of the ΛCDM model is an aspect that we will develop in Chapter 6 using the two-point statistics of the different environments.
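Studying environments separately first requires assigning each region to one of them. One classical scheme, shown here only to make the idea concrete and not the method developed in this thesis, is the T-web of Hahn et al. [2007], which counts how many eigenvalues of the tidal tensor exceed a threshold. A minimal sketch on a periodic grid, assuming a rescaled potential with $\nabla^2 \Phi = \delta$, so that $T_{ij}(k) = k_i k_j \delta_k / k^2$; the threshold and smoothing scale are free parameters chosen by the user.

```python
import numpy as np

def tweb_classify(delta, box_size, threshold=0.0, smoothing=0.0):
    """T-web environments on a periodic cubic grid.

    Returns labels (number of tidal eigenvalues above `threshold`: 0 void, 1 wall,
    2 filament, 3 node) and the eigenvalues themselves."""
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)           # wavenumbers
    kvec = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kvec[0] ** 2 + kvec[1] ** 2 + kvec[2] ** 2
    delta_k = np.fft.fftn(delta) * np.exp(-0.5 * k2 * smoothing**2)  # optional Gaussian smoothing
    delta_k[0, 0, 0] = 0.0                     # the mean density does not source tides
    k2[0, 0, 0] = 1.0                          # avoid 0/0 at k = 0 (numerator is zero there)

    tidal = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            # Phi_k = -delta_k / k^2 and T_ij = d_i d_j Phi  =>  T_ij(k) = k_i k_j delta_k / k^2.
            # The real part drops the small imaginary residue of unpaired Nyquist modes.
            t_ij = np.fft.ifftn(kvec[i] * kvec[j] / k2 * delta_k).real
            tidal[..., i, j] = t_ij
            tidal[..., j, i] = t_ij
    eigenvalues = np.linalg.eigvalsh(tidal)    # ascending order, shape (n, n, n, 3)
    labels = np.sum(eigenvalues > threshold, axis=-1)
    return labels, eigenvalues

rng = np.random.default_rng(2)
delta = rng.standard_normal((16, 16, 16))      # stand-in for a simulated overdensity grid
labels, eigenvalues = tweb_classify(delta, box_size=100.0)
```

Since the trace of the tidal tensor equals the (mean-subtracted, smoothed) overdensity, denser regions tend to have more positive eigenvalues and to be labelled filaments or nodes, consistent with the broad but shifted density distributions of the environments.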
The role of the environment in shaping galaxies and clusters
At the astrophysical level, detecting cosmic structures may also help in proposing scenarios for the formation and evolution of galaxies. The first hints of environmental effects on galaxies were reported by Oemler [1974], who showed that the densest regions of the Universe were more likely to host elliptical than spiral galaxies. These observations were then refined with the development and availability of web finder algorithms. In particular, the most prominent structure, also traced in observations, is the filamentary part of the pattern. These massive bridges act like highways in the cosmic web, allowing the transport of matter. In this picture, galaxies escape low-density regions and travel along the network, carried by the flow of matter in filaments towards the most massive parts of the web, the nodes [Aragon revealing the filamentary pattern of the cosmic web in data and simulations hence offers the possibility to study the influence of the environment on the formation and evolution of galaxies.
This topic has received considerable interest in recent years, with many studies reporting correlations between the physical properties of galaxies (e.g. their mass, shape, luminosity, orientation or ability to form stars) or halos and those of the underlying web or related tidal anisotropies [Kauffmann et al., 2004; Hahn et al., 2007; Martinez et al., 2016; Kuutma et al., 2017; Malavasi et al., 2017; Laigle et al., 2018; Ganeshaiah Veena et al., 2018; Sarron et al., 2019]. For instance, it has been shown that galaxies closer to the spine of a filament are more likely to be red and massive, while they become bluer and less massive as their distance to the spine increases [e.g. Bonjean et al., 2020]. Some studies also report a correlation between the orientation of galaxies and the direction of the filament they are hosted in [Ganeshaiah Veena et al., 2018, 2019; Kraljic et al., 2020].
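Correlations with the distance to the filament spine rest on a simple geometric quantity: the shortest distance from each galaxy to a spine described as a set of straight segments. A minimal sketch, assuming the spine is given as vertices and vertex-index pairs (a representation chosen here for illustration); the coordinates in the example are made up.

```python
import numpy as np

def distance_to_spine(points, nodes, edges):
    """Shortest Euclidean distance from each point to a spine made of straight segments.

    points: (n, d) positions, nodes: (m, d) spine vertices, edges: (e, 2) vertex index pairs."""
    a = nodes[edges[:, 0]]                        # (e, d) segment starts
    b = nodes[edges[:, 1]]                        # (e, d) segment ends
    ab = b - a
    ap = points[:, None, :] - a[None, :, :]       # (n, e, d)
    seg_len2 = np.sum(ab**2, axis=-1)             # (e,)
    t = np.sum(ap * ab[None], axis=-1) / np.where(seg_len2 > 0, seg_len2, 1.0)
    t = np.clip(t, 0.0, 1.0)                      # project onto the segment, not the infinite line
    closest = a[None] + t[..., None] * ab[None]   # (n, e, d) closest point on each segment
    dist = np.linalg.norm(points[:, None, :] - closest, axis=-1)
    return dist.min(axis=1)

# Hypothetical spine with a bend, and three galaxies around it (arbitrary units).
spine_nodes = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 10.0, 0.0]])
spine_edges = np.array([[0, 1], [1, 2]])
galaxies = np.array([[5.0, 2.0, 0.0], [12.0, 5.0, 0.0], [-3.0, 4.0, 0.0]])
d_spine = distance_to_spine(galaxies, spine_nodes, spine_edges)
```

The distances can then be binned (e.g. with `np.digitize`) to compare a galaxy property such as colour or stellar mass close to and far from the spine.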
The insightful analysis of the cosmic web in large-scale simulations carried out by Cautun et al. [2014] shows that the filamentary structure contains about half of the dark matter mass of the Universe at the present time while occupying only a few percent of the total volume. By also hosting halos of various masses, typically from $10^{10}\,M_\odot/h$ to roughly $10^{13}\,M_\odot/h$, filaments are ideal places to study collapsed objects like galaxies. Numerous works, based on simulations or observations, additionally show that a considerable fraction of the baryons is hidden in the form of hot gas in filaments [Cen & Ostriker, 2006; Martizzi et al., 2019; Tanimura et al., 2020a; Galárraga-Espinosa et al., 2021], emphasising the crucial role this particular environment plays in the baryonic processes shaping the formation and evolution of galaxies. These findings highlight the importance of detecting the filamentary pattern, both to improve the quality of the predictions in simulations and to discover new correlations with the tracers it hosts.
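Statements such as "half of the mass in a few percent of the volume" correspond to mass- and volume-weighted fractions once every cell of a grid has been assigned an environment. A minimal sketch, assuming integer labels 0 to 3 for voids, walls, filaments and nodes (a labelling convention chosen here for illustration) and a non-negative mass density per cell.

```python
import numpy as np

ENV_NAMES = ("void", "wall", "filament", "node")

def environment_fractions(labels, density):
    """Volume and mass fractions of each environment on a regular grid.

    labels: integer environment per cell (0..3); density: mass density per cell (>= 0).
    Returns {name: (volume_fraction, mass_fraction)}."""
    labels = np.ravel(labels)
    density = np.ravel(density)
    volume = np.bincount(labels, minlength=4) / labels.size
    mass = np.bincount(labels, weights=density, minlength=4) / density.sum()
    return dict(zip(ENV_NAMES, zip(volume, mass)))

# Tiny made-up example: three low-density void cells and one dense filament cell.
fractions = environment_fractions(np.array([0, 0, 0, 2]), np.array([0.5, 0.5, 0.5, 2.5]))
```

Here the filament occupies a quarter of the volume but holds more than 60% of the mass, the same kind of contrast as the one reported for simulations.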
Galaxy clusters are massive, biased tracers of the underlying matter, observed both in simulations and in surveys. It is now well established that their properties, like shapes, masses and redshifts, are a rich source of information on how they form and evolve with time and on the underlying cosmological model [Yoshida et al., 2000; Peter et al., 2013; Sereno et al., 2018]. These properties have also been shown to be influenced by the local environment of halos and clusters and by how they are embedded in the cosmic web [Poudel et al., 2017; Darragh Ford et al., 2019; Gouin et al., 2020]. In particular, Musso et al. [2018] predict that low-mass halos are more likely to lie inside filaments while massive halos are found closer to nodes. Cosmic web anisotropies are hence indicators of halo assembly bias and are therefore strongly correlated with halo properties [Paranjape et al., 2018a; Ramakrishnan et al., 2019].
At a topological level, the number of filaments connected to a node (or massive cluster), a quantity called the connectivity, is expected to depend on the growth factor, hence allowing constraints to be put on dark energy [Gay et al., 2012; Codis et al., 2018]. These relations between nodes and their local cosmic web environment will be investigated in more detail in Chapter 5, Sect. 5.5.
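Connectivity can be sketched on a graph representation of the filamentary network. The example below assumes, for illustration only, that a filament is a chain of edges joining two critical nodes (nodes whose degree differs from 2, i.e. end points and junctions); this is not necessarily the graph-based definition used in Chapter 5. The connectivity of a node is then the number of filament ends attached to it.

```python
from collections import defaultdict

def split_into_filaments(edges):
    """Split an undirected graph into chains of edges joining critical nodes.

    A node is critical when its degree differs from 2 (end points and junctions).
    Closed loops made only of degree-2 nodes are ignored in this sketch."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    critical = {node for node, nbrs in adjacency.items() if len(nbrs) != 2}

    visited, filaments = set(), []
    for start in sorted(critical):
        for first in adjacency[start]:
            if frozenset((start, first)) in visited:
                continue
            chain, prev, cur = [start], start, first
            visited.add(frozenset((start, first)))
            while cur not in critical:             # walk along degree-2 nodes
                chain.append(cur)
                a, b = adjacency[cur]
                prev, cur = cur, (b if a == prev else a)
                visited.add(frozenset((prev, cur)))
            chain.append(cur)
            filaments.append(chain)
    return filaments

def connectivity(filaments, node):
    """Number of filament ends attached to `node`."""
    return sum((chain[0] == node) + (chain[-1] == node) for chain in filaments)

# Hypothetical network: node 0 is a cluster with three branches, one ending in another junction (8).
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (0, 6), (6, 7), (7, 8), (8, 9), (8, 10)]
filaments = split_into_filaments(edges)
```

With this definition, both node 0 and node 8 have a connectivity of 3, while end points have a connectivity of 1.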
Table of contents
I Emergence of large-scale structures
1 Structure formation in the Universe
1.1 The homogeneous universe
1.1.1 Distances in an expanding universe
1.1.2 The dynamics of the homogeneous Universe
1.2 The birth of large-scale structures
1.2.1 Linear perturbation theory
1.2.2 Zel’dovich formalism
1.3 Statistical descriptions of the matter distribution
1.3.1 Discrete random fields
1.3.2 Correlation functions and poly-spectra
1.4 The ΛCDM model
1.4.1 Presentation of the model
1.4.2 Cosmological parameters and matter power spectrum
2 Large-scale structures manifestation
2.1 Large-scale structures in simulations
2.1.1 First exhibitions
2.1.2 Dark matter only and hydrodynamical simulations
2.2 The cosmic web through galaxies
2.2.1 Galaxy surveys
2.2.2 Observational effects
2.2.3 Galaxy bias
2.3 Motivations for cosmic web classification
2.3.1 The limitations of statistical analyses
2.3.2 The cosmological sensitivity of environments
2.3.3 The role of the environment in shaping galaxies and clusters
2.4 Challenges in detecting cosmic filaments
2.4.1 Structural complexity of the pattern
2.4.2 Non-unicity of the definition
2.5 Conclusions and perspectives for the thesis
II Statistical methods for pattern extraction
3 Statistical physics for clustering
3.1 Context and motivation
3.1.1 Machine learning and physics
3.1.2 Optimisation problems and regularisation
3.1.3 Clustering and its drawbacks
3.2 Mixture models
3.2.1 General formalism
3.2.2 The Gaussian case
3.3 Expectation-Maximisation algorithm
3.3.1 Introduction through Mixture Models
3.3.2 Iterative scheme
3.3.3 The particular case of Gaussian mixtures
3.4 Phase transitions in Gaussian mixtures
3.4.1 Statistical physics formulation of clustering
3.4.2 From paramagnetic to condensation phase
3.4.3 Hard annealing
3.4.4 Soft annealing
3.4.5 Graph-regularised mixture model
3.5 Summary and prospects
4 Principal graph learning
4.1 Context
4.1.1 Spatially structured point-cloud data
4.1.2 Principal curves
4.2 Elements of graph theory
4.2.1 Introduction and definitions
4.2.2 Linear algebra representations
4.2.3 Some graph constructions
4.3 Graph regularised mixture models
4.3.1 Full model and formalism
4.3.2 Algorithm and illustrative results
4.4 About graph priors
4.4.1 Basic graph constructions
4.4.2 The average graph prior
4.5 Convergence and time complexity
4.5.1 Convergence analysis
4.5.2 Time complexity
4.5.3 Runtimes
4.6 Hyper-parameters and initialisation
4.6.1 The impact of parameters
4.6.2 Initialisation
4.7 Illustrative application: Road network
4.8 Summary and prospects
III Analysis of the Cosmic Web pattern
5 The principal graph of the Cosmic Web
5.1 Context and motivations
5.2 Filamentary pattern detection
5.2.1 T-ReX: Tree-based Ridge eXtractor
5.2.2 Filamentary pattern extraction from Illustris subhalos
5.2.3 Performance evaluation
5.3 Identification of individual filaments
5.3.1 A graph-based definition for filaments
5.3.2 Characteristics of individual filaments
5.3.3 Association of galaxies
5.4 Filaments characteristics in simulations
5.4.1 Simulations and principal graphs
5.4.2 Comparison of filaments characteristics
5.5 The impact of the cosmic web on cluster properties in simulations
5.5.1 Data, filamentary pattern and connectivity
5.5.2 Impact of connectivity on the growth and shapes of clusters
5.5.3 Impact of cluster dynamical states on the connectivity
5.5.4 The influence of mass growth history
5.6 Summary and perspectives
6 Constraining cosmological parameters with cosmic environments
6.1 Context and introduction
6.1.1 The matter power spectrum as a cosmological probe
6.1.2 The cosmic environments as an alternative probe
6.2 Data & Methodology
6.2.1 The Quijote suite of simulations
6.2.2 Cosmic web segmentation
6.3 Environments sensitivity to cosmology
6.3.1 Cosmic fractions as a function of cosmological parameters
6.3.2 Power spectra in cosmic environments
6.4 Constraining power of cosmic environments
6.4.1 Fisher formalism for information content quantification
6.4.2 Real-space auto-spectra
6.4.3 Redshift-space auto-spectra
6.4.4 Stability and convergence analysis
6.5 Conclusion and perspectives