Get Complete Project Material File(s) Now! »

## The birth of graph theory

The birth of the graph theory is very illuminating from the perspective of the importance of abstract models to improve our knowledge on apparently complicated systems.

In the XVIII century, the city of Königsberg was composed by four landmasses connected by seven bridges as can be seen in the left panel of gure 2.1. A curious question intriguing the inhabitants was whether it was possible to nd a path through the city so to use each bridge exactly once. Leonard Euler understood that in order to solve this problem, the system was exactly describable just in terms of the landmasses and of the information telling which of them were connected by a bridge (see right panel of gure 2.1). All the other details such as the explicit position of the bridges, their length, the dimension and the location of the landmasses etcetera were completely irrelevant. The solution was indeed found in terms of such synthetic description, as Euler showed [47] how the path with the desired characteristics was impossible in Königsberg and, more generally, in any city in which an odd number of bridges converged on more than two nodes.

It is worthy to stress that in this case, and in general in this kind of combinatorial problems, the description obtained via the graph is exact. In other cases this is no more possible, but the more or less rened approximations one obtains by abstractly studying the geometry of the problem are nevertheless useful for better understanding the situation under exam. In some situations (for instance in biological systems) the simple collection of nodes and links is a very rude description; even if in such cases it is typically hard to produce quantitatively accurate estimates of the outcomes of the system, the use of the network metaphor [48] allows one to have an idea about the kind of processes and phenomena that will possibly take place on it. More than the possibility of modelling a system through a network, however, one should ask himself whether this modelling really enriches our knowledge on the system and, in particular, which questions it will help us answering. Applying a highly simplifying network metaphor to a real system is something to do with caution and having clear in mind what features of this simplication one is going to look at.

### The main features of a graph

An elegant way of abstractly dening networked structures is by using graph theory which species relationships among a collection of items. Communication networks (for instance set of routes, interconnections among computers or infrastructures for transporting energy), social networks, information networks (as web pages connected by hyperlinks) are all examples of systems describable in this way. One of the rst characterisation of such systems one can think of is based on the reachability of its vertices, meaning by this the possibility of going from a node to another by travelling over the network links; if any node is reachable from any other, the graph is said to be connected. More quantitatively, one can look at the minimum number of steps needed to connect any couple of nodes on the graph. The mean of these values goes under the name of average shortest path length, and it is nothing but the distance that typically separates two vertices; the maximum among all the distances is instead usually referred to as the diameter of the network. One of the most important local features of a graph is probably the degree k of each of its nodes, dened as the number of links converging on it. The average of all these values gives again a very useful global characterisation of the graph, as it is equivalent, after a rescaling by the system size N, to the density of connections among the network elements. Two extreme cases in this respect are of particular interest. Firstly, one can be interested in graphs featuring very few connections, and in particular with the lowest possible number of links. If one restricts to connected graphs, the smallest possible number of edges is N 1 and corresponds to the graph being a tree. In this case, no multiple paths exist between any couple of nodes and hence no closed loops can be drawn by following the edges. As will be discussed in the following, on trees the exact solution of many problems can be found by making use of iterative methods. Approximated versions of these methods can be nevertheless used also on graphs that, even without being strictly speaking trees, show very rare loops so that any node sees, around it, a structure that is very similar to a tree; they are called locally tree-like networks and constitute a class much broader than the one comprehending only trees. They can indeed be obtained by randomly choosing a small enough number of connections among the nodes, whereas the probability of getting a tree with such a procedure is negligible.

#### Enriching the network metaphor

The description of a networked system as a set of interconnected elements has to be seen as a basic framework that can be enriched when dealing with systems with specic characteristics. Even without entering into the huge variety of such details that can be inserted in a network denition, a couple of features are worthy to be recalled. Depending on the interaction one wants to model being symmetrical or not, one can model the system via respectively an undirected or a directed graph. In this latter case, the relationship existing between a couple of nodes is specied by an arrow going from one of the two to the other; if the graph is undirected, instead, they are simply joined by an edge. Put another way, in an undirected graph the connections between nodes can be listed according to unordered pairs of vertices, whereas in directed graphs they are specied by ordered pairs such that (a; b) and (b; a) are not equivalent. The network of scientic collaborations is for instance symmetrical (i.e. undirected) whereas the links telling whom eats whom in a food web necessarily need to be considered as directed. In biological systems, the gene regulation networks are typically asymmetrical as the fact of gene a regulating gene b does not imply the other way round. When inferring the protein structure, on the other hand, the couplings are by denition symmetrical as the link is considered as a proxy of the spatial proximity of two sites, this latter being a symmetrical quantity. If the graph is directed, the metrics by which a network is characterised have to be appropriately modied, for instance by dening an in-degree and a out-degree for any node telling the number of links coming into and going out of a vertex, instead of a single quantity representing the number of neighbours it has.

Another hypothesis one can be interested in relaxing is the equivalence among the role played by any of the edges. Thinking about the brain, it has been experimentally shown how two neurons can be connected more or less strongly; the same holds for social systems, in which some of the people we are linked to are great friends of ours, whereas others are just acquaintances. This feature is modelled by assigning a weight to each of the links in the graph that can even be, in general, a negative quantity; this is for instance the case if one wants to take care of inhibitory relationships occurring in neural networks.

**The comparison between observed and reproduced features**

From a modelling perspective, very interesting is the situation in which the real data are found to be dramatically dierent from the ones one obtains by generating a graph according to a simple model. For instance, in absence of strong a priori elements against this hypothesis, one can suppose that the degrees of two neighbouring nodes are not correlated. If this was the case, however, in a heterogeneous network the probability of a very highly connected node (also called a hub in the following) being connected to another one would be almost negligible; there are indeed largely more low-degree than high-degree nodes to which it could link. Looking for instance at the Internet graph, nevertheless, one discover what has been called a rich-club phenomenon, as the hubs are very well connected among each other [50]. This contradiction is enlightening as it shows that the former independence hypothesis has to be relaxed: in order for the description to be reliable, one has to allow a high-degree node to preferentially attach to another one of the same kind. If this is true, the network is said to be assortative. Also disassortative graphs where a high-degree node preferentially links to low-degree ones and vice versa are found in applications. This is the case for instance of the network composed by the personal computers and by the machines working as servers; the disassortativity can be seen as a soft version of a bipartite graph where the nodes are divided into two groups and all the links connect a node of one type to one of the other.

**Table of contents :**

**1 Models and frameworks **

1.1 The role of models

1.2 Forward problems, inverse problems

1.2.1 Same phenomena, dierent questions

1.2.2 A more precise denition

1.3 A physics modelling framework: the Ising model

1.3.1 Motivation

1.3.2 The Ising model

1.3.3 Universality of critical phenomena

1.3.4 The generalisations

1.3.5 Inverse problems in disordered systems

1.4 A computer science modelling framework: the constraint satisfaction problems

1.4.1 Denitions and examples

1.4.2 Infeasibility of the straightforward solutions

1.4.3 The worst case scenario

1.4.4 The typical case scenario

1.5 Spin glasses, rCSP and large deviations

1.5.1 Spin glasses and CSP: two languages, same problems

1.5.2 Looking for atypical events

1.5.3 The large deviations in a simple case

1.6 Statistical mechanics for non-physics problems

1.6.1 A common framework, many elds of interest

1.6.2 Social systems

1.6.3 Economics and nance

1.6.4 Neural networks

1.6.5 Other biological systems

1.6.6 Signal processing

1.6.7 Computer science

**2 The network theory **

2.1 Motivations and denitions

2.1.1 Ubiquity of networks

2.1.2 The birth of graph theory

2.1.3 The main features of a graph

2.1.4 Enriching the network metaphor

2.2 A path towards more realistic networks

2.2.1 The comparison between observed and reproduced features .

2.2.2 Reproducing the observed average distance and clustering coe cient

2.2.3 Reproducing the observed degree distribution

2.3 Studying graphs as a branch of probability theory

2.3.1 A mathematical description

2.3.2 Combinatorial problems on graphs

2.4 Phase transitions, critical phenomena and statistical physics approach

2.4.1 Dynamics on networks, dynamics of networks

2.4.2 Statistical physics and network theory

2.4.3 Critical phenomena on networks

2.5 Examples of other problems

2.5.1 The detection of communities

2.5.2 Networked versions of other problems

**3 Contagion dynamics on graphs **

3.1 General framework

3.1.1 The problem and its range of applicability

3.1.2 Epidemic processes and compartmental models

3.1.3 The choice of the dynamics

3.1.4 The characteristics of the graph

3.1.5 One spreading phenomenon, many possible questions

3.1.6 Dierent approaches

3.2 Minimal contagious sets in random regular graphs

3.2.1 Denition of the problem

3.2.2 The energy function

3.2.3 Mapping to other standard problems in graph theory

3.3 The cavity method treatment

3.3.1 The replica symmetric RS formalism

3.3.2 The breaking down of the RS assumption

3.3.3 1RSB and energetic 1RSB formalism

3.3.4 The energetic 1RSB formalism

3.4 Main results

3.4.1 The solutions of the problem

3.4.2 Analytical results

3.4.3 Numerical results

3.5 Future perspectives

**4 Exploring networks **

4.1 Graph exploration and random walks

4.1.1 Exploring a graph, a very general problem

4.1.2 Interest and applications in dierent elds

4.1.3 Random walks on graphs

4.1.4 Joining the two: graph exploration through random walks

4.2 Rare event statistics on random walks on networks

4.2.1 The problem

4.2.2 The model

4.2.3 The large deviations

4.3 Main results

4.3.1 A degree-based approximation

4.3.2 Localisation transition

4.3.3 Mode-switching transition

4.4 Summary and future perspectives

4.4.1 Summary

4.4.2 Better understanding of the critical behaviour

4.4.3 Dierent functions

4.4.4 Functions taking value on the links

4.4.5 Dierent topologies

4.4.6 Improvement of clustering techniques

**5 Inferring the graph **

5.1 Motivation, applications and connection to other problems

5.1.1 Motivation

5.1.2 The matrix completion problem

5.1.3 The collaborative ltering

5.2 The inference of a network of interactions from a partial knowledge of the correlations

5.2.1 Denition of the problem

5.2.2 Motivation, applications and previous work

5.3 The model

5.3.1 Statistical context of the project

5.3.2 Prior, posterior and optimal estimate of the model

5.4 Choosing the couplings and evaluating the performances

5.4.1 Articial models

5.4.2 The geometry of J

5.4.3 Generating synthetic J

5.4.4 The observables

5.5 Main analytical results

5.5.1 Reasons justifying the study of the Hessian

5.5.2 Derivation of the Hessian

5.5.3 Derivation of the Hessian in the eigenbasis of C

5.5.4 Perturbative approximation of the Hessian

5.6 The heuristics

5.6.1 Choices based on inference

5.6.2 Heuristics making use of the Hessian

5.6.3 Intractability of the complete Hessian

5.6.4 Diagonal elements of the Hessian

5.6.5 Validity of the perturbative approximation

5.7 Initial knowledge and eects of measurements

5.7.1 Choice of the initial condition

5.7.2 Update of C after a measurement

5.8 Results

5.8.1 Dependency on the initial condition

5.8.2 The exploration-exploitation trade-o

5.8.3 Dierent heuristics on a xed geometry

5.8.4 Dierent graphs, dierent features

5.9 Future perspectives

5.9.1 Biological applications

5.9.2 Theoretical challenges

5.9.3 More complicated heuristics

5.10 Appendices

5.10.1 Caveat on the procedure for creating synthetic J

5.10.2 The value of the couplings

5.10.3 Regularisation and noise

5.10.4 Uniqueness of the minimum

5.10.5 Completion of the initial condition

**6 Conclusions and perspectives **

6.1 The network theory in the big data era

6.1.1 The end of theory?

6.1.2 An example of rich data analysis: Quora

6.1.3 New solutions to old questions

6.1.4 The future of network theory

6.2 More data, same problems

6.2.1 Optimisation of spreading on networks

6.2.2 Extreme events in network exploration

6.2.3 Inferring networks with partial information

**References **

A Minimal contagious sets in random regular graphs

B Rare events statistics of random walks on networks: localization and other dynamical phase transitions