Get Complete Project Material File(s) Now! »

## An indexing scheme for efficient distance querying: EUQLID

To deal with the problem of efficiently answering distance queries, we proposed EUQLID as an efficient indexing scheme. We define a variant of the 2-hop cover where we enforce an additional constraint to limit the size of the index while allowing partial coverage of distance information. The missing distance information can be retrieved efficiently at query time by means of a fast variant of the Dijkstra’s algorithm whose search space is carefully pruned. Our goal during the indexing phase is then to cover as much distance information as possible without violating the size constraint. We show that this variant of the 2-hop cover admits a 0.63−approximation algorithm which inspired our efficient indexing algorithm.

In order to select which distance information should be stored in our index, we exploit an interesting property that we have discovered after performing some experiments on some real graphs, and, that real-world networks often exhibit. This property says that a few nodes connect all nodes through short paths (see Section 3.4). These nodes are usually the nodes with larges degrees in the network. This property can be observed empirically in our daily routine, as short roads connecting any two cities often go through large cities, long distance flights connect hubs in the corresponding countries, and researchers are connected through prominent scientists in their field in the corresponding co-citation networks. Our idea is to carefully select a set of large degree nodes and to store all distance information corresponding to paths traversing those nodes. As paths traversing large degree nodes are not necessarily shortest paths one needs to employ Dijkstra’s algorithm to find exact distances at query time. However, the search space of Dijkstra’s algorithm can be effectively pruned by avoiding traversing large degree nodes (as the corresponding distance information is covered by the index) and using the length of the paths traversing large degree nodes as maximum depth in the Dijkstra’s algorithm.

We summarize our contributions on answering distance queries as follows:

• We define a new variant of the 2-hop cover where we enforce an additional constraint that limits the size of the index while improving query response time;

• we develop a 0.63-approximation algorithm for this problem;

• our main contribution is EUQLID, which is an efficient algorithm for indexing and processing distance queries on very large graphs. Our algorithm is based on an efficient variant of the approximation algorithm discussed above. Our extensive experimental evaluation against state-of-the-art algorithms shows that our approach outperforms existing approaches and that distance queries can be processed within hundreds of milliseconds on very large real-world directed graphs.

This work was submitted and it is still under review.

### A reachability-based access control model for social networks

In order to deal with privacy issues in OSNs (described in Section 1.1.2), we propose a network-aware access control model that allows users to monitor the spread of their personal information. According to our model, users can explicitly specify fine-grained privacy preferences as they think about it in real life scenarios. Our model enables users, having a specific audience in mind, to associate to each piece of information a given audience, which is specified based on the relationship nature between the information owner and his contacts. Following this model, a given user can access an information if and only if a specific path between him and the owner exists. This path expresses constraints on relationship types (e.g., friend, colleague, etc.), edge direction, distance, trust, and, user attributes (e.g., live in Paris, age more than 20, etc.). Deciding whether access should be granted or not is done on the fly based on a the set of access policies assigned to the requested information. We studied the worst case running time of theaccess control protocol, and, did experiments to study its performance on real social graphs, which have shown that our model is practical. This access control model was presented in DBSocial 2011 and the PhD symposium joint to ICDT/EDBT 2012.

#### A privacy management system: Primates

We implemented a privacy management system for OSNs that we called Primates. This system is an enforcement of the proposed access control model. It allows users to selec some personal information on their profile and assign the desired privacy policies to them.

Privacy preference assignment is a user-friendly process which can be done via a graphical interface (i.e., using a user-friendly graphical tool). Primates also allows users to preview and visualize the authorized audience as a graph and change their policy accordingly, in case they realize that some unwanted users are part of the audience. Primates was demonstrated at CIKM 2012.

**Table of contents :**

Dedications

Acknowledgments

Contents

List of Algorithms

List of Figures

List of Tables

**1. Introduction **

1.1. Context and Problem Definition

1.1.1. Scaling reachability and distance queries on large graphs

1.1.2. Privacy management in social networks

1.2. Contributions

1.2.1. An indexing scheme for efficient distance querying: EUQLID

1.2.2. A reachability-based access control model for social networks

1.2.3. A privacy management system: Primates

1.3. Dissertation Organization

**2. Background and Preliminaries **

2.1. Introduction

2.2. Boolean Expressions

2.3. Basic Graph Definitions

2.4. Graph Traversal

2.4.1. Breadth-first search (BFS)

2.4.2. Depth-first search (DFS)

2.4.3. Dijkstra’s algorithm

2.4.4. Topological sort

2.5. Graph Database Systems

2.6. 2-hop Labeling

2.7. Set Cover

2.8. Max Cover

2.9. Submodular Function Maximization

2.10. disk-friendly Set Cover

2.11. Wilson Score-based Sampling

2.12. Conclusion

**3. Interesting Properties of Real Graphs **

3.1. Introduction

3.2. Complex Network Models: Random versus Scale-Free

3.3. Well-known Properties

3.3.1. Static properties

3.3.2. Dynamic properties

3.4. Analyzing Shortest Paths

3.5. Studying Stars

3.6. Conclusion

**4. Answering Distance Queries in Large Directed Graphs **

4.1. Introduction

4.2. RelatedWork

4.2.1. Simple reachability

4.2.2. Distance queries

4.2.3. Labeled queries

4.3. Preliminaries and Formal Settings

4.4. EUQLID

4.4.1. Indexing Algorithm

4.4.2. Query Processing

4.4.3. Further optimization

4.5. Experimental Evaluation

4.5.1. D-EUQLID: Disk-based version

4.5.2. Settings

4.5.3. Methodology

4.5.4. Datasets

4.5.5. Results

4.5.6. Effect on varying k

4.6. Conclusion

**5. Access Control in Social Networks **

5.1. Introduction

5.2. RelatedWork

5.3. Preliminaries

5.4. Access Control Requirements

5.5. The Access Control Model

5.5.1. Access control policy

5.5.2. Access control enforcement

5.6. Complexity Analysis

5.7. Experiments

5.8. Primates: A Privacy Management System for OSNs

5.8.1. Global architecture

5.8.2. Features

5.9. Conclusion

**6. Conclusions and Research perspectives **

6.1. Put It All Together

6.2. Research Perspectives

**A. Résumé en français **

A.1. Contexte et Problématiques

A.1.1. Passage à l’échelle des requêtes d’accessibilité et de calcul de distance dans les grands graphes

A.1.2. Sécurité dans les réseaux sociaux

A.2. Contributions

A.2.1. EUQLID : un schéma d’indexation efficace pour le calcul des distances dans les graphes

A.2.2. Un modèle de contrôle d’accès dans les réseau sociaux basé sur l’accessibilité

A.2.3. Primates : un système de gestion de la confidentialité dans les réseaux sociaux

A.3. Organisation du Manuscrit

A.4. Conclusion Globale

A.4.1. Synthèse

A.4.2. Perspectives

Lexique anglais-français

**Self References**