Several formalization of the notion of protection have been proposed in the literature. Among those based on probability theory, we mention the true-or-false properties in the Anonymity hierarchy: like strong anonymity which describes an ideal situation where a protocol does not leak any information about the identity of the user, and some weaker notions like conditional anonymity [Cha88, HO05, BP05] and probable innocence [RR98]. More refined approaches, based on information theory, aim at measuring also the degree of protection provided by a system. The idea is to express the leakage of information in terms of the notion of mutual information. A nice feature of this approach is that we can consider diﬀerent notions of entropy depending on the kind of adversary we want to model. The most used are the Shannon entropy, see for example [CHM05, Mal07], and the R´enyi min-entropy [Smi09].
Diﬀerential privacy [Dwo06, DL09, Dwo11] is a promising definition of confidentiality that has emerged recently from the field of statistical databases. It provides strong privacy guarantees, and requires fewer as-sumptions than the information-theoretical approach. We say that a system is ǫ-diﬀerentially private if for every pair of adjacent datasets (i.e. datasets which diﬀer in the data of an individual only), the probabilities of obtaining a certain answer diﬀer at most by a factor eǫ. Diﬀerential privacy captures the intuitive requirement that the (public) answer to a query should not be aﬀected too much by the (private) data of each singular individual.
Although diﬀerential privacy originates from the field of statistical databases, it is becoming increasingly popular in many other fields, ranging from programming languages [RP10] to social networks [NS09] and geolocation [MKA+08]. One of the reasons of its success is its independence from side knowledge, which makes it robust to attacks based on combining various sources of information.
The extension of the principle of diﬀerential privacy to secrets with a generic adjacency relation has been universally adopted in the literature, see for instance [BKOB12, GHH+13, CABP13]. Therein, the sensitive in-formation to be protected was other than the value of a single individual in databases. A general notion of adjacency was considered, formalized by a general metric that measures the distance between values of secrets. They showed that within this setting, it is still reasonable to employ the same principle of diﬀerential privacy and to obtain a meaningful notion of privacy. In this thesis we also consider the principle of diﬀerential privacy under a general notion of adjacency, which provides the basis for formalizing also other security concepts, like anonymity.
This thesis: Diﬀerential Privacy in Concurrent Systems
The goal of this thesis is to develop formalisms for probabilistic concurrent systems that can reason about diﬀerentially private behaviors. We address this issue mainly from three directions: modular reasoning provided by pro-cess combinators, distance measuring based on approximate bisimulations and axiomatic theories.
In Chapter 3 we consider a probabilistic process calculus equipped with secret labels as a specification formalism for concurrent systems, and we propose a framework for reasoning about the degree of diﬀerential privacy provided by such systems. In particular, we investigate the preservation of the degree of privacy under composition via the various operators. We illustrate our idea using a variant of the anonymity protocol: Crowds. Our extension allows anonymous users to send messages probabilistically over the users they think trustable, rather than over all users as strictly required in standard Crowds. Furthermore, we show that this trust information may compromise privacy, and we introduce a notion of adjacency relation to eliminate this factor, thus retrieving the real level of diﬀerential privacy in this case. Then, we investigate how the users’ preference levels aﬀect the degree of privacy.
Bisimulations for diﬀerential privacy
In Chapter 4 we investigate techniques for proving diﬀerential privacy based on approximate bisimulations. Our motivation stems from the work of Tschantz et al. [TKD11], who proposed a verification method based on proving the existence of a stratified family between states, that can track the privacy leakage, ensuring that it does not exceed a given leakage bud-get. We improve this technique by investigating a state property which is more permissive and still implies diﬀerential privacy. We propose a new probabilistic bisimulation by integrating the notion of amortisation, which results into a more parsimonious use of the privacy budget. We show that the closeness of automata in our amortised bisimulation still guarantees the preservation of diﬀerential privacy, which makes it suitable for verification. Moreover we show that our amortised bisimulation is substitutive under typical process combinators. We apply the bisimulation verification frame-work to reason about the degree of diﬀerential privacy of protocols by the example of the Dining Cryptographers Protocol with biased coins.
Complete proof systems
The concept of amortisation was initially introduced for cost-based bisimu-lations [KAK05, dFERVGR07] to make long-term behavioral comparisons between nondeterministic systems. The idea of amortisation has been bor-rowed in the previous chapter to formulate an approximate notion for prob-abilistically behavioral equivalence: amortised probabilistic bisimulation. In Chapter 5 we present sound and complete proof systems for amortised strong probabilistic bisimulation and its weak counterpart, and prove their soundness and completeness. Our results make it possible to reason about long-term (observable) probabilistic behaviors by syntactic manipulation.
Generalized bisimulation metrics
The pseudometric based on the Kantorovich lifting is one of the most pop-ular notions of distance between probabilistic processes proposed in the literature. However, its application in verification is limited to linear prop-erties. In Chapter 6 we propose a generalization which allows to deal with a wide class of properties, such as those used in security and privacy. More precisely, we propose a family of pseudometrics, parameterized on a notion of distance which depends on the property we want to verify. Furthermore, we show that the members of this family still characterize bisimilarity in terms of their kernel, and provide a bound on the corresponding distance between trace distributions. Then we study the instance corresponding to diﬀerential privacy, and we show that it has a dual form, easier to compute. We also prove that the typical process-algebra constructs are non-expansive, thus paving the way to a modular approach to verification.
We present some related work concerning verification of diﬀerential privacy in various contexts, and formalisms of other notions of information protec-tion based on process calculi. Detailed comparison between our work of each part and works in the literature can be found sprinkled in each chap-ter. To the best of our knowledge, the line of work in this thesis is the first to investigate diﬀerential privacy for concurrent systems within the setting of process calculi.
Verification of diﬀerential privacy has been itself an active area of re-search. It has been investigated in a SQL-like language [McS09] for statisti-cal databases and a MapReduce-based system for cloud computing [RSK+10]. Prominent approaches based on formal methods are those based on type systems [RP10, GHH+13] and logical formulations [BKOB12, BDG+13].
An earlier paper [TKD11] defined stratified bisimulation relations suitable for proving diﬀerential privacy, however it suﬀered from the fact that the respective kernel relations do not fully characterize probabilistic bisimilar-ity.
Among several formalizations of the notion of information protection based on probability theory, we mention some rather popular approaches, mainly based on information theory, in particular, to consider diﬀerent no-tions of entropy depending on the kind of adversary, and to express the leakage of information in terms of the notion of mutual information. We name a few works also discussed in the models of probabilistic automata and process algebra: Boreale [Bor06] establishs a framework for quantify-ing information leakage using absolute leakage, and introduces a notion of rate of leakage. Deng et al. [DPW06] use the notion of relative entropy to measure the degree of anonymity. Compositional methods based on Bayes risk method are discussed by Braun et al. [BCP08, CPB]. A metric for probabilistic processes based on the Jensen-Shannon divergence is proposed in [Mu09] for measuring information flow in reactive processes. Unlike the information-theoretical approach, diﬀerential privacy provides strong pri-vacy guarantees independently from side knowledge. However, progress for studying diﬀerential privacy in the models of probabilistic automata and process algebra has been relatively new. It would be interesting to see how the issues stressed and the reasoning techniques developed there can be adapted for diﬀerential privacy.
Plan of the Thesis and Contributions
This thesis provides the theoretical basis for developing algorithms and tools of verifying or computing diﬀerential privacy in probabilistic processes. The most significant contributions of this thesis can be summarized as follows:
1. We prove that non-deterministic choice, probabilistic choice, the re-striction operator and a restricted form of parallel composition pre-serves diﬀerential privacy, in the sense that combining components
under these operators does not compromise the privacy of the system (in Chapter 3).
2. We propose an approximate notion for probabilistically behavioral equivalence: amortised probabilistic bisimulation. We show that this bisimulation is suitable for verifying diﬀerential privacy and it im-proves the method based on a stratified family between states in the work of Tschantz et al.(in Chapter 4).
Furthermore, we build sound and complete proof systems for amor-tised probabilistic bisimulation and its weak counterpart, and prove their soundness and completeness, which makes it possible to reason about diﬀerentially private behaviors by syntactic manipulation. (in Chapter 5).
3. We propose a family of generalized bisimulation metrics based on the Kantorovich lifting. The kernel of this family fully character-ized bisimilarity. The metrical closeness between processes indicates a bound on the distance between the corresponding trace distribu-tions. This allows to deal with a wide class of behavioral properties; importantly, one of its instances corresponds to diﬀerential privacy (in Chapter 6).
Besides these four chapters, there are two introductory chapters, the first being the present introduction. Chapter 2 introduces some preliminary in-formation about probability spaces, probabilistic automata, a probabilistic version of CCS and diﬀerential privacy. Finally, in Chapter 7 we make our final conclusions.
Most of the results in this thesis have already appeared in scientific publi-cations. More specifically:
• Chapter 3 is based on the paper Modular Reasoning about Dif-ferential Privacy in a Probabilistic Process Calculus [Xu12] that appeared in the proceedings of the 7th International Sympo-sium on Trustworthy Global Computing (TGC 2012), and the comple-mentary technical report Privacy-Preserving Process Construc-tors [XHP14].
• Chapter 4 is based on the paper Metrics for Diﬀerential Privacy in Concurrent Systems [XCL14], which was published in the 34th IFIP WG 6.1 International Conference on Formal Techniques for Dis-tributed Objects, Components and Systems (FORTE 2014).
• Chapter 5 is based on the paper Complete Proof Systems for Amortised Probabilistic Bisimulations [XL14], which is about to appear on Journal of Computer Science and Technology.
• Chapter 6 is based on the paper Generalized Bisimulation Metrics [CGPX14] that appeared in the proceedings of the 25th Interna-tional Conference on Concurrency Theory (CONCUR 2014).
Table of contents :
1.1 Concurrent and Probabilistic Processes
1.2 Differential Privacy
1.3 This thesis: Differential Privacy in Concurrent Systems
1.3.1 Modular reasoning
1.3.2 Bisimulations for differential privacy
1.3.3 Complete proof systems
1.3.4 Generalized bisimulation metrics
1.4 Plan of the Thesis and Contributions
2.1 Probability Spaces
2.2 Probabilistic Automata
2.3 Probabilistic Process Algebra
2.4 Probabilistic bisimilarity
2.6 Differential Privacy
3 Modular Reasoning in a Probabilistic Process Calculus
3.1.1 CCSp with secret labels
3.1.2 Process terms as channels
3.1.3 Differential Privacy in CCSp with secret labels
3.1.4 The Crowds protocol
3.1.5 Relation between differential privacy and anonymity
3.2 Modular Reasoning
3.3 Trust and Legitimacy in Crowds
3.3.2 The CCSp code for the extended Crowds protocol
3.3.3 An anonymity-preservation property
3.4 Degradation of privacy by trust
3.4.1 An adjacency relation based on trust
3.4.2 False negatives in Theorem 3.3.2
3.5 Users’ preference levels in Crowds
3.6 Related work
4 Bisimulations for Differential Privacy
4.1.1 Admissible scheduler
4.1.2 Differential privacy under admissible scheduler
4.2 The accumulative bisimulation
4.3 The amortised bisimulation
4.4 Comparing the two bisimulations
4.4.1 Relations with conventional probabilistic bisimilarity
4.6 An application to the Dining Cryptographers Protocol
5 Complete Proof Systems for Amortised Probabilistic Bisimulations
5.1 A simple probabilistic process algebra
5.2 Amortised probabilistic bisimulation
5.2.1 Basic properties
5.3 Weak amortised probabilistic bisimulation
5.3.1 Basic properties of 4
5.3.2 Amortised observational congruence
5.4 Proof system A1 for amortised bisimulation
5.5 Proof system A2 for amortised observational congruence
6 Generalized Bisimulation Metrics
6.2 A general family of Kantorovich liftings
6.3 A general family of bisimilarity pseudometrics
6.3.1 Bisimilarity as 0-distance
6.3.2 Relation with trace distributions
6.4 The multiplicative variant
6.4.1 Transformations of the linear-fractional program
6.4.2 Application to differential privacy