Privacy protection mechanisms design via machine learning 


Quantitative information flow notions: g-leakage and g-vulnerability

Let $\mathcal{X}$ be a set of secrets and $\mathcal{Y}$ a set of observations. The adversary's initial knowledge about the secrets is modeled by a prior probability distribution on $\mathcal{X}$ (namely $P_X$, often referred to as $\pi$). A system is modeled as a probabilistic channel from $\mathcal{X}$ to $\mathcal{Y}$, described by a stochastic matrix $C$, whose elements $C_{xy}$ give the probability of observing $y \in \mathcal{Y}$ when the input is $x \in \mathcal{X}$ (namely $P_{Y|X}$). Running $C$ with input $\pi$ induces a joint distribution on $\mathcal{X} \times \mathcal{Y}$, denoted by $\pi \triangleright C$. In the g-leakage framework [ACPS12] an adversary is described by a set $\mathcal{W}$ of guesses (or actions) that it can make about the secret, and by a gain function $g(w, x)$ expressing the gain of selecting the guess $w$ when the real secret is $x$. The prior g-vulnerability is the expected gain of an optimal guess, given a prior distribution on the secrets:
$$V_g(\pi) \;=\; \max_{w \in \mathcal{W}} \sum_{x \in \mathcal{X}} \pi(x)\, g(w, x).$$
In the posterior case, the adversary observes the output of the system, which allows it to improve its guess. Its expected gain is given by the posterior g-vulnerability:
$$V_g(\pi, C) \;=\; \sum_{y \in \mathcal{Y}} \max_{w \in \mathcal{W}} \sum_{x \in \mathcal{X}} \pi(x)\, C_{xy}\, g(w, x).$$
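As a concrete illustration, the following minimal sketch (not from the thesis; the prior $\pi$, channel $C$, and gain matrix are hypothetical toy values) computes the prior and posterior g-vulnerability with NumPy:

```python
import numpy as np

# Hypothetical toy values: 3 secrets, 2 observations, identity gain function
# (the identity gain reduces g-vulnerability to Bayes vulnerability).
pi = np.array([0.5, 0.3, 0.2])        # prior over the secrets
C = np.array([[0.7, 0.3],             # C[x, y] = P(observe y | secret x)
              [0.2, 0.8],
              [0.5, 0.5]])
gain = np.eye(3)                      # gain[w, x] = gain of guess w when the secret is x

# Prior g-vulnerability: V_g(pi) = max_w sum_x pi(x) g(w, x)
V_prior = np.max(gain @ pi)

# Posterior g-vulnerability: V_g(pi, C) = sum_y max_w sum_x pi(x) C[x, y] g(w, x)
joint = pi[:, None] * C               # joint distribution on secrets x observations
V_post = np.sum(np.max(gain @ joint, axis=0))

print(f"V_g(pi)    = {V_prior:.3f}")  # 0.500
print(f"V_g(pi, C) = {V_post:.3f}")   # 0.590
# g-leakage compares the two quantities (multiplicatively or additively).
```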

Information theory notions

We briefly cover some basic notions from information and probability theory, namely entropy and mutual information; we refer to [CT91] for further details. Let $X, Y$ be discrete random variables with $n$ and $m$ possible values respectively: $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$. Let $p_X(\cdot)$ and $p_Y(\cdot)$ denote the probability distributions associated to $X$ and $Y$ respectively, and let $p_{X,Y}(\cdot, \cdot)$ and $p_{Y|X}(\cdot|\cdot)$ denote the joint and the conditional probability distributions, respectively. Namely, $p_{X,Y}(x, y)$ represents the probability that $X = x$ and $Y = y$, while $p_{Y|X}(y|x)$ represents the probability that $Y = y$ given that $X = x$. For simplicity, when clear from the context, we will omit the subscript and write, for instance, $p(x)$ instead of $p_X(x)$. Conditional and joint probabilities are related by the chain rule $p(x, y) = p(x)\, p(y|x)$.
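As a quick illustration (a minimal sketch with a hypothetical joint distribution, not taken from the text), entropy and mutual information can be computed directly from the joint table:

```python
import numpy as np

# Hypothetical joint distribution of two binary random variables X and Y.
p_xy = np.array([[0.30, 0.10],     # p_xy[i, j] = P(X = x_i, Y = y_j)
                 [0.15, 0.45]])

p_x = p_xy.sum(axis=1)             # marginal of X
p_y = p_xy.sum(axis=0)             # marginal of Y

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())

# Mutual information via the identity I(X; Y) = H(X) + H(Y) - H(X, Y)
I_XY = H_X + H_Y - H_XY
print(f"H(X) = {H_X:.3f} bits, H(Y) = {H_Y:.3f} bits, I(X;Y) = {I_XY:.3f} bits")

# Chain rule check: p(x, y) = p(x) p(y | x)
p_y_given_x = p_xy / p_x[:, None]
assert np.allclose(p_xy, p_x[:, None] * p_y_given_x)
```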

Brief review of machine learning notions

General notions of learning theory
We give here a brief introduction to the learning process and the derivation of the model, focusing on the supervised learning scenario in the context of classification problems. We describe the basic elements common to all learning algorithms and, to this purpose, we introduce a generic learner model based on a well-established statistical framework.
A learning problem is defined by:
• a domain $\mathcal{X}$ of objects, represented as vectors of features (aka attributes), that we would like to classify;
• a set of labels (aka classes) $\mathcal{Y}$;
• a set of training data, i.e., a sequence $S = ((\vec{x}_1, y_1), \ldots, (\vec{x}_m, y_m))$ of pairs in $\mathcal{X} \times \mathcal{Y}$;
• a correct labelling function $f : \mathcal{X} \to \mathcal{Y}$ such that $y_i = f(\vec{x}_i)$ for all $i$;
• a distribution $\mathcal{D}$ on $\mathcal{X}$, according to which the samples are generated;
• the prediction rule or hypothesis $h : \mathcal{X} \to \mathcal{Y}$, which can be used to predict the label of new domain points;
• a measure of success that quantifies the predictor’s error.
Ideally, the goal of the learning process is to select an $h$ that minimizes the risk, defined as
$$L_{\mathcal{D},f}(h) \;\stackrel{\text{def}}{=}\; \mathbb{P}_{\vec{x} \sim \mathcal{D}}\big[\, h(\vec{x}) \neq f(\vec{x}) \,\big], \qquad (2.14)$$
which represents the probability of a mismatch between $h$ and $f$, measured with respect to the distribution $\mathcal{D}$. The quantity in eq. (2.14) is the error of a specific prediction rule $h$, often referred to as its generalization error. This error can be computed on the training set (training error) or on a test set of samples drawn from the same distribution as the training set but never seen during the training phase (test error). If the error on the training data is large, this usually means that the model is “underfitting” the data distribution. However, a negligible training error does not necessarily mean that the model is good: if the error on the test set is large, it is likely that the trained model overfits the training data.
In practice, however, we cannot compute analytically the $h$ that minimizes (2.14), because we do not have a mathematical description of $\mathcal{D}$. What we can do, instead, is to use the training set $S$ which, being generated from $\mathcal{D}$, represents an approximation of it. Then $h$ is selected so as to minimize the empirical risk over the $m$ samples, defined as:
$$L_S(h) \;\stackrel{\text{def}}{=}\; \frac{1}{m}\,\big|\{\, i \in \{1, \ldots, m\} : h(\vec{x}_i) \neq y_i \,\}\big|.$$
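The sketch below (hypothetical data distribution, labelling function $f$, and predictor $h$, chosen only for illustration) shows how the empirical risk is evaluated on the training set and on a held-out test set:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Hypothetical 'correct' labelling function: sign of the first feature."""
    return (x[:, 0] > 0).astype(int)

def h(x):
    """Hypothetical learned predictor: a slightly shifted threshold."""
    return (x[:, 0] > 0.2).astype(int)

def empirical_risk(h, X, y):
    """Fraction of samples on which the predictor disagrees with the labels."""
    return np.mean(h(X) != y)

# Samples drawn from the (unknown in practice, here simulated) distribution D
X_train = rng.normal(size=(1000, 5)); y_train = f(X_train)
X_test  = rng.normal(size=(1000, 5)); y_test  = f(X_test)

print(f"training error: {empirical_risk(h, X_train, y_train):.3f}")
print(f"test error:     {empirical_risk(h, X_test, y_test):.3f}")
```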

Game theoretic problem description

Inspired by the GAN paradigm [GPAM+14], we propose a system consisting of two adversarial neural networks, G (generator) and C (classifier). The idea is that G generates noise so as to confuse the adversary as much as possible, within the boundaries of the utility constraints, while C takes as input the noisy locations produced by G and tries to re-identify (classify) the corresponding user. While fighting against C, G refines its strategy, until a point where it cannot improve any longer. Note that a significant difference from the standard GAN is that, in the latter, the generator has to learn to reproduce an existing distribution from samples. In our case, instead, the generator has to “invent” a distribution from scratch.
The interplay between G and C can be seen as an instance of a zero-sum Stackelberg game [STT17], where G is the leader, C is the follower, and the payoff function f is the privacy loss. Finding the optimal point of equilibrium between G and C corresponds to solving a minimax problem on f, with G being the minimizer and C the maximizer.
A major challenge in our setting is the choice of f. A first idea would be to measure it in terms of C's capability to re-associate a location with the right user. Hence we could define f as the expected success probability of C's classification. Such a function f would be convex/concave with respect to the strategies of G and C respectively, so from game theory we would derive the existence of a saddle point corresponding to the optimal obfuscation-re-identification pair. The problem, however, is that it is difficult to reach the saddle point via the typical alternation between the two nets. Let us clarify this point with the following simple example.
Example 1: Consider two users, Alice and Bob, in locations a and b respectively. Assume that at first G reports their true locations (no noise). Then C learns that a corresponds to Alice and b to Bob. At the next round, G will figure out that, to maximize the misclassification error (given the prediction of C), it should swap the locations, i.e., report b for Alice and a for Bob. Then, in its turn, C will have to “unlearn” the previous classification and learn the new one. But then, at the next round, G will again swap the locations, bringing the situation back to the starting point, and so on, without ever reaching an equilibrium. Note that a possible equilibrium point for G would be the mixed strategy that reports a for both Alice and Bob (so that C could only make a blind guess), but G may not stop there. The problem is that it is difficult to calibrate the training of G so that it stops in proximity of the saddle point rather than continuing all the way to its relative optimum. The situation is illustrated in table 4.2a.
In order to address this issue, we adopt a different target function, less sensitive to the particular labeling strategy of C. The idea is to consider not just the precision of the classification, but rather the information contained in it.
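The oscillation described in Example 1 can be reproduced with a small best-response simulation; the sketch below is only illustrative (uniform prior, deterministic best responses) and is not the actual training procedure of G and C:

```python
import numpy as np

# Two users, two locations; payoff f = probability that C re-identifies correctly.
users, locations = ["Alice", "Bob"], ["a", "b"]
prior = np.array([0.5, 0.5])          # uniform prior over users (assumption)

G = np.eye(2)                         # G[u, l] = P(report l | user u); start with true locations

def best_response_C(G):
    """C guesses, for each reported location, the most likely user (MAP rule)."""
    joint = prior[:, None] * G        # P(user, reported location)
    return np.argmax(joint, axis=0)   # C[l] = guessed user for location l

def best_response_G(C):
    """G reports, for each user, a location that C attributes to the other user."""
    G_new = np.zeros((2, 2))
    for u in range(2):
        wrong = [l for l in range(2) if C[l] != u]
        G_new[u, wrong[0] if wrong else u] = 1.0
    return G_new

def accuracy(G, C):
    joint = prior[:, None] * G
    return sum(joint[C[l], l] for l in range(2))

for t in range(4):
    C = best_response_C(G)            # C re-learns on the current channel
    before = accuracy(G, C)
    G = best_response_G(C)            # G swaps the locations to fool this C
    after = accuracy(G, C)
    print(f"round {t}: C guesses {[users[u] for u in C]} for {locations}, "
          f"accuracy before/after G's move: {before:.2f} / {after:.2f}")
# Output alternates 1.00 / 0.00 forever: the channel flip-flops between the
# identity and the swap, never settling at the mixed-strategy equilibrium.
```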


Table of contents:

1 Introduction 
1.1 Motivation
1.2 Leakage estimation
1.3 Privacy mechanisms design
1.4 Goals
1.5 Plan of the thesis and contributions
2 Preliminaries 
2.1 Quantitative information flow notions: g-leakage and g-vulnerability
2.2 Information theory notions
2.3 Brief review of machine learning notions
3 Estimating g-Leakage via Machine Learning 
3.1 Learning g-vulnerability: Statistical Bounds
3.1.1 Main definitions
3.1.2 Principle of the empirical g-vulnerability maximization
3.1.3 Distribution-free bounds on the estimation accuracy
3.1.4 Sample complexity
3.2 From g-vulnerability to Bayes vulnerability via pre-processing
3.2.1 Data pre-processing
3.2.2 Channel pre-processing
3.2.3 Data and channel pre-processing: pros and cons
3.3 Evaluation
3.3.1 Representation of the results and metrics
3.3.2 Learning algorithms
3.3.3 Frequentist approach
3.3.4 Experiment 1: multiple guesses
3.3.5 Experiment 2: location privacy
3.3.6 Experiment 3: differential privacy
3.3.7 Experiment 4: password checker
3.4 Technical details
3.5 Final remarks
4 Privacy protection mechanisms design via machine learning 
4.1 Game theoretic problem description
4.2 Our setting
4.2.1 Quantifying utility
4.2.2 Quantifying privacy as mutual information
4.2.3 Formulation of the game
4.2.4 Measuring privacy as Bayes error
4.3 Proposed method
4.3.1 Mutual information vs cross entropy
4.3.2 Mutual information: implementation
4.3.3 Utility
4.3.4 On the convergence of our method
4.4 Cross Entropy vs Mutual Information: experiments on synthetic data
4.4.1 Experiment 1: relaxed utility constraint
4.4.2 Experiment 2: stricter utility constraint
4.5 Experiments on the Gowalla dataset
4.5.1 Experiment 3: relaxed utility constraint
4.5.2 Experiment 4: stricter utility constraint
4.6 Final remarks
5 Feature selection in machine learning: Rényi min-entropy vs Shannon entropy 
5.1 Problem definition
5.2 Proposed algorithm
5.2.1 Example: Rényi min-entropy outperforms Shannon entropy
5.2.2 Example: Shannon entropy may outperform Rényi min-entropy
5.3 Evaluation
5.4 Final remarks
6 Conclusion 
References
