The need for formal verification
As explained in previous sections, Internet technologies play an important role in our lives. However, the Internet is not the only kind of technology we are in contact with: every day we interact with embedded systems such as mobile phones, smart cards, GPS receivers, videogame consoles, digital cameras, DVD players, etc. Technology also plays an important role in life-critical systems, i.e., systems where the malfunction of any component may result in loss of life. Examples of such systems can be found in the areas of medicine, aeronautics, nuclear energy generation, etc.
The malfunction of a technological device can have serious negative consequences, ranging from material damage to loss of life. In the following we list some famous examples of disasters caused by software failures.
Material loss: In 2004, the Air Traffic Control Center of Los Angeles International Airport lost communication with airplanes, causing the immediate suspension of all operations. The failure in the radio system was due to a 32-bit countdown timer that decremented every millisecond. Because of a bug in the software, when the counter reached zero the system shut down unexpectedly. This communication outage disrupted about 600 flights (including 150 cancellations), affecting over 30,000 passengers and causing multimillion-dollar losses to the airlines involved.
In 1996, an Ariane 5 rocket launched by the European Space Agency exploded just forty seconds after lift-off. The rocket was on its first voyage, after a decade of development costing US$ 7 billion. The destroyed rocket and its cargo were valued at US$ 500 million. A board of inquiry investigated the causes of the explosion and issued a report within two weeks. It turned out that the cause of the failure was a software error in the inertial reference system: a 64-bit floating-point number related to the horizontal velocity of the rocket was converted to a 16-bit signed integer.
In the early nineties, a bug (discovered by a professor at Lynchburg College, USA) in the floating-point division unit of the Intel Pentium processor not only severely damaged Intel's reputation, but also forced the replacement of faulty processors, causing a loss of 475 million US dollars for the company.
Fatal loss: A software flaw in the control part of the radiation therapy machine Therac-25 caused six cancer patients to receive massive overdoses of radiation between 1985 and 1987, several of which were fatal.
In 1995, American Airlines Flight 965, connecting Miami and Cali, crashed just five minutes before its scheduled arrival. The accident led to a total of 159 deaths. Paris Kanellakis, a well-known researcher (creator of the partition refinement algorithm, broadly used to verify bisimulation), was on the flight together with his family. Investigations concluded that the accident originated from a sudden turn of the aircraft commanded by the autopilot after an instruction from one of the pilots: the pilot input 'R' into the navigational computer, referring to a location called 'Rozo', but the computer erroneously interpreted it as a location called 'Romeo' (due to the spelling similarity and physical proximity of the two locations).
As the use and complexity of technological devices grow quickly, mechanisms to improve their correctness have become indispensable. But how can we be sure of the correctness of such technologies, with thousands (and sometimes millions) of components interacting in complex ways? One possible answer is by using formal verification, a branch of formal methods.
Formal verification
Formal verification is considered a fundamental area of study in computer science. In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of the system with respect to a certain property, using formal methods. In order to achieve this, it is necessary to construct a mathematical model describing all possible behaviors of the system. In addition, the property must be formally specified, avoiding, in this way, possible ambiguities.
Important formal verification techniques include theorem proving, simulation, testing, and model checking. In this thesis we focus on the use of this last technique.
Model checking Model checking is an automated verification technique that, given a finite model of the system and a formal property, systematically checks whether or not the property holds in the model. In addition, if the property is falsified, debugging information is provided in the form of a counterexample. This situation is represented in Figure 1.3.
Typical properties that can be verified are "Can the system reach a deadlock state?" or "Is every sent message received with probability at least 0.99?". Such automated verification is carried out by a so-called model checker, an algorithm that exhaustively searches the state space of the model looking for states violating the (correctness) property.
A major strength of model checking is its capability of generating counterexamples, which provide diagnostic information in case the property is violated. Edmund M. Clarke, one of the pioneers of model checking, said [Cla08]: "It is impossible to overestimate the importance of the counterexample feature. The counterexamples are invaluable in debugging complex systems. Some people use model checking just for this feature." In case a state violating the property under consideration is encountered, the model checker provides a counterexample describing a possible execution that leads from the initial state of the system to a violating state.
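The exhaustive search and counterexample generation described above can be sketched as a simple breadth-first exploration of the state space. The following is only an illustrative sketch: the toy transition system, the `safe` predicate, and the function names are invented for the example, not taken from any particular model checker.

```python
from collections import deque

def model_check(initial, successors, safe):
    """Exhaustively explore the reachable state space; if some state
    violates `safe`, return a counterexample path from the initial state."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not safe(state):
            # Reconstruct the execution leading to the violating state.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return list(reversed(path))
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None  # the property holds in every reachable state

# Toy system: a counter that is supposed to stay below 3.
succ = lambda s: [s + 1] if s < 5 else []
print(model_check(0, succ, lambda s: s < 3))   # [0, 1, 2, 3]
print(model_check(0, succ, lambda s: s < 10))  # None
```

The returned path plays the role of the counterexample: it documents a concrete execution from the initial state to the violating one, which is exactly the debugging information mentioned above.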
Other important advantages of model checking are that it is highly automatic, requiring little interaction and expertise from designers; it is rather fast; it can be applied to a large range of problems; and it allows partial specifications.
The main disadvantage of model checking is that the state space of certain systems, for instance distributed systems, can be rather large, thus making verification inefficient and in some cases even unfeasible (because of memory limitations). This problem is known as the state explosion problem. Many techniques to alleviate it have been proposed since the invention of model checking. Among the most popular ones we mention the use of Binary Decision Diagrams (BDDs), partial order reduction, abstraction, compositional reasoning, and symmetry reduction. State-of-the-art model checkers can easily handle up to 10^9 states with explicit state representation. For certain specific problems, more dedicated data structures (like BDDs) can be used, making it possible to handle even up to 10^476 states.
The popularity of model checking has grown considerably since its invention at the beginning of the 80s. Nowadays, model checking techniques are employed by most or all leading hardware companies (e.g. Intel, IBM, and Motorola, to mention just a few). While model checking is applied less frequently by software development companies, there have been several cases in which it has helped to detect previously unknown defects in real-world software. A prominent example is the result of research in Microsoft's SLAM project, in which several formal techniques were used to automatically detect flaws in device drivers. In 2006, Microsoft released the Static Driver Verifier (SDV) as part of Windows Vista; SDV uses the SLAM software-model-checking engine to detect cases in which device drivers linked to Vista violate one of a set of interface rules. Thus SDV helps uncover defects in device drivers, a primary source of software bugs in Microsoft applications. Investigations have shown that model checking procedures would have revealed the defects in, e.g., Intel's Pentium processor and the Therac-25 radiation therapy machine.

Focus of this thesis This thesis addresses the foundational aspects of formal methods for applications in security, and in particular in anonymity: we investigate various issues that have arisen in the area of anonymity, we develop frameworks for the specification of anonymity properties, and we propose algorithms for their verification.
In this section we give a brief overview of the various approaches to the foundations of anonymity that have been explored in the literature. We will focus on anonymity properties, although the concepts and techniques developed for anonymity apply to a large extent also to neighboring topics like information flow, secrecy, and privacy. The common denominator of these problems is the prevention of information leakage. More precisely, we are concerned with situations in which there are certain values (data, identities, actions, etc.) that are intended to be secret, and we want to ensure that an adversary will not be able to infer the secret values from the information which is publicly available. Some researchers use the term information hiding to refer to this class of problems [HO05].
The frameworks for reasoning about anonymity can be classified into two main categories: the possibilistic approaches and the probabilistic (or quantitative) ones.
The term "possibilistic" refers to the fact that we do not consider quantitative aspects. More precisely, anonymity is formulated in terms of the possibility of inferring some secrets, without worrying about "how likely" this is, or "how much" we narrow down the secret.
These approaches have been widely explored in the literature, using different conceptual frameworks. Examples include the proposals based on epistemic logic ([SS99, HO05]), on "function views" ([HS04]), and on process equivalences (see for instance [SS96, RS01]). In the following we will focus on the latter kind.
In general, possibilistic anonymity means that the observables do not identify a unique culprit. Often this property relies on nondeterminism: for each culprit, the system should be able to produce alternative executions with di erent observables, so that in turn for a given observable there are many agents that could be the culprit. More precisely, in its strongest version this property can be expressed as follows: if in one computation the identity of the culprit is i and the observable outcome is o, then for every other agent j there must be a computation where, with culprit j, the observable is still o.
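The strongest version of this property can be checked directly on a finite set of executions. In the following sketch, executions of a hypothetical protocol are abstracted to (culprit, observable) pairs; the traces and agent names are invented for illustration.

```python
def possibilistic_anonymity(traces, agents):
    """Strong possibilistic anonymity: for every observable that some
    culprit can produce, every other agent could have produced it too."""
    observables = {o for (_, o) in traces}
    return all({i for (i, o2) in traces if o2 == o} == set(agents)
               for o in observables)

# Executions of a hypothetical two-agent protocol, as (culprit, observable).
traces = {("a", "x"), ("b", "x"), ("a", "y"), ("b", "y")}
print(possibilistic_anonymity(traces, {"a", "b"}))                  # True
# Adding an execution whose observable only 'a' can produce breaks it:
print(possibilistic_anonymity(traces | {("a", "z")}, {"a", "b"}))   # False
```

The second call fails because observable "z" identifies "a" as the only possible culprit, violating the requirement that every observable be compatible with every agent.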
This kind of approach can also be applied to systems that use randomization. The way this is done is by abstracting the probabilistic choices into nondeterministic ones. See for example the Dining Cryptographers example in [SS96], where the coin tossing is represented by a nondeterministic process.
In general, the possibilistic approaches have the advantages of simplicity and efficiency. On the negative side, they lack precision, and in some cases the approximation can be rather loose. This is because every scenario that has non-null probability is interpreted as possible. For instance, consider the case in which a system reveals the culprit 90 percent of the time by outputting his identity, while in the remaining 10 percent of the time it outputs the name of some other agent. The system would not look very anonymous. Yet, the possibilistic definition of anonymity would be satisfied, because all users would appear as possible culprits to the observer regardless of the output of the system. In general, in the possibilistic approach the strongest notion of anonymity we can express is possible innocence, which is satisfied when no agent appears to be the culprit for sure: there is always the possibility that he is innocent (no matter how unlikely it is).
In this thesis we consider only the probabilistic approaches. Their common feature is that they deal with probabilities in a concrete way and they are, therefore, much more precise. They have become very popular in recent times, and there has been a lot of work dedicated to understanding and formalizing the notion in a rigorous way. In the next section we give a brief overview of these efforts.
These approaches take probabilities into account, and are based on the likelihood that an agent is the culprit, for a given observable. One notion of probabilistic anonymity which has been thoroughly investigated in the literature is strong anonymity.
Strong anonymity Intuitively, the idea behind this notion is that the observables should not allow the adversary to infer any (quantitative) information about the identity of the culprit. The corresponding notion in the field of information flow is (quantitative) non-interference.
Once we try to formalize the above notion more precisely, however, we discover that there are various possibilities. Correspondingly, there have been various proposals. We recall here the three most prominent ones.
1. Equality of the a posteriori probabilities for different culprits. The idea is to consider a system strongly anonymous if, given an observable o, the a posteriori probability that the identity of the culprit is i, P(i|o), is the same as the a posteriori probability of any other identity j. Formally:

P(i|o) = P(j|o)   for all observables o, and all identities i and j   (1.1)
This is the spirit of the definition of strong anonymity by Halpern and O'Neill [HO05], although their formalization involves more sophisticated epistemic notions.
2. Equality of the a posteriori and a priori probabilities for the same culprit. Here the idea is to consider a system strongly anonymous if, for any observable, the a posteriori probability that the culprit is a certain agent i is the same as its a priori probability. In other words, the observation does not increase or decrease the support for suspecting a certain agent. Formally:

P(i|o) = P(i)   for all observables o, and all identities i   (1.2)
This is the definition of anonymity adopted by Chaum in [Cha88]. He also proved that the Dining Cryptographers protocol satisfies this property if the coins are fair. Halpern and O'Neill consider a similar property in their epistemological setting, and they call it conditional anonymity [HO05].
3. Equality of the likelihoods of different culprits. In this third definition, a system is strongly anonymous if, for any observable o and agent i, the likelihood of i being the culprit, namely P(o|i), is the same as the likelihood of any other agent j. Formally:

P(o|i) = P(o|j)   for all observables o, and all identities i and j   (1.3)
This was proposed as a definition of strong anonymity by Bhargava and Palamidessi [BP05].
In [BCPP08] it was proved that definitions (1.2) and (1.3) are equivalent. Definition (1.3) has the advantage that it extends in a natural way to the case in which the choice of the culprit is nondeterministic.
This could be useful when we do not know the a priori distribution of the culprit, or when we want to abstract from it (for instance because we are interested in the worst case).
Concerning Definition (1.1), it probably looks the most natural at first sight, but it actually turns out to be far too strong. In fact it is equivalent to (1.2) and (1.3) combined with the following condition:
P(i) = P(j)   for all identities i and j   (1.4)

namely, the condition that the a priori distribution be uniform.
It is interesting to notice that (1.1) can be split into two orthogonal properties: (1.3), which depends only on the protocol, and (1.4), which depends only on the distribution on the secrets.
Unfortunately, all the strong anonymity properties discussed above are too strong and almost never achievable in practice. Hence researchers have started exploring weaker notions. One of the most renowned properties of this kind (among the "simple" ones based on conditional probabilities) is probable innocence.
Probable innocence The notion of probable innocence was formulated by Reiter and Rubin in the context of their work on the Crowds protocol [RR98]. Intuitively, the idea is that, after the observation, no agent is more likely to be the culprit than not to be. Formally:
P(i|o) ≤ P(¬i|o)   for all observations o, and all identities i

or equivalently

P(i|o) ≤ 1/2   for all observations o, and all identities i
In [RR98], Reiter and Rubin proved that the Crowds protocol satisfies probable innocence under a certain assumption on the number of attackers relative to the number of honest users.
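For a system given by its prior and channel matrix, the bound P(i|o) ≤ 1/2 is easy to check directly via Bayes' rule. This is only a sketch on hypothetical three-agent matrices, not an analysis of the Crowds protocol itself.

```python
def probable_innocence(prior, channel):
    """Check P(i|o) <= 1/2 for every identity i and observable o
    (equivalently, P(i|o) <= P(not i | o))."""
    n, m = len(channel), len(channel[0])
    for o in range(m):
        p_o = sum(prior[i] * channel[i][o] for i in range(n))
        for i in range(n):
            if prior[i] * channel[i][o] / p_o > 0.5 + 1e-9:
                return False
    return True

# Hypothetical three-agent systems with a uniform prior.
prior = [1/3, 1/3, 1/3]
mild = [[0.5, 0.25, 0.25],    # observations point mildly to the culprit
        [0.25, 0.5, 0.25],
        [0.25, 0.25, 0.5]]
strong = [[0.9, 0.05, 0.05],  # observations point strongly to the culprit
          [0.05, 0.9, 0.05],
          [0.05, 0.05, 0.9]]

print(probable_innocence(prior, mild))    # True: no posterior exceeds 1/2
print(probable_innocence(prior, strong))  # False: P(i|o) reaches 0.9
```

The first system satisfies probable innocence even though observations are correlated with the culprit, illustrating how much weaker this notion is than strong anonymity.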
All the notions discussed above are rather coarse, in the sense that they are cut-off notions and do not allow us to represent small variations in the degree of anonymity. In order to compare different protocols in a more precise way, researchers have started exploring settings in which to measure the degree of anonymity. The most popular of these approaches are those based on information theory.
The underlying idea is that anonymity systems are interpreted as channels in the information-theoretic sense. The input values are the possible identities of the culprit, which, associated with a probability distribution, form a random variable Id. The outputs are the observables, and the transition matrix consists of the conditional probabilities of the form P(o|i), representing the probability that the system produces an observable o when the culprit is i. A central notion here is the Shannon entropy, which represents the uncertainty of a random variable. For the culprit's possible identity, this is given by:

H(Id) = − Σ_i P(i) log P(i)   (uncertainty a priori)
Note that Id and the matrix also determine a probability distribution on the observables, which can then be seen as another random variable Ob. The conditional entropy H(Id|Ob), representing the uncertainty about the identity of the culprit after the observation, is given by
H(Id|Ob) = − Σ_o P(o) Σ_i P(i|o) log P(i|o)   (uncertainty a posteriori)
It can be shown that 0 H(IdjOb) H(Id). We have H(IdjOb) = 0 when there is no uncertainty left about Id after the value of Ob is known. Namely, when the value of Ob completely determines the value of Id. This is the case of maximum leakage. At the other extreme, we have H(IdjOb) = H(Id) when Ob gives no information about Id, i.e. when Ob and Id are independent.
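Both entropies are straightforward to compute from the prior and the channel matrix. The sketch below uses base-2 logarithms and two invented extreme channels: a noiseless one (the observable determines the culprit) and a completely uninformative one.

```python
from math import log2

def entropy(dist):
    """Shannon entropy H(X) = -sum_x P(x) log2 P(x)."""
    return -sum(p * log2(p) for p in dist if p > 0)

def conditional_entropy(prior, channel):
    """H(Id|Ob) = -sum_o P(o) sum_i P(i|o) log2 P(i|o)."""
    n, m = len(channel), len(channel[0])
    h = 0.0
    for o in range(m):
        p_o = sum(prior[i] * channel[i][o] for i in range(n))
        if p_o > 0:
            h += p_o * entropy([prior[i] * channel[i][o] / p_o
                                for i in range(n)])
    return h

prior = [0.5, 0.5]
noiseless = [[1.0, 0.0], [0.0, 1.0]]  # observable determines the culprit
useless   = [[0.5, 0.5], [0.5, 0.5]]  # observable independent of the culprit

print(entropy(prior))                         # 1.0 bit of a priori uncertainty
print(conditional_entropy(prior, noiseless))  # 0.0: maximum leakage
print(conditional_entropy(prior, useless))    # 1.0: no leakage
```

The difference H(Id) − H(Id|Ob), i.e. the reduction in uncertainty caused by the observation, is then a natural candidate for quantifying the information leaked by the system.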
Table of contents:
1.1.1 The relevance of anonymity nowadays
1.1.2 Anonymizing technologies nowadays
1.1.3 Anonymizing technologies: a bit of history
1.1.4 Anonymity and computer science
1.2 Formal methods
1.2.1 The need for formal verification
1.2.2 Formal verification
1.4 Contribution and plan of the thesis
1.5 Origins of the Chapters and Credits
2 Conditional Probabilities over Probabilistic and Nondeterministic Systems
2.2 Markov Decision Processes
2.3 Conditional Probabilities over MDPs
2.4 Conditional Probabilistic Temporal Logic
2.5 Semi History-Independent and Deterministic Schedulers
2.5.1 Semi History-Independent Schedulers
2.5.2 Deterministic Schedulers
2.6 Model Checking cpCTL
2.6.1 Model Checking Pa[j ]
2.7 Counterexamples for cpCTL
3 Computing the Leakage of Information Hiding Systems
3.2.1 Probabilistic automata
3.2.2 Noisy Channels
3.2.3 Information leakage
3.3 Information Hiding Systems
3.4 Reachability analysis approach
3.4.1 Complexity Analysis
3.5 The Iterative Approach
3.5.1 Partial matrices
3.5.2 On the computation of partial matrices
3.5.3 Identifying high-leakage sources
3.6 Information Hiding Systems with Variable a Priori
3.7 Interactive Information Hiding Systems
3.8 Related Work
4 Information Hiding in Probabilistic Concurrent Systems
4.2.1 Probabilistic automata
4.2.2 Noisy Channels
4.2.3 Information leakage
4.2.4 Dining Cryptographers
4.3.1 Tagged Probabilistic Automata
4.4 Admissible Schedulers
4.4.1 The screens intuition
4.4.2 The formalization
4.5 Information-hiding properties in presence of nondeterminism
4.5.2 Information leakage
4.5.3 Strong anonymity (revised)
4.6 Verifying strong anonymity: a proof technique based on automorphisms
4.6.1 The proof technique
4.6.2 An Application: Dining Cryptographers
4.7 Related Work
5 Significant Diagnostic Counterexample Generation
5.2.1 Markov Decision Processes
5.2.3 Markov Chains
5.2.4 Linear Temporal Logic
5.4 Representative Counterexamples, Partitions and Witnesses
5.5 Rails and Torrents
5.6 Significant Diagnostic Counterexamples
5.7 Computing Counterexamples
5.7.1 Maximizing Schedulers
5.7.2 Computing most indicative torrent-counterexamples
5.7.3 Debugging issues
5.8 Related Work
6 Interactive Systems and Equivalences for Security
6.1 Interactive Information Flow
6.2 Nondeterminism and Information Flow
7.2 Further directions