
## Proving convergence: Strong Eventual Consistency

Figure 2.4 lists the sufficient conditions for a highly-available distributed application to ensure Strong Eventual Consistency (SEC). To simplify the notation, we assume that when an operation executes, the local state respects the precondition of that operation.

Let us now consider these conditions in turn:

The first condition, Poset, checks that the ordering relation on states defines a partially ordered set (poset): it is reflexive, transitive, and anti-symmetric.

We then find a number of conditions on the merge function.

The second condition, Total, says that the merge function is total: it is defined for every pair of states.

Conditions Idempotent, Commutative and Associative say that the merge function is idempotent, commutative and associative [4].

Condition Inflation says that each operation op of the application is an inflation: the state after executing op is greater than or equal to the state before, with respect to the ordering relation.

Related to the condition above, condition UB ensures that the result of merge is an upper bound of the input states. Together with condition Inflation, this is a sufficient condition for convergence: it implies that there is a deterministic way to reconcile any two replicas whose states have diverged, through the least upper bound of the lattice implemented by the merge function, and that the states of all replicas progress in the same direction (w.r.t. the ordering relation) in the lattice (see Figure 2.8). It remains to show that there is a deterministic state to which all replicas converge (assuming that no new operations arrive).

The final condition, LUB, ensures that the merge function computes the least upper bound with respect to the given order. This condition guarantees that the state reached by merging multiple states is unique, making the merge function deterministic, and thus guaranteeing equality once all replicas have exchanged their respective states.

All these conditions ensure that the distributed application guarantees strong eventual consistency in the case where all the replicas receive copies of states incorporating all prior updates.
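The conditions above can be illustrated with a minimal sketch (not taken from the thesis): a grow-only set, whose state ordering is subset inclusion, whose only update is an inflation, and whose merge (set union) is total, idempotent, commutative, associative, and computes the least upper bound.

```python
# A minimal sketch of a state-based replicated object satisfying the
# convergence conditions: a grow-only set ordered by subset inclusion.

def leq(a, b):
    """Ordering relation: subset inclusion (a poset on sets)."""
    return a <= b

def add(state, element):
    """Update: adding an element is an inflation, leq(state, add(state, e))."""
    return state | {element}

def merge(a, b):
    """Merge: set union is total, idempotent, commutative, associative,
    and is the least upper bound of the subset order."""
    return a | b

# Two replicas diverge, then converge after exchanging states.
r1 = add(add(set(), "x"), "y")   # {"x", "y"}
r2 = add(set(), "z")             # {"z"}
assert merge(r1, r2) == merge(r2, r1) == {"x", "y", "z"}  # commutative
assert leq(r1, merge(r1, r2)) and leq(r2, merge(r1, r2))  # upper bound
assert merge(r1, r1) == r1                                # idempotent
```

Any object with these properties converges for the same reason the auction object does: merge deterministically reconciles divergent states at their least upper bound.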

### Semilattice of auction object

Let us show that our running example of an auction application converges. Subsection 2.2.1 represents the ordering relation as a semilattice, following Figure 1.1. It is not hard to see that each of our operations is an inflation and that the merge operation computes the least upper bound.
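A hypothetical rendering of the auction merge may help; the field names and the exact status order are assumptions for illustration, not the thesis's code. We assume the status is ordered ACTIVE below CLOSED, the set of bids grows by union, and the winner is only ever set by closing the auction.

```python
# A hypothetical sketch of the auction object's merge: the pointwise
# least upper bound of two auction states (field names are assumed).

ACTIVE, CLOSED = 0, 1  # assumed status order: ACTIVE below CLOSED

def merge_auction(a, b):
    """Pointwise least upper bound of two auction states."""
    status = max(a["status"], b["status"])   # LUB in the status order
    bids = a["bids"] | b["bids"]             # LUB of the bid sets (union)
    winner = a["winner"] or b["winner"]      # set only by close_auction
    return {"status": status, "bids": bids, "winner": winner}

r1 = {"status": ACTIVE, "bids": {("b1", 10)}, "winner": None}
r2 = {"status": CLOSED, "bids": {("b2", 15)}, "winner": "b2"}
m = merge_auction(r1, r2)
assert m == merge_auction(r2, r1)  # commutative on these states
assert m["status"] == CLOSED and m["bids"] == {("b1", 10), ("b2", 15)}
```

Note that taking `a["winner"] or b["winner"]` is only deterministic because, as the coordination discussion later ensures, two replicas never close the auction with different winners concurrently.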

#### Proving Invariants

In this section, we report our invariant verification strategy. Specifically, we consider the problem of verifying data invariants of highly-available distributed applications, using a syntax-driven approach based on program logic. Bailis et al. [14] identify necessary and sufficient run-time conditions to establish the safety of application invariants for highly-available distributed databases, a criterion dubbed I-confluence. Moreover, they study the validity of a number of typical invariants and applications. Our work improves on the I-confluence criterion of [14] by providing a static, syntax-driven, and mostly-automatic mechanism to verify the correctness of an invariant for an application. We address the specific differences in Chapter 5, related work.

An important consequence of our verification strategy is that, although we are proving invariants about a concurrent, highly-available distributed system, our verification conditions are modular (in the number of API operations) and can be discharged using standard sequential Hoare-style reasoning. These verification conditions in turn entail the stability of assertions, as one would have in a Rely/Guarantee-style logic.

Let us start by assuming that the given initial state of the application is denoted i. Initially, all replicas have i as their local state. As explained earlier, each replica executes a sequence of state transitions, due either to a local update or to a merge incorporating remote updates. We call a replica state safe if it satisfies the invariant. Assuming the current state is safe, any update (local or merge) must result in a safe state. To ensure this, every update is equipped with a precondition that disallows any unsafe execution. Formally, an update u (an operation or a merge) mutates the local state σ to a new state σ_new = u(σ). To preserve the data invariant Inv_data, we require that the local state respects the precondition Pre_u of the update:

σ ∈ Pre_u ⟹ u(σ) ∈ Inv_data

Thus, a local update executes only when, at the origin replica, the current state is safe and its precondition currently holds.

Similarly, merge must also be safe. Since merge can happen at any time, its precondition must always be true, i.e., it constitutes an additional invariant, which we call the concurrency invariant. Our global invariant now consists of two parts: the data invariant (Inv_data) and the concurrency invariant (Inv_conc).

To illustrate local preconditions, consider an operation close_auction(w: BidId), which sets the auction status to CLOSED and the winner to w (of type BidId). The developer may have written a precondition such as status = ACTIVE, because closing an auction does not make sense otherwise. To ensure the invariant that the winner has the highest amount, one needs to strengthen this precondition with the clause is_highest(Bids, w), defined as ∀b ∈ Bids: b.placed ⟹ b.amount ≤ w.amount.
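The strengthened precondition can be sketched as follows; the record shapes (a bid with `placed` and `amount` fields) and helper names are illustrative assumptions, not the thesis's code.

```python
# A sketch of close_auction guarded by the strengthened precondition
# (bid representation and helper names are assumed for illustration).

def is_highest(bids, w):
    """is_highest(Bids, w): every placed bid's amount is at most w's."""
    return all(b["amount"] <= w["amount"] for b in bids if b["placed"])

def pre_close_auction(state, w):
    # original precondition (status = ACTIVE) plus the strengthening clause
    return state["status"] == "ACTIVE" and is_highest(state["bids"], w)

def close_auction(state, w):
    assert pre_close_auction(state, w)  # the update runs only when safe
    return {**state, "status": "CLOSED", "winner": w["id"]}

b1 = {"id": "b1", "amount": 10, "placed": True}
b2 = {"id": "b2", "amount": 15, "placed": True}
s = {"status": "ACTIVE", "bids": [b1, b2], "winner": None}
assert pre_close_auction(s, b2)        # b2 holds the highest amount
assert not pre_close_auction(s, b1)    # closing with b1 would be unsafe
```

With this strengthening, any sequential execution of close_auction preserves the data invariant that the declared winner has the highest amount.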

**Coordination for Invariant Preservation**

As discussed earlier, the preconditions of operations and of merge are strengthened so as to be sequentially safe. An application must also preserve the concurrency invariant in order to be concurrently safe. A violation indicates a concurrency bug in the specification; in that case, the operations that fail to preserve the concurrency invariant may need to coordinate. In our model, the developer adds the required concurrency-control mechanism as part of the state: the modified state is composed of the original state and the coordination mechanism. Recall that in the auction example, placing bids and closing the auction did not preserve the precondition of merge. This requires strengthening the specification with a coordination mechanism that restricts these operations. We could force them to execute strictly sequentially, thereby avoiding concurrency altogether, but this would hurt the availability of the application; in particular, it should remain possible to place bids in parallel.

A concurrency-control mechanism is better designed with the workload characteristics in mind. For this particular use case, we know that placing bids is a much more frequent operation than closing an auction. Concurrent bid placements are safe, whereas concurrency between placing a bid and closing the auction is not. This situation is similar to a readers-writer lock. We distribute tokens to each replica: as long as a replica holds a token, it can place bids. Closing the auction requires recalling the tokens from all replicas. This ensures that no bids are placed concurrently with closing the auction, so a winner can be declared while respecting the invariant. Adding this concurrency control also updates Inv_conc; clearly, all operations must respect this modification for the specification to be considered safe.
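The token scheme can be simulated in a few lines; this is a single-process sketch under assumed names, not the thesis's implementation, and it elides the network protocol for recalling tokens.

```python
# A simulation sketch of the token scheme: each replica holds a token
# allowing it to place bids; closing the auction first recalls every
# token, so no bid can run concurrently with the close.

class Replica:
    def __init__(self, name):
        self.name = name
        self.has_token = True   # tokens are initially distributed
        self.bids = set()

    def place_bid(self, bid):
        if not self.has_token:
            raise RuntimeError("token recalled: auction is closing")
        self.bids.add(bid)      # safe: bids may be placed in parallel

def close_auction(replicas):
    for r in replicas:          # recall phase: collect every token
        r.has_token = False
    all_bids = set().union(*(r.bids for r in replicas))
    # With all tokens held, no concurrent bid exists; declare the winner.
    return max(all_bids, key=lambda b: b[1]) if all_bids else None

r1, r2 = Replica("r1"), Replica("r2")
r1.place_bid(("b1", 10)); r2.place_bid(("b2", 15))  # concurrent: safe
assert close_auction([r1, r2]) == ("b2", 15)
assert not r1.has_token  # further bids are rejected until re-issue
```

The design mirrors a readers-writer lock: bid placement is the frequent "read-like" mode that proceeds with a local token, while closing is the rare "write-like" mode that requires exclusive access.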

Note that the token model described here restricts availability in order to ensure safety. Adding efficient coordination is not a problem to be solved with the application specification alone; it also requires knowledge of the application dynamics, such as workload characteristics, and is part of our work described in Part III.

**Properties and associated proof rules**

Consider some data structure (in this case a tree) characterized by a safety invariant (in this case, the tree invariant). We say that a state is local-safe if it satisfies the data structure's invariant. An update is op-safe if, starting from a local-safe state, it results in a local-safe state. The distributed data structure is safe if every update is op-safe. According to the CISE logic [6], a distributed data structure is safe if the following properties hold:

1. Sequential safety: Consider an environment restricted to sequential execution (operations execute one after another; there is no concurrency). If the initial state is local-safe at every replica, and each update is op-safe, it follows that the data structure is safe under sequential execution. Classically, sequential op-safety implies that each operation’s precondition satisfies the weakest-precondition of the invariant with respect to the operation [31].

2. Convergence: Strong Eventual Consistency (SEC) [2] states that two replicas that have delivered the same set of operations must be in the same state, i.e., the system converges. If operations commute (as defined later), then SEC is guaranteed [2].

3. Precondition stability: In addition to sequential safety, updates must remain op-safe in the presence of concurrent (uncoordinated) updates. To ensure this, we apply the CISE precondition stability rule [6]: consider two updates u and v; if the execution of u does not make the precondition of v false, nor vice-versa (precondition stability), then executing u and v concurrently is op-safe. This must be true for all concurrent pairs of operations.
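The pairwise stability check of property 3 can be sketched as an exhaustive check over a small finite state space; the encoding of updates as (precondition, effect) pairs and the toy counter are assumptions for illustration, not the CISE formalization.

```python
# A minimal sketch of the CISE precondition-stability check: for every
# pair of updates (u, v), executing u from a state where both
# preconditions hold must not falsify the precondition of v.

def stable(ops, states):
    """Exhaustively check pairwise precondition stability on `states`.
    `ops` maps names to (precondition, effect) pairs."""
    for nu, (pre_u, eff_u) in ops.items():
        for nv, (pre_v, _) in ops.items():
            for s in states:
                if pre_u(s) and pre_v(s) and not pre_v(eff_u(s)):
                    return False, (nu, nv, s)   # counter-example found
    return True, None

# Toy example: a non-negative counter with increment and decrement.
ops = {
    "inc": (lambda s: True,   lambda s: s + 1),
    "dec": (lambda s: s >= 1, lambda s: s - 1),
}
ok, witness = stable(ops, states=range(0, 5))
# Two concurrent decrements conflict: from s = 1, one dec falsifies
# the other's precondition, so dec/dec needs coordination.
assert not ok and witness[:2] == ("dec", "dec")
```

A real analysis discharges these conditions symbolically with an SMT solver rather than by enumeration, but the shape of the obligation is the same.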

CISE logic helps us identify the conditions under which concurrent operations conflict. When they conflict, CISE requires the operations to acquire tokens, which introduce a global synchronization point. Hence all updates in CISE are assumed to be definitive.

In order to augment the CISE analysis to handle tentative updates, we add an independence condition that checks whether skipping a move affects a move that already observed the effect of the skipped one. The independence analysis is inspired by Houshmand and Lesani [9], although they, like CISE, do not consider tentative updates either. Independence analysis: consider two safe updates u and v, with u executed before v. If moving v before u still maintains the safety of v, then v is said to be independent of u.
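A toy encoding of this condition (assumed names, not the thesis's formalization): given an execution in which v ran after u and was safe, we re-run v directly on the pre-state, as if the tentative update u had been skipped, and check that v is still safe.

```python
# A sketch of the independence condition: v observed u's effect;
# is v still safe if u is skipped? (Toy encoding for illustration.)

def independent(u, v, pre_v, inv, state):
    """Return True if v remains safe when u is skipped."""
    # The observed execution: v ran after u and was safe.
    assert pre_v(u(state)) and inv(v(u(state)))
    # The reordered execution: v runs directly on the pre-state.
    return pre_v(state) and inv(v(state))

# Toy example over set-valued states: u adds "a", v adds "b".
u = lambda s: s | {"a"}
v = lambda s: s | {"b"}
pre_v = lambda s: "b" not in s   # v's precondition
inv = lambda s: True             # trivial invariant
assert independent(u, v, pre_v, inv, frozenset())
```

When independence fails, skipping the tentative update u would invalidate v, so v's fate is tied to u's: the two must be confirmed or discarded together.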

**Mechanized verification of the sequential specification**

Following the formalization of the tree data structure above, we use Why3 to mechanically prove its sequential safety. The mechanical proof requires some extra definitions and axioms.

We need a predicate for reachability. For this, we first define a path: a sequence of nodes related by the parent relation. We use s[n] to indicate the nth element of the sequence s, and denote the set of possible sequences of nodes by S. The path predicate determines the validity conditions for a path s between nodes x and y in state σ. If x = y, the path has length zero. Otherwise, the length of the path is greater than zero, the first path element must be x, all contiguous path elements are related by the parent relation, and node y is the parent of the last path element. We say y is reachable from x if there exists a path from x to y. Formally:

path(σ, x, y, s) ≜ (length(s) = 0 ∧ x = y) ∨ (length(s) > 0 ∧ s[0] = x ∧ (∀i ∈ [0, length(s) − 2]: parent(σ, s[i]) = s[i + 1]) ∧ parent(σ, s[length(s) − 1]) = y)
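The prose definition can be rendered directly in executable form; representing the state as a parent map is an assumption for illustration (the mechanized proof uses Why3, not Python).

```python
# An executable rendering of the path predicate: s[0] = x, consecutive
# elements are linked by the parent relation, and y is the parent of
# the last element. The parent-map state representation is assumed.

def path(parent, x, y, s):
    if len(s) == 0:
        return x == y                       # zero-length path: x = y
    if s[0] != x:
        return False                        # path must start at x
    for i in range(len(s) - 1):
        if parent.get(s[i]) != s[i + 1]:    # contiguous parent links
            return False
    return parent.get(s[-1]) == y           # y is parent of the last node

def reachable(parent, x, y):
    """y is reachable from x if some path connects them (bounded walk)."""
    s, n = [], x
    while True:
        if path(parent, x, y, s):
            return True
        if n not in parent or len(s) > len(parent):
            return False                    # walked off the tree (or a cycle)
        s.append(n)
        n = parent[n]

# A tiny tree: root <- a <- b  (each node maps to its parent)
parent = {"a": "root", "b": "a"}
assert path(parent, "b", "root", ["b", "a"])
assert reachable(parent, "b", "root") and not reachable(parent, "root", "b")
```

The zero-length case makes reachability reflexive, matching the formal definition above.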

**Table of contents:**

List of Tables

List of Figures

**1 Introduction **

1.1 Contributions

1.1.1 Proving invariant safety for highly-available distributed applications

1.1.2 A safe, convergent and highly-available replicated tree

1.1.3 Trade-offs in distributed concurrency control

**I Verifying the design of distributed applications **

**Introduction to Part I **

**2 Proving invariant safety for highly-available distributed applications **

2.1 System Model

2.1.1 General Principles

2.1.2 Notations and Assumptions

2.1.3 Operational Semantics

2.1.4 Operational Semantics with State History

2.1.5 Correspondence between the semantics

2.2 Proving convergence: Strong Eventual Consistency

2.2.1 Semilattice of auction object

2.3 Proving Invariants

2.3.1 Invariance Conditions

2.3.2 Applying the proof rule

2.3.3 Coordination for Invariant Preservation

**3 Use cases **

3.1 Distributed Barrier

3.2 Replicated lock

3.3 Courseware

3.3.1 Pseudocode of courseware

**4 Automation **

4.1 Specifying a distributed application in Soteria

4.2 Verification passes

4.2.1 Syntax check

4.2.2 Convergence check

4.2.3 Safety check

4.3 Tool evaluation

**5 Related work **

**6 Conclusion of Part I and Future work **

6.1 Future work

**II Designing conflict resolution policies **

**Introduction to Part II **

**7 Design of a safe, convergent and coordination free replicated tree **

7.1 System Model

7.2 Properties and associated proof rules

7.2.1 Sequential safety

7.2.2 Concurrency

7.2.2.1 Convergence

7.2.2.2 Precondition stability

7.2.2.3 Independence

7.2.3 Mechanized verification

7.3 Sequential specification of a tree

7.3.1 State

7.3.2 Invariant

7.3.3 Operations

7.3.4 Mechanized verification of the sequential specification

7.4 Concurrent tree specification

7.4.1 Precondition stability

7.4.1.1 Stability of add operation

7.4.1.2 Stability of remove operation

7.4.1.3 Stability of move operation

7.5 Safety of concurrent moves

7.5.1 Classifying moves

7.5.2 Coordination-free conflict resolution for concurrent moves

7.6 Convergence

7.7 Independence

7.7.1 Independence of add operation

7.7.2 Independence of remove operation

7.7.3 Independence of up-move operation

7.7.4 Independence of down-move operation

7.8 Safe specification of a replicated tree

7.8.1 Mechanized verification of the concurrent specification

7.9 Discussion

7.9.1 Moving from causal consistency to eventual consistency

7.9.2 Message overhead for conflict resolution

7.9.3 Computing the set of concurrent moves

**8 Experimental study and Comparison **

**9 Related work **

**10 Conclusion of Part II and Future work **

10.1 Future work

**III Selecting Distributed Concurrency Control **

**Introduction to Part III **

**11 Exploring the coordination lattice **

11.1 System Model

11.1.1 Application model

11.1.2 Network model

11.1.3 Workload characteristics

11.2 Dimensions of Concurrency control

11.2.1 Granularity

11.2.2 Mode

11.2.3 Lock Placement

11.3 The Coordination Lattice

11.4 Navigating the coordination lattice

11.4.1 Granularity selection

11.4.2 Mode selection

11.4.3 Placement selection

**12 Experiments **

12.1 Experimental setup

12.1.1 Inputs

12.1.2 Intermediate processes

12.1.3 DisLockSim – A simulation model for distributed lock

12.1.4 Cost of locking

12.2 Analysing some conflict graphs

12.2.1 Conflict graph involving two operations

12.2.2 Conflict graph involving three operations

**13 Related work **

**14 Conclusion of Part III and Future work **

14.1 Future work