Robust and Strongly Consistent Hybrid Cloud Storage

Get Complete Project Material File(s) Now! »

Automated declarative consistency verification

Although with Hybris we experimented the design of a storage system, most of real-world engineering entails dealing with off-the-shelf commercial systems. Therefore, a crucial part of software engineering consists in understanding what are the assumptions and guarantees that each component of a given architecture brings about, and how they compose. Unfortunately, those assumptions and guarantees are often poorly expressed — if at all — by the original developers. Moreover, the terminology adopted in their specifications often does not use a standard set of unambiguous concepts, further hindering the integration task. This is especially true when it comes to storage systems and their consistency guarantees. Oftentimes, consistency specifications are in fact expressed in an informal, hence imprecise way; they refer to specific settings and their validity hinges on ad-hoc assumptions. As a result, they are incompatible and incomparable, thus leading to convoluted and error-prone testing techniques.
Researchers have responded to this issue by suggesting different approaches. Some works in the literature proposed and perfected algorithms to establish whether an execution respects strong consistency semantics [191, 122]. Another popular approach consists in quantifying eventual consistency by means of client-side staleness measurements [47, 169]. Finally, transactional storage semantics have been traditionally verified by adopting a graph-based approach [8, 254]. While all these techniques are valid and effective, they fail to encompass the entirety of the consistency spectrum and the composition of the elemental consistency semantics. As third contribution of this thesis, we propose an automated declarative approach to consistency verification. In order to achieve this, we advocate the adoption of the declarative semantic model to express consistency that we introduce in Chapter 2. Such an axiomatic model allows to consider consistency as a mere invariant of storage system executions. In turn, considering consistency as an invariant allows applying advanced verification techniques, such as property-based testing [83]. Hence, we developed Cover, a prototype of a verification framework that implements property-based consistency verification. Conver lets the user select a set of consistency predicates that have to be respected during executions. Then, it automatically generates random executions consisting of concurrent read and write operations, possibly tailored to better verify specific consistency semantics. During each execution, Conver verifies the validity of the chosen consistency predicates and, in case of inconsistency, it outputs a visualization highlighting the sequence of operations that led to the anomaly. Moreover, Conver can inject faults in the storage system by terminating processes and creating network partitions. In summary, Conver relieves programmers from the burden of generating and running accurate tests to verify the correctness of storage systems.

Non-transactional consistency semantics

In this section we analyze and survey the consistency semantics of systems which adopt single operations as their primary operational constituent (i.e. non-transactional consistency semantics). The consistency semantics described in the rest of the chapter are summarized in Figure 2.1, which presents a partial ordering of consistency semantics according to their semantic strength, as well as a more loosely defined clustering into families of consistency models. This classification draws both from the relative strength of different consistency semantics, and from the underlying common factors that underpin their definitions. In Appendix B, we provide proofs for some of the strength relations between consistency semantics that we formally specify in this section. The remaining relations showed in Fig. 2.1 reflect intuitive notions, claims or formal proofs reported in peer-reviewed research literature. In the remainder of this section, we examine each family of consistency semantics. Section 2.3.1 introduces linearizability and other strong consistency models, while in Section 2.3.2 we consider eventual and weak consistency. Next we analyze PRAM and sequential consistency (Section 2.3.3), and, in Section 2.3.4, the models based on the concept of session. Section 2.3.5 presents an overview of consistency semantics explicitly dealing with causality, while in Section 2.3.6 we study staleness-based models. This is followed by an overview of fork-based models (Section 2.3.7). Section 2.3.8 and 2.3.9 respectively deal with tunable and per-object semantics. Finally, we survey the family of consistency models based on synchronization primitives (Section 2.3.10).

Linearizability and related “strong” consistency semantics

The gold standard and the central consistency model for non-transactional systems is linearizability, defined by Herlihy and Wing [138]. Roughly speaking, linearizability is a correctness condition that establishes that each operation shall appear to be applied instantaneously at a certain point in time between its invocation and its response. Linearizability, often informally dubbed strong consistency, 4 has long been regarded as the ideal correctness condition. Linearizability features the locality property: a composition of linearizable objects is itself linearizable — hence, linearizability enables modular design and verification. Although very intuitive to understand, the strong semantics of linearizability makes it challenging to implement. In this regard, Gilbert and Lynch [120], formally proved the CAP theorem, an assertion informally presented in previous works [146, 95, 86, 61], that binds linearizability to the ability of a system of maintaining a non-trivial level of availability when confronted with network partitions. In a nutshell, the CAP theorem states that, in the presence of network partitions, availability or linearizability are mutually incompatible: the distributed storage system must sacrifice one or the other (or both).

Table of contents :

1 Introduction
1.1 A semantic framework to express consistency
1.2 Robust hybrid cloud storage
1.3 Automated declarative consistency verification
1.4 Outline and previously published work
2 Consistency in Non-Transactional Distributed Storage Systems
2.1 Introduction
2.2 System model
2.2.1 Preliminaries
2.2.2 Operations, histories and abstract executions
2.2.3 Replicated data types and return value consistency
2.2.4 Consistency semantics
2.3 Non-transactional consistency semantics
2.3.1 Linearizability and related “strong” consistency semantics
2.3.2 Weak and eventual consistency
2.3.3 PRAM and sequential consistency
2.3.4 Session guarantees
2.3.5 Causal models
2.3.6 Staleness-based models
2.3.7 Fork-based models
2.3.8 Composite and tunable semantics
2.3.9 Per-object semantics
2.3.10 Synchronized models
2.4 Related work
2.5 Summary
3 Robust and Strongly Consistent Hybrid Cloud Storage
3.1 Introduction
3.2 Hybris overview
3.2.1 System model
3.2.2 Hybris data model and semantics
3.3 Hybris Protocol
3.3.1 Overview
3.3.2 put protocol
3.3.3 get in the common case
3.3.4 Garbage collection
3.3.5 get in the worst-case
3.3.6 Transactional put
3.3.7 delete and list
3.3.8 Confidentiality
3.3.9 Erasure coding
3.3.10 Weaker consistency semantics
3.4 Implementation
3.4.1 ZooKeeper-based RMDS
3.4.2 Consul-based RMDS
3.4.3 Cross fault tolerant RMDS
3.4.4 Optimizations
3.5 Evaluation
3.5.1 Experiment 1: common-case latency
3.5.2 Experiment 2: latency under faults
3.5.3 Experiment 3: RMDS performance
3.5.4 Experiment 4: RMDS geo-replication
3.5.5 Experiment 5: caching
3.5.6 Experiment 6: Hybris as personal storage backend
3.5.7 Cost comparison
3.6 Related Work
3.7 Summary
4 Automated Declarative Consistency Verification
4.1 Introduction
4.2 A declarative semantic model
4.3 Property-based consistency verification
4.4 Discussion
4.5 Future Work
4.6 Summary
5 Conclusion
5.1 Towards a cross-stack principled approach to consistency
5.2 Planet-scale consistency and the challenges of composition
5.3 Verification of distributed systems correctness
A Summary of Consistency Predicates
B Proofs of Strength Relations between Consistency Semantics
C Consistency Semantics and Implementations
D Hybris: Proofs and Algorithms
D.1 Hybris protocol
D.2 Correctness proofs
D.3 Alternative proof of Hybris linearizability
E French Summary
E.1 La cohérence dans les systèmes de stockage répartis non transactionnels
E.2 Stockage robuste et fortement cohérent dans le Cloud hybride
E.2.1 Principales caractéristiques d’Hybris
E.2.2 Implémentation et résultats
E.3 Vérification déclarative et automatisée de la cohérence
E.3.1 Principales caractéristiques