Data Concepts and Notation
Different terminology is applied in different works; for instance, fragments are sometimes called chunks, shards, or shares. The following data concepts, notation, and terminology are introduced in order to unify and facilitate the description of relevant works.
Data Concepts The presented fragmentation techniques perform two kind of operations on their input data: they can segment the data into physically separate fragments or they can transform the data (i.e. encrypt or encode). Therefore, data concepts will be divided into two groups corresponding to those two kind of operations.
Concepts connected with data fragmentation:
• Data (D or d): An initial vector of data bits of size |D| bits. When the data is represented as an integer it is denotes as d.
• Fragment (F): The final fragment, a vector of bits of size |F| bits, a result of the data fragmentation.
• Share (SHARE): an intermediary fragment of size |SHARE| bits in the case when the data fragmentation process is performed in two steps.
Concepts connected with data transformation:
• Block (B, P or C): a sequence of |B| bits corresponding to the classical concept of a data block. When referring to a plaintext block, the block is denoted as P and, when referring to a ciphertext, the block is denoted as C. When it is not specified if the block is a plaintext or a ciphertext block, it is referred to as simply B.
• Plaintext (PLAIN): when the data D is designed to be encrypted, it is denoted as plaintext PLAIN composed of p plaintext blocks (the plaintext is already padded if needed).
• Ciphertext (CIPH): encrypted plaintext composed of c ciphertext blocks. The number of blocks inside the ciphertext is equal to c = p+1 as an initialization vector is added at the beginning of the ciphertext.
• Transformed ciphertext (CIPH0): when an additional processing is applied to the ciphertext after the data encryption, CIPH0 denotes the transformed ciphertext.
Bitwise Fragmentation Techniques
During bitwise fragmentation, initial data D is transformed into n fragments F0, . . . , Fn−1 that are later dispersed over n different physical locations over multiple physically separated storage servers inside of a data center or several independent storage providers. Data reconstruction is only possible when a threshold of k of the fragments is reached. Therefore, protection provided by the data fragmentation depends mainly on the two parameters k and n defining the dispersion scope. On the one hand, a value of k close to n makes the fragments gathering harder as it gives less choice to an attacker. In a particular case, when k equals n, all fragments are needed for data recovery and the attacker has to compromise all of the chosen storage sites. On the other hand, a lower value of k increases data availability as it makes data reconstruction feasible even if some of the fragments are lost or corrupted. Therefore, a wise choice of the k and n setting has to take into consideration the characteristics of the environment in which the fragments will be stored (the issue of providing data resilience inside a distributed storage system is described in more details in Section 2.3.2).
Characteristics of bitwise fragmentation systems
The previous section gave an overview of existing techniques providing data protection by means of fragmentation. This section focuses on other aspects intrinsically connected with fragmentation that will be later analyzed in Section Any kind of distributed storage system should ensure data resilience as it has to be prepared for the loss or alteration of a part of its data in case of an attack or an incident. In a system applying fragmentation, the ratio between the total number of fragments (n) and the number of fragments required for the data reconstruction (k) should depend mainly on two factors: the trustworthiness of the storage devices and the estimated longevity of the system. Indeed, data dispersed over unreliable machines (i.e. inside a peerto- peer storage system) are more likely to be lost or altered. At the same time, it is easier to ensure data survival if the longevity of the system is measured in years rather than decades.
Several techniques may be applied to ensure data resilience, like data replication, threshold secret sharing, information dispersal, and systematic error-correction codes. The choice of a suitable technique is not straightforward as multiple factors like the performance of the technique or its impact on the confidentiality and storage requirements have to be taken into account. Replication is the easiest and fastest solution but also quite inefficient in terms of memory occupation. Its main advantage is that it ensures high data availability as the replicas of fragments are immediately ready to be used and no special processing is required in case of data recovery. Among systems presented in Section 2.3.3, it can be found in GridSharing, DepSky (only as an option), and IBM Object Cloud Storage (only for small data, for which the gain in performance prevails over storage blow-up). Threshold schemes, like the previously presented Shamir’s scheme, provide not only data protection but also resilience. However, in this case resilience comes at an extremely high cost as not only does the performance of such schemes drastically decrease with the increasing number of fragments, but their storage overhead is comparable to data replication. Therefore, they are almost exclusively used for the protection of small data, especially encryption keys. One of the rare uses of secret sharing for ensuring the resilience of larger data can be found in the POTSHARDS system, designed to protect archival data for decades.
Information dispersal algorithm adds resilience without leading to an excessive storage overhead. They perform better than threshold schemes but similarly lack of scalability (performance decreases with the increasing number of fragments). Therefore, recent systems use rather systematic errorcorrection codes (especially Reed-Solomon codes [RS60]). The principle of systematic error-correction codes is similar to the one of an information dispersal algorithm (both can be seen as matrix multiplication). However, they allow users to save computations by only generating resilient n−k fragments while keeping k fragments as direct chunks of the data.
Table of contents :
1.1 Background and Motivation
1.2 Contributions and Dissertation Overview
2 Relevant work
2.2 Data Concepts and Notation
2.3 Bitwise Fragmentation
2.3.1 Bitwise Fragmentation Techniques
2.3.2 System characteristics
2.3.3 Systems Overview
2.4 Exploiting data structures
2.4.1 Object-oriented Data Fragmentation
2.4.2 Database Fragmentation
2.5 Issues and Recommendations
3 Protecting data against key exposure
3.1 Introduction and Motivation
3.2 Secure Fragmentation and Dispersal
3.2.1 Data Concepts, Notation, and Prerequisites
3.2.2 Description of the Scheme
3.2.3 Comparison with Relevant Works
3.3 Circular All-Or-Nothing
3.3.1 Data Concepts and Notation
3.3.2 Description of the Algorithm
3.3.3 Comparison with Relevant Works
3.4 Selective All-Or-Nothing
3.4.1 Data Concepts, Notations, and Prerequisite
3.4.2 Description of the Scheme
3.4.3 Comparison with Relevant Works
4 Accelerating fragmentation
4.1 Introduction and Motivations
4.1.1 Data Concepts, Notation, and Prerequisites
4.2 Description of the Scheme
4.2.1 Step 1: Partial Encryption and Fragmentation
4.2.2 Step 2: Blending Plaintext and Ciphertext Blocks
4.2.3 Step 3: Dispersing Fragments
4.3 Comparison With Relevant Works
4.3.1 Theoretical Comparison with Relevant Works
4.3.2 Performance Benchmark
5 Lightweight fragmentation
5.1 Introduction and Motivations
5.2 Data Concepts, Prerequisites, and Notation
5.2.1 Prerequisites and Index Notations
5.3 Forming Fragments
5.3.1 Data Distribution over Fragments
5.3.2 Generating Permutations
5.3.3 Encoding Fragments
5.3.4 Dispersing Fragments
5.4 Security Analysis
5.5 Complexity Analysis and Storage Requirements
5.6 Comparison with Relevant Works
6 Fragmentation inside UWSN
6.1 Introduction and Motivations
6.2 Related Work
6.2.1 Moving Data around the Network
6.2.3 Homomorphic Key-evolution Scheme
6.2.4 Homomorphic Encryption and Homomorphic Secret Sharing
6.3 Problem Formulation
6.3.1 Network Model
6.3.2 Threat Model
6.4 The Proposed Scheme
6.4.1 System Initialization
6.4.2 Processing Round Data
6.4.3 Fragments Aggregation
6.4.4 Data Defragmentation
6.5 Comparison with Relevant Works
6.5.1 Storage Overhead
6.5.2 Transmission Costs
6.5.3 Performance Benchmark
7 Conclusions and Future Work
7.1 Summary of contributions
7.2 Future Work
A Empirical Analysis of FSFA
B Publications and Talks
List of Figures
List of Tables
List of Abbreviations