Back-up Plan for the Gateway Architecture

In our scenario, different users store data of varying importance on a single gateway. We can make no assumptions about what will be stored, nor about how often it will change. Even user input about these characteristics might be imprecise, since users may mis-categorize their data, resulting in possible data loss. Consequently, and also to provide a seamless service, this work considers the following case:
all user data is valuable
a high back-up frequency is required
we keep old back-ups as long as possible
Since the storage space provided by other gateways is limited, we need to be able to free the storage occupied by old back-ups. This needs to be done whenever a more recent back-up can no longer be stored. Back-up everything and snapshot-based back-up both support cheap deletion of previous back-ups. Of these two, however, only snapshot-based back-up can take advantage of data already used by previous back-ups. In this case, a high back-up frequency only marginally increases the amount of required storage space. Consequently, we conclude with the following recommendation for a back-up plan in our scenario:
We target a snapshot-based back-up solution
We store on-site back-ups with high frequency (at least once a day)
We store off-site back-ups less often (e.g., once a day)
We decrease the frequency for older back-ups (e.g., after one week, we only keep weekly back-ups)
Finally, we keep older back-ups as long as storage space is available
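The property argued above, that snapshot-based back-up shares unchanged data between snapshots, so deleting an old back-up only frees the chunks no newer snapshot still references, can be illustrated with a minimal content-addressed store. The sketch below is a toy illustration under our own assumptions (class and method names are invented for this example and are not part of the system described in this thesis):

```python
import hashlib

class SnapshotStore:
    """Toy snapshot store: chunks are addressed by their content hash,
    so identical data is stored once and shared across snapshots."""

    def __init__(self):
        self.chunks = {}      # chunk hash -> chunk data
        self.snapshots = {}   # snapshot id -> set of chunk hashes

    def create_snapshot(self, snap_id, files):
        refs = set()
        for data in files:
            h = hashlib.sha256(data).hexdigest()
            self.chunks.setdefault(h, data)   # store only chunks not seen before
            refs.add(h)
        self.snapshots[snap_id] = refs

    def delete_snapshot(self, snap_id):
        # Cheap deletion: drop the snapshot's reference set and free only
        # those chunks that no remaining snapshot still references.
        dropped = self.snapshots.pop(snap_id)
        still_referenced = (set().union(*self.snapshots.values())
                            if self.snapshots else set())
        for h in dropped - still_referenced:
            del self.chunks[h]

store = SnapshotStore()
store.create_snapshot("mon", [b"report", b"photos"])
store.create_snapshot("tue", [b"report", b"photos", b"notes"])  # reuses two chunks
store.delete_snapshot("mon")  # frees nothing: all of mon's chunks are still shared
```

After deleting the older snapshot, all three chunks remain because the newer snapshot still references two of them and introduced the third; this is why a high snapshot frequency only marginally increases the required storage space.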

Focus and Contribution of this Thesis

In this work, we focus on the feasibility of a distributed back-up system that supports snapshot-based back-ups. This entails several challenges we need to meet, such as the system’s scalability, its resistance to failures and malicious behaviour, together with an adequate data placement strategy. Since we store data on participating gateways, we need to cope with storage that is unreliable in terms of availability and requires additional measures to achieve data confidentiality.
The contribution of this work is as follows:
1. We provide a proof of concept for a distributed storage system with support for snapshot-based back-ups. This also includes the management and enforcement of storage space quotas.
2. We introduce a swarm-based architecture that uses file-level access and keeps metadata to a minimum. Further, it makes it easy to monitor the data stored in a swarm.
3. We provide solutions for transferring and storing files of different sizes efficiently in such a distributed system.
4. We illustrate how to use state-of-the-art technologies in order to include a central instance that coordinates data placement in the network. This central instance is highly scalable, fault-tolerant, and replaceable in case it disappears. In addition, it is only exposed to a moderate operational load.
5. We show how to manage encryption keys so that user data can be stored confidentially and participants can be authenticated.
6. We analyze the applicability of our system by using real-world availability traces. Further, we study the impact of system parameters and provide a comparison to the performance of cloud services.

Organization of this Thesis

This thesis is structured as follows:
Subsequent to this introduction, we present related work in Chapter 2, which includes established techniques used to achieve fault-tolerance in storage systems. Further, we provide an overview of existing storage systems. In Chapter 3 we introduce our swarm-based architecture, which includes considerations on data placement, data encryption, and how we cope with data loss due to participants leaving the federated network. Subsequently, in Chapter 4 we supply more details concerning our implementation.
We explain how we communicate between participants and how we create back-ups. We further outline the functionality of the central instance. We analyze the underlying failure model in Chapter 5 by using statistical tests and simulations based on a failure trace. This is accompanied by a general discussion about correlated failures. We simulate the creation of back-ups in Chapter 6. In particular, this includes simulations concerning the time required to create a back-up and the influence of system parameters.
Finally, we conclude the thesis in Chapter 7 and provide an outlook for future work.

Table of contents :

1 Introduction 
1.1 Motivation
1.1.1 The Need for Back-up
1.1.2 Why Not Use the Cloud for Back-Up?
1.2 Gateway-Based Federated Network
1.3 Back-up Plan
1.3.1 Back-up Strategies
1.3.2 Frequency of Back-up
1.3.3 Back-up Location
1.3.4 Back-up Plan for the Gateway Architecture
1.4 Focus and Contribution of this Thesis
1.5 Organization of this Thesis
2 Related Work 
2.1 Introduction
2.2 Redundancy Strategies
2.2.1 Replication
2.2.2 Erasure Coding
2.3 Repair Strategies
2.4 Storage vs. Back-up
2.5 Existing Systems
3 Swarm Architecture 
3.1 Introduction
3.2 Swarm Architecture Overview
3.2.1 Gateway
3.2.2 Swarm
3.2.3 Tracker
3.3 Snapshot Representation
3.3.1 On-Site Snapshot Representation
3.3.2 Off-Site Snapshot Representation
3.4 Data Management Strategy
3.4.1 Data Placement Policy
3.4.2 Swarms as Distributed Key-Value Stores
3.4.3 Data Kept on the Tracker
3.4.4 Implications
3.5 Maintenance Procedure
3.5.1 Failure Detection
3.5.2 Repair
3.6 Influence of the Number of Original Fragments
3.6.1 Storage Overhead
3.6.2 Data Rates
3.6.3 Bandwidth Saturation
3.6.4 Effect of Correlated Failures
3.6.5 Load on the Tracker
3.7 Encryption
3.7.1 Authentication
3.7.2 Data Encryption
3.7.3 Integrity Checks
3.8 Conclusion
4 Implementation 
4.1 Introduction
4.2 Communication
4.2.1 RESTful Architecture
4.2.2 Aggregates
4.2.3 Partial Transfers
4.3 Swarm Leader
4.3.1 Modules for On-Site Back-Up
4.3.2 Modules for Off-Site Back-Up
4.3.3 Different Ways of Storing Files in a Swarm
4.4 Tracker
4.4.1 Resolution of Gateway Identifiers
4.4.2 Internal Tracker Structure
4.4.3 Total Tracker Outage
4.5 Storage Node
4.5.1 Storing Transmission Blocks
4.5.2 Storage Reclamation
4.6 Incentives
4.7 Conclusion
5 Impact of Correlated Failures 
5.1 Introduction
5.2 Suitability of the Markovian Assumption
5.2.1 Real World Traces
5.2.2 Traces Matching Our Environment
5.2.3 Independence of Permanent Failure Events
5.2.4 Exponential Distribution of Permanent Failures
5.3 Discussion
5.3.1 Geographically-Related Correlated Failures
5.3.2 Geographically-Diverse Correlated Failures
5.4 Testing Back-Up Durability
5.4.1 Experimental Setup
5.4.2 Results and Conclusion
6 Back-Up Simulation 
6.1 Introduction
6.2 Time Required for Back-Up Creation
6.2.1 Common Back-Up Scenario
6.2.2 Cloud-Based Back-Up
6.2.3 Swarm-Based Back-Up
6.2.4 Conclusion
6.3 Influence of the Timeout Period
6.3.1 Costs Separated into Two Components
6.3.2 Optimal Value for the Free-Traces
6.4 Visualization of Bandwidth Usage
6.4.1 Simulation Setup
6.4.2 Results
6.4.3 Conclusion
7 Conclusion and Perspective 
7.1 Conclusion
7.2 Perspective and Future Work
A Synthèse en français 
B Additional Implementation Details 
B.1 Maintenance Module
B.2 Snapshot Creator Module
C Glossary 
C.1 Acronyms and Abbreviations
List of Figures
List of Tables
Bibliography
