Differences between Cloud Computing and Grid Computing


Windows Azure Compute

Windows Azure Compute is exposed through hosted services that are deployed to an Azure data center. Each hosted service corresponds to a specific web application and is composed of roles, each role corresponding to a logical part of the service. A hosted service can contain three kinds of roles: web roles, worker roles and Virtual Machine (VM) roles. A single hosted service is composed of at least one web role and one worker role. Each role runs on at least one virtual machine, referred to as a role instance or a worker, so a single hosted service requires at least two role instances. We now briefly describe the different roles:
– Web roles are designed for web application programming. Web roles allow public computers to connect to the hosted service over the standard HTTP and HTTPS ports. The VMs running a given web role are pre-configured with IIS 7 (Internet Information Services) and are specifically designed to run Microsoft web-programming technologies such as ASP.NET or Windows Communication Foundation (WCF), but they also support other technologies such as PHP or Java to build web applications.
– Worker roles are designed to run general background processes. These processes can depend on a web role (handling the computation required by the web role) or be independent of it. One of the differences between web and worker roles is that worker roles do not come with a pre-installed IIS. The code executed by worker roles can be written using the .NET framework.
– VM roles are designed to give developers a much wider scope of possibilities, and especially control over the operating system image. VM roles should be used only when worker and web roles do not fit the developer's purpose, as is the case, for example, when the operating system requires long and complicated installations or a setup procedure that cannot be automated. With VM roles, the developer uploads their own virtual hard drive (VHD) holding a custom operating system (more specifically a Windows Server 2008 R2 image), which is then run on the different VMs running the VM role. This role will not be used in our cloud algorithm implementations.
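
The composition rules described above (a hosted service made of roles, each role running on at least one role instance) can be illustrated with a small data model. The following is only a minimal sketch in Python: the names `RoleKind`, `Role`, `HostedService`, `validate` and `total_instances` are hypothetical and chosen for illustration, and none of them belongs to the Windows Azure API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RoleKind(Enum):
    """The three kinds of roles described in the text (illustrative names)."""
    WEB = "web"
    WORKER = "worker"
    VM = "vm"


@dataclass
class Role:
    """A logical part of a hosted service, run on one or more role instances."""
    name: str
    kind: RoleKind
    instance_count: int = 1  # each role runs on at least one virtual machine

    def __post_init__(self) -> None:
        if self.instance_count < 1:
            raise ValueError("a role needs at least one role instance")


@dataclass
class HostedService:
    """Hypothetical model of a hosted service; not an Azure API object."""
    name: str
    roles: List[Role] = field(default_factory=list)

    def validate(self) -> None:
        # Rule stated in the text: at least one web role and one worker role.
        kinds = {role.kind for role in self.roles}
        if RoleKind.WEB not in kinds or RoleKind.WORKER not in kinds:
            raise ValueError("a hosted service needs a web role and a worker role")

    def total_instances(self) -> int:
        # Total number of role instances (workers) across all roles.
        return sum(role.instance_count for role in self.roles)


# Smallest configuration allowed by the rule: one web and one worker instance.
minimal = HostedService(
    name="minimal-service",
    roles=[Role("frontend", RoleKind.WEB), Role("backend", RoleKind.WORKER)],
)
minimal.validate()

# A larger configuration with several worker instances behind one web role.
larger = HostedService(
    name="clustering-service",
    roles=[
        Role("frontend", RoleKind.WEB, instance_count=1),
        Role("compute", RoleKind.WORKER, instance_count=3),
    ],
)
larger.validate()
```

In this sketch the minimal configuration yields two role instances, matching the lower bound stated above, while the number of instances per role is simply a parameter of the deployment.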

Windows Azure Storage

Windows Azure Storage (WAS) is the storage component of the Windows Azure Cloud Platform. It is a public cloud service that has been available since November 2008 and that presently (in January 2012) holds 70 PBytes of data. It is used as an independent storage service, but also as the persistent storage for the applications run on the Windows Azure Cloud Platform. According to [36], the WAS is used internally by Microsoft for applications such as social networking search and the serving of video, music and game content, and also outside Microsoft by thousands of customers.
The WAS has been designed to be highly scalable, so that a single piece of data can be simultaneously accessed by multiple computing instances and so that a single application can persist terabytes of data. For example, the ingestion engine of Bing used to gather and index all the Facebook and Twitter content is reported to store around 350 TBytes of data in Azure (see [36]).
The WAS provides various forms of permanent storage components with differing purposes and capabilities. The following subsection describes these components.

Elements of internal architecture

The internal implementation of the WAS is rather complex and relies on a multilayer design. This design lets the storage be strongly consistent, highly available and partition tolerant at the same time. Providing these three properties simultaneously is a difficult challenge, at least theoretically, because of the CAP theorem. This subsection gives a brief overview of the underlying infrastructure and shows how these guarantees are achieved despite the difficulty mentioned above. We refer the reader to [36] for a more in-depth presentation of the underlying architecture of the WAS.


Table of contents:

1 Introduction
1.1 Scientific context
1.2 Thesis context
1.3 Overview of the work
1.3.1 Chapter 2 – Introduction to Cloud Computing
1.3.2 Chapter 3 – Introduction to Azure
1.3.3 Chapter 4 – Elements of software design on the cloud
1.3.4 Chapter 5 – Distributed Batch K-Means algorithms
1.3.5 Chapter 6 – Practical considerations for distributed Vector Quantization algorithms
1.3.6 Chapter 7 – Cloud implementation of a distributed asynchronous Vector Quantization algorithm
1.4 Summary of contributions
2 Presentation of Cloud Computing
2.1 Introduction
2.2 Origins of Cloud Computing
2.2.1 HPC and commodity hardware computing
2.2.2 Grid Computing
2.2.3 Emergence of Cloud Computing
2.3 Cloud design and performance targets
2.3.1 Differences between Cloud Computing and Grid Computing
2.3.2 Everything-as-a-Service (XaaS)
2.3.3 Technology stacks
2.4 Cloud Storage level
2.4.1 Relational storage and ACID properties
2.4.2 CAP Theorem and the No-SQL positioning
2.4.3 Cloud Storage Taxonomy
2.5 Cloud Execution Level
2.5.1 MapReduce
2.5.2 GraphLab
2.5.3 Dryad and DryadLINQ
3 Presentation of Azure
3.1 Introduction
3.2 Windows Azure Compute
3.3 Windows Azure Storage
3.3.1 WAS components
3.3.2 Elements of internal architecture
3.3.3 BlobStorage or TableStorage
3.4 Azure Performances
3.4.1 Performance Tradeoffs, Azure Positioning
3.4.2 Benchmarks
3.5 Prices
4 Elements of cloud architectural design pattern
4.1 Introduction
4.2 Communications
4.2.1 Lack of MPI
4.2.2 Azure Storage and the shared memory abstraction
4.2.3 Workers Communication
4.2.4 Azure AppFabric Caching service
4.3 Applications Architecture
4.3.1 Jobs are split into tasks stored in queues
4.3.2 Azure does not provide affinity between workers and storage
4.3.3 Workers are at first task agnostic and stateless
4.4 Scaling and performance
4.4.1 Scaling up or down is a developer initiative
4.4.2 The exact number of available workers is uncertain
4.4.3 The choice of Synchronism versus Asynchronism is about simplicity over performance
4.4.4 Task granularity balances I/O costs with scalability
4.5 Additional design patterns
4.5.1 Idempotence
4.5.2 Queues size usage
4.5.3 Atomicity in the BlobStorage
4.5.4 Lokad-Cloud
4.6 The counter primitive
4.6.1 Motivation
4.6.2 Sharded Counters
4.6.3 BitTreeCounter
5 Distributed Batch K-Means
5.1 Introduction to clustering and distributed Batch K-Means
5.2 Sequential K-Means
5.2.1 Batch K-Means algorithm
5.2.2 Complexity cost
5.3 Distributed K-Means Algorithm on SMP and DMM architectures
5.3.1 Distribution scheme
5.3.2 Communication costs in SMP architectures
5.3.3 Communication costs in DMM architectures
5.3.4 Modeling of real communication costs
5.3.5 Comments
5.3.6 Bandwidth Condition
5.3.7 Dhillon and Modha case study
5.4 Implementing Distributed Batch K-Means on Azure
5.4.1 Recall of some Azure specificities
5.4.2 The cloud Batch K-Means algorithm
5.4.3 Comments on the algorithm
5.4.4 Optimizing the number of processing units
5.5 Experimental results
5.5.1 Azure base performances
5.5.2 The two-step reduce architecture benchmark
5.5.3 Experimental settings
5.5.4 Speedup
5.5.5 Optimal number of processing units and scale-up
5.5.6 Straggler issues
6 Practical implementations of distributed asynchronous vector quantization algorithms
6.1 Introduction
6.2 The sequential Vector Quantization algorithm
6.3 Synthetic functional data
6.3.1 B-spline functions
6.3.2 B-splines mixtures random generators
6.4 VQ parallelization scheme
6.4.1 A first parallel implementation
6.4.2 Towards a better parallelization scheme
6.4.3 A parallelization scheme with communication delays
6.4.4 Comments
7 A cloud implementation of distributed asynchronous vector quantization algorithms
7.1 Introduction
7.2 The implementation of cloud DAVQ algorithm
7.2.1 Design of the algorithm
7.2.2 Design of the evaluation process
7.3 Scalability of our cloud DAVQ algorithm
7.3.1 Speedup with a 1-layer Reduce
7.3.2 Speedup with a 2-layer Reduce
7.3.3 Scalability
7.4 Competition with Batch K-Means
Conclusion
On the use of PaaS as a platform for intensive computing
Technical assessment
Economic assessment
Implementation of distributed clustering algorithms
Cloud Batch K-Means
Cloud DAVQ
Perspectives
List of Figures
List of Tables
Bibliography