Allocation Schemes of Resources with Downgrading


Cloud Computing

Cloud Computing is a service model whose principle is the on-demand offer of access to shared computing resources. In the literature, the definition of this relatively recent concept has not yet been pinned down, and several authors interpret the term in different ways (see [YBDS08, FZRL08, VRMCL08, Gee09, ASZ⁺10, ZCB10, MG11]). Among them, Foster et al. [FZRL08] propose one of the broadest and most inclusive definitions: [Cloud Computing is] ‘A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualised, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet.’
From this extract, we retain the main concept of ‘a shared large pool of digital resources delivered on demand’. From now on, we refer to this rather general definition whenever we use the term Cloud.
The idea of pooling resources together in order to mitigate idleness and benefit from economies of scale to reduce costs is not particularly new. Since the popularisation of computers and of access to the Internet, many techniques have been explored to exploit the full potential of the processing power associated with these technologies. We witnessed, for example, the development of Grid Computing, Utility Computing and Service Computing, which are service concepts relatively similar to Cloud Computing. See Foster et al. [FZRL08] for a detailed discussion of the taxonomy of these technologies and Voas and Zhang [VZ09] for a critical discussion of whether Cloud Computing constitutes a new computational paradigm.
Although these techniques have been in vogue for some time, none of them has been as influential as Cloud Computing. Only once Internet access and computing resources became cheaper, more powerful and ubiquitously available did users and companies feel confident enough to adopt the new service model. Cloud Computing has been boosted in particular by the interest of major players in the Internet market such as Amazon, with Amazon Web Services (AWS) [Ama], Microsoft, with Microsoft Azure [Mic], and Google, with Google Cloud Platform (GCP) [Goo], leading to a genuine trend over the last decade which is still gaining momentum. See for example Amazon’s portfolio of customer success stories, which gives a glimpse of how far Cloud services are penetrating the Internet market ([Ama]).

Essential Characteristics

Despite the variety of definitions of Cloud Computing services (see for instance Armbrust et al. [ASZ⁺10] and Mell and Grance [MG11]), there is a set of criteria, recurrently cited by authors in the field, that a service has to fulfil to be considered part of the Cloud Computing environment.
Elasticity is probably the most prominent feature of Cloud services. Resources can be allotted and released (almost) instantaneously: users can ask for more resources when needed, as well as release superfluous ones. Elasticity can be understood as an enhancement of scalability, as it not only includes keeping up with increasing job loads but also takes into account the proper adjustment to decreases in the need for resources.
Another common aspect of Cloud services, and a key factor in their commercial success, is the billing system. The Cloud is based on Pay-as-you-go (PAYG) charging mechanisms, where customers choose a virtual machine based on its specifications and are billed as a function of these specifications and of the time they use the service. For example, Amazon’s Elastic Compute Cloud (EC2) allows users to configure their virtual machines (VMs) in terms of RAM, CPU cores, storage space, connectivity bandwidth, Operating System (OS), etc. However, PAYG is not necessarily a characteristic of all Cloud services today. Cloud operators are currently developing medium- and long-term contracts, intended to smooth resource consumption, lock clients in and improve the dimensioning of their data centres. We also observe the emergence of serverless computing, where Cloud containers hold only the resources necessary for the execution of a specific application, and of services like AWS Lambda, where customers no longer have to reserve an entire VM but pay for the computing resources necessary for the execution of their tasks.
Availability is an essential condition for Cloud services because it is a precondition for people to trust the new technology. This is why all players of the Cloud ecosystem are today committed to high availability, offering demanding Service Level Agreements (SLAs). See for instance the “SLA” section in [Ama], [Goo] and [Mic]. High availability is ensured through the offer of a reliable pool of resources and redundancy mechanisms.
Clouds can mainly be divided into public and private Clouds, even though there also exist hybrid forms and community Clouds, which host private and public services in a single structure. The infrastructure of public Clouds is intended to serve many customers from different organisations, and customers should not be affected by the use of resources by others sharing the same physical machine (PM). Private Clouds are designed to serve a single organisation exclusively. Today, many companies opt for this deployment model because data security issues and legislative constraints make the adoption of public services prohibitive.
Cloud Computing services can be organised into the following categories: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). This classification is usually referred to as the SIP model. Certain subcategories have been proposed, such as Business Process as a Service (BPaaS) as a part of SaaS or Desktop as a Service (DaaS) as a category of PaaS. However, these subcategories fail to offer a real surplus over the original SIP model, especially in terms of mathematical modelling.
Infrastructure as a Service (IaaS) is a service model where providers offer on-demand computing resources to their customers. This is done through virtualisation, which is the emulation of a real physical system (with its hardware specifications) in software. We say a virtual machine is instantiated in a server or data centre when the physical resources (physical machines, PMs) supporting it are located there. Customers can create such virtual machines, which use resources such as Random Access Memory (RAM) and Central Processing Unit (CPU) cores. In this manner, multiple users can simultaneously allocate customised virtual machines and access them over the Internet or a local network. The provision of IaaS contributes to reducing equipment acquisition and maintenance costs for the users.
Platform as a Service (PaaS) allows the user to deploy onto the Cloud infrastructure applications created using programming languages, libraries, services and tools supported by the service provider. The user does not manage or control the underlying Cloud infrastructure (network, servers, operating systems, etc.) but has control over the deployed applications and possibly over configuration settings for the application-hosting environment.
In Software as a Service (SaaS), the user can run the provider’s applications, which are deployed onto a Cloud infrastructure. The user manages or controls neither the underlying Cloud infrastructure nor individual application capabilities, with the possible exception of limited user-specific application configuration settings.
In the hierarchy of the Cloud, IaaS is considered to be the core of Cloud services because both PaaS and SaaS depend on the infrastructure, without which they could not exist. Emphasising the importance of infrastructure, Zhang et al. [ZCB10] classify Cloud service providers into infrastructure providers, whose core business is IaaS, and service providers, who offer services in the Cloud but rent resources from infrastructure providers. IaaS is also the most interesting aspect of Cloud Computing for mathematical studies, as it is relatively simple to model. The requests executed at this level of the Cloud require “raw” units of resources, such as CPU cores or GB of RAM, which need to be available to host the incoming clients. In this context, the mathematical framework of Queueing Theory, together with some tools from Operational Research, is very promising for the analytical evaluation of the performance of Cloud systems.

Issues and challenges

Although Cloud Computing has been widely adopted by many major players in the telecommunication industry, research in this field is still at an early stage. As for many other scientific advances, the technological possibilities develop much faster than our understanding of their functioning. Many key challenges, including automatic resource provisioning, power management and security management, are only starting to receive attention from the research community, while new challenges keep emerging from new applications and methods. See Zhang et al. [ZCB10] for more details.
The work in this thesis focuses more particularly on the challenges related to the current emergence of new decentralised Cloud architectures. Most commercial Clouds are today still implemented in large data centres and operated in a centralised fashion. In this set-up, all (or most) of the requests, whether they originate near or far from the large data centre, are executed in this unit. This design has the advantage of economies of scale and high manageability, but it comes at the price of high energy expenses (usually associated with cooling the data centres), large up-front investments in infrastructure and increased latency. Long transmissions across the network are costly and often associated with security risks and cross-border legal issues. Furthermore, the centralised system constitutes a potential bottleneck for the further development of Cloud services, as its delivery channels are likely to become congested due to the continuous increase in the volume of Internet traffic. Smaller data centres that are geographically better distributed and work in a decentralised manner can be more advantageous. Not only do they require less powerful and less expensive cooling systems but, being located closer to the user, they also offer a solution to the high transmission costs, saturation of delivery channels and latency present in a centralised system. This is particularly important for response-time-critical services such as content delivery and interactive gaming. In 2009, for example, Valancius et al. [VLM⁺09] proposed the use of small (nano) data centres for hosting Video on Demand services. We therefore witness a general movement of data centres away from centralised structures and towards the edge of the network.
This distribution of Cloud Computing resources at the edge of the network is known as Fog Computing (see [WRSvdM11, BMZA12, RBG12]) and makes it possible, for example, to handle the enormous amount of data generated by devices located all over the network (i.e. the Internet of Things (IoT), see Atzori et al. [AIM10]).
With these developments towards a more distributed Cloud architecture, the problem of how to handle local demands relying on much smaller service units is gaining in importance. Local data centres may now face a scarcity of some resources, which results in the rejection of user demands even though the total amount of resources in the system would be sufficient to satisfy all user demands if the resources were pooled in a centralised data centre. In order to reduce the occurrence of request blocking, Fog Computing data centres will have to collaborate, for example by offloading user demands to other data centres that are not saturated. See Sharif et al. [SCAFV16] for an extensive discussion of the perspectives for decentralised service models for Cloud technologies.
The development of new models is crucial for the study of the Cloud and of future technologies building upon it. The computation associated with these models needs to be adapted to the increased volume and diversity of Cloud Computing traffic. For instance, classical approaches to network optimisation, such as global load balancing strategies, might be too slow for systems of this magnitude and need revision. Similarly, bandwidth sharing policies have to be redesigned in order to ensure a fair (equitable) division of network resources in the current context of increasing and diversifying customer demands. This thesis develops such models, taking into account the stochasticity of the arrival of user demands and of service durations, which is absent from most of the research conducted so far.


Resource Allocation in the Cloud

The focus of this thesis is the dynamic allocation of resources in the framework of large stochastic networks. This part presents a selection of the most relevant literature on this topic.
In the context of Infrastructure as a Service, the Cloud service provider faces the challenge of allocating resources efficiently by assigning incoming requests for virtual machines to physical machines located in the Cloud data centre.
This issue of resource allocation in Cloud systems has been addressed by many authors with different perspectives, scopes and objectives.
Most of the literature focuses on Cloud Computing systems which rely exclusively on one type of resource (e.g. RAM, CPU cores, bandwidth, etc.). Queueing theory has been widely used for the study of such systems, as it is particularly well adapted to this context and provides a rich toolbox for system optimisation. Among the first works in this field, Yang et al. [YTDG09] use queues to evaluate performance metrics of Cloud services, and Khazaei et al. [KMM12] study a simple Cloud system using M/G/m/m+r queues. These papers, like many others, consider single-resource systems. In Cloud Computing environments offering multiple resources, the allocation of resources is more complex: the tools of queueing theory are less well adapted and provide less information on the system under scrutiny. In the literature on multiple resources, resource allocation has classically been addressed through bin-packing and knapsack problems in the framework of stochastic optimisation [RT89, Ros95, GI99]. From a utility (or reward) perspective, these methods aim to determine which feasible configuration maximises a given utility function (over time). For example, consider a data centre equipped with $C_i$ units of each resource $i \in I$, and VMs of type $j \in J$ requiring $A_{i,j}$ units of resource $i$. If a VM of type $j$ is hosted in this system, the operator receives a reward $w_j$. Denoting by $x_j$ the number of VMs of type $j$ in the system, the optimisation problem is simply

$$\max_{x \in \mathbb{N}^{|J|}} \; \sum_{j \in J} w_j \, x_j \quad \text{subject to} \quad \sum_{j \in J} A_{i,j} \, x_j \le C_i \quad \text{for all } i \in I.$$

This integer program is a multidimensional knapsack problem, which is NP-hard in general. Exact solution methods are therefore too slow to be used as a viable resource orchestration practice for large Cloud systems, despite the development of many efficient heuristic methods in recent years. In addition, such methods only consider the static characteristics of the systems and have very limited applications in dynamic contexts.
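To make the static allocation problem concrete (maximise the total reward $\sum_j w_j x_j$ subject to $\sum_j A_{i,j} x_j \le C_i$ for every resource $i$), the following Python sketch solves a tiny instance by exhaustive enumeration. The function name `best_configuration` and the two-resource, two-VM-type numbers are hypothetical illustrations, not data from the cited works. It is a brute-force sketch, not a practical solver.

```python
from itertools import product


def best_configuration(capacity, demand, reward):
    """Exhaustively solve  max sum_j w_j x_j  s.t.  sum_j A[i][j] x_j <= C_i.

    capacity[i] = C_i, demand[i][j] = A_{i,j}, reward[j] = w_j.
    Assumes every VM type needs a positive amount of at least one resource.
    Returns (best total reward, tuple of VM counts x_j).
    """
    n_res, n_types = len(capacity), len(reward)
    # x_j can never exceed the number of type-j VMs that fit on their own.
    bounds = [
        min(capacity[i] // demand[i][j] for i in range(n_res) if demand[i][j] > 0)
        for j in range(n_types)
    ]
    best_value, best_x = 0, (0,) * n_types
    for x in product(*(range(b + 1) for b in bounds)):
        fits = all(
            sum(demand[i][j] * x[j] for j in range(n_types)) <= capacity[i]
            for i in range(n_res)
        )
        value = sum(reward[j] * x[j] for j in range(n_types))
        if fits and value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


if __name__ == "__main__":
    # Hypothetical data centre: 16 CPU cores and 64 GB of RAM.
    capacity = [16, 64]
    # Hypothetical VM types: CPU-heavy (4 cores, 8 GB) and RAM-heavy (2 cores, 24 GB).
    demand = [[4, 2],    # CPU cores required by each type
              [8, 24]]   # GB of RAM required by each type
    reward = [3, 4]
    print(best_configuration(capacity, demand, reward))  # (14, (2, 2))
```

The search space is the product of the per-type bounds, so it explodes with the number of VM types. Even a well-engineered solver only answers the question for a static snapshot of the system.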
Given these shortcomings in the multi-resource context, the recent literature has turned back to systems providing a single resource exclusively, or to straightforward applications of queueing theory, which result in quicker calculations and provide information on system dynamics. However, contrary to the queueing literature mentioned previously, this recent strand of literature considers more complex systems composed of several queues, modelling racks of servers, server farms or data centres. One approach to single-resource allocation with multiple queues is intelligent dispatching: instead of running computationally intensive optimisation programs, the focus lies on the “sending” of a VM to a PM. One popular technique consists of dispatching virtual machines to the busiest physical machines in order to generate as many idle PMs as possible, which can then be turned off (for power saving). For example, Stolyar and Zhong [SZ13] introduce the algorithm Greedy with sub-linear Safety Stocks (GSS), which heuristically minimises the number of active servers in a multi-resource data centre, and Xiao et al. [XSC13] present a “skewness” metric to control for unevenness in multidimensional resource utilisation when dispatching VMs, aiming to minimise the number of servers in use. By contrast, load balancing techniques aim to deploy the full capacity of the system, thus attempting to assign VMs homogeneously across the pool of physical machines. In telecommunications theory, a common reference is the policy known as join the shortest queue (JSQ), which is, however, only effective in small-scale systems, as the state of each physical machine must be known in order to dispatch VMs accordingly. For instance, Maguluri et al. [MSY12] discuss the implementation of JSQ in a multi-resource Cloud data centre and show that this policy is not suitable for large-scale systems. Other techniques aim to improve global performance using only local information (from a small sample of data centres). The power-of-choice (or supermarket) model was introduced by Mitzenmacher [Mit96] and Vvedenskaya et al. [VDK96]. In this approach, the dispatcher compares the queue sizes of a small number of physical machines chosen at random (usually 2 or 3) and assigns the VM to the one with the shortest queue (the smallest workload), resulting in quick and efficient dispatching. This technique inspired the recent framework of pull-based policies such as Join-Idle-Queue (JIQ), proposed by Lu et al. [LXK⁺11], in which the dispatcher keeps a list of idle physical machines to which virtual machines are then allocated. See Stolyar [Sto15] and Foss and Stolyar [FS17] for a similar policy, the PULL algorithm, which assigns servers to customers instead of the usual customer-to-server assignment.
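The following Python sketch illustrates the power-of-d-choices idea under one simplifying assumption: jobs never leave. This is the static balls-into-bins setting rather than the dynamic supermarket model with departures. The function `dispatch` is a hypothetical name. With `d=1` it reduces to purely random assignment. With `d=2` it probes two randomly chosen servers and joins the less loaded one.

```python
import random


def dispatch(n_servers, n_jobs, d, rng):
    """Assign jobs one by one; each job probes d servers chosen uniformly at
    random (without replacement) and joins the least loaded of them.
    Jobs never depart in this simplified static setting."""
    loads = [0] * n_servers
    for _ in range(n_jobs):
        candidates = rng.sample(range(n_servers), d)
        target = min(candidates, key=lambda s: loads[s])
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10_000
    random_loads = dispatch(n, n, d=1, rng=random.Random(0))
    two_choice_loads = dispatch(n, n, d=2, rng=random.Random(0))
    # The mean load is 1 in both cases; only the maximum load differs markedly.
    print("max load, random assignment:", max(random_loads))
    print("max load, power of two choices:", max(two_choice_loads))
```

With n jobs on n servers, the maximum load under random assignment grows like log n / log log n. With two choices it grows only like log log n / log 2. Probing a couple of servers therefore captures much of the benefit of JSQ without knowing the state of every machine.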
In this thesis I consider resource allocation for on-time services, such as Microsoft Azure [Mic] systems, which require virtual machines to be instantiated as soon as they are assigned to a physical machine. If a system does not have a sufficient amount of resources (in one or several PMs) to host an arriving VM, the request is rejected. Cloud service providers have an incentive to avoid high rejection rates, which result in the loss of customers and eventually a decline in revenues. On-time systems have been studied by many authors. Recently, Xie et al. [XDLS15] and Mukhopadhyay et al. [MKMG15] used loss systems to study intra-data-centre VM allocation with power-of-choice mechanisms. In Xie et al., customers require different amounts of resources during their service. In Mukhopadhyay et al., heterogeneous types of servers are considered, differentiated by their capacity (size). However, these models are mostly adapted to cases where system performance is determined exclusively by one resource (i.e. the system’s bottleneck). In this thesis, similar loss systems are considered and, in addition, generalised to the multi-resource case (see Chapter 3 concerning cooperation schemes between multi-resource processing facilities).
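The rejection rule for on-time services can be sketched as a simple multi-resource admission check. In this hypothetical Python sketch, each PM is a dictionary of free resource units and a VM request is a dictionary of required units. The sketch assumes that a VM must fit entirely on a single PM and is placed on the first one that fits (first fit). If no PM fits, the request is lost, as in a loss system. The names `admit` and `release` are illustrative only.

```python
def admit(pms, request):
    """Place a VM on the first PM with enough free units of *every* resource.

    pms: list of dicts mapping resource name -> free units (mutated in place).
    request: dict mapping resource name -> required units.
    Returns the index of the chosen PM, or None if the request is rejected.
    """
    for k, free in enumerate(pms):
        if all(free.get(res, 0) >= units for res, units in request.items()):
            for res, units in request.items():
                free[res] -= units
            return k
    return None  # loss system: no waiting room, the request is dropped


def release(pms, k, request):
    """Return the resources of a departing VM to PM k."""
    for res, units in request.items():
        pms[k][res] += units


if __name__ == "__main__":
    pms = [{"cpu": 8, "ram": 32}, {"cpu": 8, "ram": 32}]
    vm = {"cpu": 6, "ram": 8}
    print(admit(pms, vm))  # 0
    print(admit(pms, vm))  # 1
    # 4 CPU cores are still free in total (2 per PM), yet no single PM can
    # host a 4-core VM, so this request is rejected.
    print(admit(pms, {"cpu": 4, "ram": 4}))  # None
    release(pms, 0, vm)
    print(admit(pms, {"cpu": 4, "ram": 4}))  # 0
```

The third request shows how fragmentation across machines can cause a rejection even when aggregate free capacity would suffice.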
In the literature, resource allocation has mostly focused on intra-data-centre schemes, i.e. the allocation of resources within a given data centre. However, as mentioned before, the Cloud is evolving, and the new Cloud architecture calls for collaboration between small data centres in order to reduce the rejection of user demands in locally saturated data centres.
In Chapter 2 of this thesis, I investigate an allocation scheme for video-on-demand services where clients are served with the lowest service grade (level or resolution in terms of video quality) as soon as the link occupation is above a certain threshold, allowing the system to mitigate the rejection of customers. Chapters 3, 4 and 5 focus on policies enabling cooperation between decentralised facilities in an effort to improve performance (particularly by reducing the blocking rates of customers), which can find application in the decentralised Cloud architectures currently emerging. In Chapter 3, I study an offloading scheme between two data centres in the multi-resource context. In the framework of resource-specialised virtual machines requiring different proportions of each resource, this policy aims to alleviate the local load on a resource by forwarding the customers (VMs) that require the most of the locally depleted resource. In Chapter 4, I consider a policy which forwards jobs from one data centre to another with some probability if the request cannot be served locally. In Chapter 5, a system similar to the one presented in Chapter 4 is considered under another offloading policy: jobs are systematically forwarded from one data centre to the other, and are only accepted in the second data centre if a minimum number of its servers are free. Notice that in this case only one processing facility (data centre 1) offloads customers to the other (data centre 2), which protects its own requests using a trunk reservation mechanism.
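A trunk reservation offloading policy of the kind described for Chapter 5 can be explored numerically with a small continuous-time Markov chain simulation. This Python sketch rests on assumptions of our own, not necessarily those of the thesis:

- Poisson arrivals at rates `lam1` and `lam2`, and exponential service times with rate `mu` at both data centres.
- Jobs of data centre 1 are forwarded only when data centre 1 is full.
- A forwarded job is accepted by data centre 2 only if at least `r` of its servers are free, while data centre 2's own jobs are accepted whenever one server is free.

The function name `simulate_offloading` is hypothetical.

```python
import random


def simulate_offloading(lam1, lam2, c1, c2, r, mu=1.0, n_events=200_000, seed=0):
    """Estimate the blocking probabilities of two loss data centres where
    data centre 1 offloads its blocked jobs to data centre 2, which accepts
    them only if at least r of its c2 servers are free (trunk reservation).

    Every arrival is a transition of the jump chain, so counting blocked
    arrivals along the jump chain estimates the blocking probability seen by
    arrivals without tracking time explicitly.
    """
    if r < 1:
        raise ValueError("r must be at least 1 so that offloaded jobs need a free server")
    rng = random.Random(seed)
    n1 = n2 = 0  # busy servers in each data centre
    arrivals = [0, 0]
    blocked = [0, 0]
    for _ in range(n_events):
        total = lam1 + lam2 + n1 * mu + n2 * mu  # total event rate in state (n1, n2)
        u = rng.random() * total
        if u < lam1:  # arrival at data centre 1
            arrivals[0] += 1
            if n1 < c1:
                n1 += 1
            elif c2 - n2 >= r:  # offload, respecting the reservation of data centre 2
                n2 += 1
            else:
                blocked[0] += 1
        elif u < lam1 + lam2:  # arrival at data centre 2
            arrivals[1] += 1
            if n2 < c2:
                n2 += 1
            else:
                blocked[1] += 1
        elif u < lam1 + lam2 + n1 * mu:  # departure from data centre 1
            n1 -= 1
        else:  # departure from data centre 2
            n2 -= 1
    return blocked[0] / max(arrivals[0], 1), blocked[1] / max(arrivals[1], 1)


if __name__ == "__main__":
    # r = c2 + 1 can never be met, so offloading is disabled (two isolated systems).
    print("isolated:", simulate_offloading(8.0, 6.0, 10, 10, r=11))
    print("offloading, r = 2:", simulate_offloading(8.0, 6.0, 10, 10, r=2))
```

With offloading disabled, the blocking rate of data centre 1 should approach the Erlang loss probability of a 10-server system with offered load 8 (about 0.12). Enabling offloading lowers it at the cost of a higher blocking rate for data centre 2's own customers. The threshold `r` controls that trade-off.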

Stochastic Modelling of Services

This section introduces simple versions of the models used throughout this thesis for the analysis of Cloud Computing systems. The models are presented in order of increasing complexity, to acquaint the reader step by step with the classical tools of stochastic modelling.

Queueing Systems

A queueing system is the representation of a real system where clients (or jobs) arrive, demand a service which takes some time, and then leave the system. A queue may be equipped with one or more servers and a buffer (or waiting) zone, where clients who are not being served can wait for their service to start. If a queue has no buffer zone, or if all of its servers and buffer are already in use, arriving customers are rejected until some space is freed. The time a client spends waiting is called the waiting time, and the time during which its service is being executed is called the service time.
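Consider the simplest queue of this kind: one server, a first-come-first-served discipline and an unlimited buffer. Its waiting times satisfy Lindley's recursion $W_{n+1} = \max(0, W_n + S_n - T_{n+1})$, where $S_n$ is the service time of client $n$ and $T_{n+1}$ is the time between the arrivals of clients $n$ and $n+1$. The Python sketch below implements this recursion; the function name `fifo_waiting_times` is hypothetical. The M/M/1 demonstration uses the known mean waiting time $\lambda / (\mu(\mu - \lambda))$ as a sanity check.

```python
import random


def fifo_waiting_times(interarrival, service):
    """Waiting times of a single-server FIFO queue with an unlimited buffer.

    interarrival[n] is the time between the arrivals of clients n-1 and n
    (interarrival[0] is ignored); service[n] is the service time of client n.
    Uses Lindley's recursion W_0 = 0, W_{n+1} = max(0, W_n + S_n - T_{n+1}).
    """
    waits = [0.0]
    for s_prev, t_next in zip(service[:-1], interarrival[1:]):
        waits.append(max(0.0, waits[-1] + s_prev - t_next))
    return waits


if __name__ == "__main__":
    lam, mu, n = 0.5, 1.0, 200_000
    rng = random.Random(42)
    inter = [rng.expovariate(lam) for _ in range(n)]
    serv = [rng.expovariate(mu) for _ in range(n)]
    waits = fifo_waiting_times(inter, serv)
    # M/M/1 theory gives a mean waiting time of lam / (mu * (mu - lam)) = 1.0 here.
    print(sum(waits) / n, "vs theory", lam / (mu * (mu - lam)))
```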
The study of such queueing systems originated with the development of telecommunication technologies and gained in importance with the arrival of centralised computing units (mainframe architecture) some years later. The main objectives were to calculate the blocking probabilities of clients in the telephone network and the waiting times of customers before accessing the computing power of the mainframe.

Erlang’s Problem

At the beginning of the twentieth century, while working for the Copenhagen Telephone Company (CTC), the Danish engineer Agner K. Erlang (1878–1929) was confronted with the intriguing problem of dimensioning the company’s telephone network.
Back then, a phone call was the realisation of a connection between a caller and receiver, using a circuit board on the links between these two interlocutors.
Local communities were connected by one board of circuits, see Newman [New10]. Erlang was responsible for determining the number of circuits needed to ensure a certain service level (or grade), given by the probability that a client is rejected because all circuits are occupied. In his efforts to engineer this system, Erlang published many papers, two of which are of particular importance for the development of this thesis.
His seminal 1909 paper “The Theory of Probabilities and Telephone Conversations” [Erl09] showed that the number of incoming calls follows a Poisson distribution. This approximation makes it possible to calculate system performance and is still used today in many areas due to its practicality.
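The link between Poisson counts and exponential inter-arrival times can be checked numerically. In this Python sketch, arrivals are generated with i.i.d. exponential gaps of rate λ. The number of arrivals in a window of length t should then have mean and variance both close to λt, as for a Poisson distribution. The function name `count_arrivals` and the parameter values are hypothetical.

```python
import random
import statistics


def count_arrivals(lam, horizon, rng):
    """Number of arrivals in [0, horizon] when inter-arrival times are i.i.d.
    exponential with rate lam (i.e. a Poisson process of intensity lam)."""
    n = 0
    t = rng.expovariate(lam)
    while t <= horizon:
        n += 1
        t += rng.expovariate(lam)
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    lam, horizon = 3.0, 2.0
    counts = [count_arrivals(lam, horizon, rng) for _ in range(20_000)]
    # For a Poisson(lam * horizon) variable, mean and variance are both 6.
    print(statistics.mean(counts), statistics.pvariance(counts))
```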
In 1917, Erlang published “Solution of some problems in the theory of probabilities of significance in automatic telephone exchanges”. Analysing teletraffic data, Erlang observed that the durations of phone calls were exponentially distributed with mean µ⁻¹. Using this, together with the fact that incoming calls follow a Poisson process with rate λ > 0, and analysing the evolution of the number of used circuits, Erlang derived his famous formula for traffic design, known today as the Erlang-B formula.
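Consider the loss setting described above: Poisson arrivals at rate λ, exponential holding times with mean µ⁻¹, $C$ circuits and no waiting room. The stationary probability that an arriving call finds all circuits busy is $B(\rho, C) = \frac{\rho^C / C!}{\sum_{k=0}^{C} \rho^k / k!}$, with offered load $\rho = \lambda / \mu$; this is the Erlang B (loss) formula. The Python sketch below evaluates it through the standard recursion $B(\rho, 0) = 1$, $B(\rho, k) = \rho B(\rho, k-1) / (k + \rho B(\rho, k-1))$, which avoids large factorials, and compares the result with the direct formula. The function names are hypothetical.

```python
from math import factorial


def erlang_b(rho, c):
    """Blocking probability of a loss system with c servers and offered load
    rho = lambda / mu, computed with the numerically stable recursion."""
    b = 1.0  # B(rho, 0) = 1: with no circuits, every call is blocked
    for k in range(1, c + 1):
        b = rho * b / (k + rho * b)
    return b


def erlang_b_direct(rho, c):
    """Same quantity from the closed-form ratio (fine for small c only)."""
    terms = [rho**k / factorial(k) for k in range(c + 1)]
    return terms[-1] / sum(terms)


if __name__ == "__main__":
    print(erlang_b(8.0, 10), erlang_b_direct(8.0, 10))  # both about 0.1217
    # Pooling two identical systems into one of double size lowers blocking.
    print(erlang_b(16.0, 20) < erlang_b(8.0, 10))  # True
```

The last line illustrates the economies of scale behind resource pooling: at the same load per server, a larger loss system blocks a smaller fraction of its customers.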

Table of contents :

1 Introduction
1.1 Cloud Computing
1.2 Stochastic Modelling of Services
1.3 Mathematical Framework
1.4 Presentation of the following chapters
2 Allocation Schemes of Resources with Downgrading
2.1 Introduction
2.2 Model description
2.3 Scaling Results
2.4 Invariant Distribution
2.5 Applications
3 Cooperative Schemes in the framework of multi-resource Cloud Computing
3.1 Introduction
3.2 Model description and notation
3.3 Scaling Results
3.4 Time Evolution
3.5 Conclusion
4 Analysis of an offloading scheme for data centres in the framework of Fog Computing
4.1 Introduction
4.2 Model description
4.3 Characteristics of the limiting random walk
4.4 Boundary value problems
4.5 Numerical results: Offloading small data centres
4.6 Conclusion
5 Analysis of a trunk reservation policy in the framework of fog computing
5.1 Introduction
5.2 Model description
5.3 Analysis of the limiting random walk
5.4 Boundary value problems
5.5 Numerical experiments
5.6 Conclusion