A solution to the MAD problem for grids

Pull Mechanism and Late Binding Patterns

Master-Slave was later referred to as Push Mechanism, in contrast with Pull Mechanism, in which idle resources request tasks [SBP03]. Pull Mechanism makes allocation resilient to broken resources and removes queues in front of end resources.
The Late Binding pattern allows the introduction of Pull Mechanism on a system designed with Push Mechanism. In Late Binding, monitors pushed to an end resource check the resource status before pulling actual tasks [BHL+06]. The pattern was first implemented by Condor Glide-In in 2001 [TTL02]. Since 2003, major grids have been moving towards Late Binding [GDJ08].
AliEn is an example of Late Binding. AliEn is the Analysis Environment for ALICE, a virtual organization (VO). AliEn uses its own scheduling system on LCG infrastructure [SBP03]. As a consequence of using LCG, members of ALICE rely on the sites’ best effort. However, tasks are processed immediately when submitted, because a task is pulled when relevant resources are available. AliEn gives more flexibility to the ALICE collaboration for task mapping. A job agent monitors every resource and triggers task selections from the ALICE job queue. This mechanism is designed both to cope with the lack of guarantees in LCG SLAs and to prioritize tasks.
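The Late Binding behavior described above can be illustrated with a minimal sketch. It is not AliEn's or Condor's actual interface: the names `Task`, `TaskQueue` and `pilot`, the `healthy` and `free_mb` resource fields, and the memory-based fitness test are all assumptions made for illustration. The sketch only shows the ordering of steps: the agent pushed to a resource checks the resource first, and a task is bound to the resource only when it is pulled.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                                  # lower value is served first
    name: str = field(compare=False)
    min_free_mb: int = field(compare=False, default=0)


class TaskQueue:
    """Central task queue owned by the collaboration (hypothetical)."""

    def __init__(self):
        self._heap = []

    def submit(self, task):
        heapq.heappush(self._heap, task)

    def pull(self, free_mb):
        """Return the highest-priority task that fits the resource, or None."""
        skipped, chosen = [], None
        while self._heap:
            task = heapq.heappop(self._heap)
            if task.min_free_mb <= free_mb:
                chosen = task
                break
            skipped.append(task)
        for task in skipped:                       # put unsuitable tasks back
            heapq.heappush(self._heap, task)
        return chosen


def pilot(queue, resource):
    """Agent pushed to a resource: check its status, then pull a task."""
    if not resource["healthy"]:
        return None                                # a broken resource pulls nothing
    return queue.pull(resource["free_mb"])


queue = TaskQueue()
queue.submit(Task(2, "reconstruction", 4000))
queue.submit(Task(1, "simulation", 16000))
queue.submit(Task(3, "merge", 1000))

broken = pilot(queue, {"healthy": False, "free_mb": 32000})
small = pilot(queue, {"healthy": True, "free_mb": 8000})
large = pilot(queue, {"healthy": True, "free_mb": 32000})
```

In this sketch the broken resource receives no task, the small resource skips the task that does not fit and pulls the next one by priority, and the skipped task remains queued for a later pull by a larger resource.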
AliEn initiated the emergence of a proper front mapping via Late Binding. Most major VOs are now implementing a similar system: CDF with GlideCAF, ATLAS with Cronus and Panda, LHCb with DIRAC, and CMS with Glidein-WMS. Other independent large-scale applications may use DIANE [GDJ08]. Symmetric Mapping decomposes the allocation into two parts, and Pull Mechanism is a possible design for both parts. Late Binding separates resource subscription from task allocation. A system that implements Late Binding implements Symmetric Mapping if the underlying Metascheduling implementation does not constrain or obstruct the provider.

Market Based Control and Peer-to-Peer Matching Patterns

Market-Based Control simulates markets to share resources efficiently among competing users [Cle96, LB06]. For example, Tycoon balances the load across and inside servers according to user payment [Lai05]. A contract involves the provider with the least expensive offer. The price of a provider’s resources depends on the load. The provider does not intervene in the allocation. Commercial services are analogous. Amazon EC2 is a web service that sells Amazon’s resources on demand. The only provider is Amazon. In both examples, users have direct access to virtual machines. By contrast with Market-Based Control, Symmetric Mapping also supports the cases where the contract between user and provider does not entirely determine the allocation.
Peer-to-peer Matching is another alternative for contract mapping. In the absence of a market, participants match and negotiate contracts in a collaborative manner. For example, a Condor flock is an assembly of computer centers that borrow hardware resources from one another when needed [ELvD+96]. A component called the Gateway is hosted by each participant. If a user cannot assign tasks to its own resources, the Gateway examines its peers. An important part of Condor is designed specifically for matchmaking and negotiation. The Matchmaking mechanism is provided through the use of ClassAds [Ram00]. Users and providers define themselves with relevant attributes and write their requirements with regular expressions on the other participant’s attributes.
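The two-sided nature of this matchmaking can be sketched as follows. This is not the ClassAd language or any Condor interface: advertisements are modeled as plain Python dictionaries, requirements as Python predicates rather than ClassAd expressions, and the names `matches`, `matchmake`, `attrs` and `requirements` are assumptions made for illustration. The point shown is only that a match requires each party's requirements to hold on the other party's attributes.

```python
def matches(ad_a, ad_b):
    """Two ads match when each one's requirements accept the other's attributes."""
    return (ad_a["requirements"](ad_b["attrs"])
            and ad_b["requirements"](ad_a["attrs"]))


def matchmake(job_ad, machine_ads):
    """Return the first machine ad that matches the job ad, or None."""
    for machine_ad in machine_ads:
        if matches(job_ad, machine_ad):
            return machine_ad
    return None


# A user describes the job and states what it requires from a machine.
job = {
    "attrs": {"owner": "alice", "image_mb": 2000},
    "requirements": lambda m: m["arch"] == "x86_64" and m["memory_mb"] >= 2000,
}

# Providers describe their machines and state what they require from a job.
machines = [
    {"attrs": {"name": "a", "arch": "x86_64", "memory_mb": 4096},
     "requirements": lambda j: j["owner"] != "alice"},
    {"attrs": {"name": "b", "arch": "x86_64", "memory_mb": 2048},
     "requirements": lambda j: j["image_mb"] <= 2048},
    {"attrs": {"name": "c", "arch": "ppc", "memory_mb": 8192},
     "requirements": lambda j: True},
]

selected = matchmake(job, machines)
```

Here machine `a` satisfies the job but refuses the owner, machine `c` accepts any job but has the wrong architecture, and machine `b` is the only one for which both sets of requirements hold.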

MAD resource allocation problem

This section formalizes the hypotheses and the objective of grid resource allocation. The formulation is built by analogy with the MAD model for fault tolerance in distributed systems. We propose reasonable hypotheses on the behavior of autonomous participants in resource allocation. The objective is to independently optimize the value as perceived by every participant.

Table of contents:

Part 1. Software architecture
1 Task Mapping Survey
1.1 Introduction
1.1.1 Scope
1.1.2 Outline
1.2 Federating resources
1.2.1 Infrastructures
1.2.2 Objectives
1.2.3 Applicable workloads
1.2.4 Job: the element of an application
1.2.5 On single administrative domains
1.2.6 Allocation concerns
1.3 Disruptions to resource allocation
1.3.1 Delegation
1.3.2 User and job disconnected
1.3.3 Out-of-date matching
1.3.4 Intersecting the capabilities of local systems
1.3.5 Partial conclusion
1.4 User-driven allocation
1.4.1 User concerns
1.4.2 Late binding
1.4.3 Allocation by applications
1.4.4 Allocation by collaborations
1.4.5 Sudden success of an old Condor mechanism
1.4.6 An evolution of VO strategies
1.4.7 Specific constraints
1.5 Conclusion
2 MAD Resource Allocation
2.1 Introduction
2.2 Previous work
2.2.1 Queuing models
2.2.2 Economic models
2.2.3 Containment in other models
2.2.4 Metascheduling Pattern
2.2.5 Pull Mechanism and Late Binding Patterns
2.2.6 Market Based Control and Peer-to-Peer Matching Patterns
2.3 Objectives
2.3.1 Minimum makespan
2.3.2 Minimum sum of weighted flows
2.3.3 Minimum energy consumption
2.3.4 Minimum obstruction
2.4 The Symmetric Mapping pattern
2.4.1 Overview
2.4.2 Definition
2.4.3 Practical perspective
2.4.4 Relevance of Symmetric Mapping
2.5 Accuracy and benefits
2.5.1 Resources and tasks
2.5.2 Algorithms
2.5.3 Results
2.6 The model
2.6.1 Allocations
2.6.2 Schedules
2.6.3 Specifications
2.6.4 Value
2.7 MAD resource allocation problem
2.7.1 Hypotheses
2.7.2 Objective
2.8 A solution to the MAD problem for grids
2.8.1 Containers
2.8.2 Protocol
2.8.3 Correctness
2.9 Conclusion
3 Deploying Virtual Machines
3.1 Introduction
3.2 Related systems
3.3 Choices
3.4 Managing virtual machines
3.5 Placement and configuration
3.6 Lifecycle management
3.7 Description lookup
3.8 Development and tests
3.9 Conclusion
4 Deploying Permanent Services
4.1 Introduction
4.2 Analogies
4.3 Components logic
4.4 Components placement
4.5 Conclusion
Part 2. Performance prediction
5 Fast and Light Prediction
5.1 Introduction
5.2 Related work
5.3 Scope of the contribution
5.4 A characterization
5.5 Evaluation
5.5.1 Stack distance distribution fit
5.5.2 Analysis speed and prediction accuracy
5.5.3 Prediction robustness
5.6 Conclusion
6 Cache Thrashing
6.1 Introduction
6.2 Position of this work
6.2.1 Stack distance
6.2.2 Previous work
6.2.3 Assumptions and contributions
6.3 Metrics
6.3.1 Age
6.3.2 Rank
6.3.3 Span
6.3.4 Propagation
6.4 Cache misses
6.4.1 A typology of cache misses
6.4.2 Blind accesses count
6.4.3 Blind hits count
6.5 Stochastic analysis
6.5.1 Active span
6.5.2 Time to fill
6.6 Algorithm and complexity
6.7 Experimental validation
6.8 Conclusion
Conclusion