On-chip Networks & Manycore architectures

Get Complete Project Material File(s) Now! »

Task Dependency

Many real-time systems can be modeled as a set of independent tasks when each of them generates an infinite number of job sequences. However, to meet functional requirements, the system must sometimes be modeled as a set of dependent tasks, which are therefore correlated and must follow an execution order with precedence constraints. Figure 1.3a shows the transaction model, where tasks are to be executed in a pipeline, with each task waiting for the completion of the previous one. Figure 1.3b represents the Directed-Acyclic-Graph (DAG) model, which allows more expressive dependencies.
A DAG is composed of a set of vertices and edges. The vertices represent comput-ing functions called sub-tasks which are typically chunks of the application’s code. The edges model communications and precedence constraints between sub-tasks. An edge e(vi , vj ) represents a communication between two sub-tasks vi and vj ; for every k ≥ 0, the k-th instance of sub-task vj cannot start execution before the k − th instance of sub-task vi has completed and the corresponding message from vi has been received. We call this a precedence constraint between vi and vj . Consequently, a vertex can be a source vertex if it has at least an outgoing edge; and it can be a destination vertex is it has one or more incoming edges. A root vertex has only outgoing edged and a sink vertex has only incoming edges, respectively. Finally, a DAG is acyclic: there is no closed cycle in the graph. A DAG is assigned a period Ti and it can be periodic or sporadic; it is also characterized by a relative deadline Di , that is the time by which all instances of the sub-tasks of the DAG must have completed from the corresponding DAG’s activation.

Real-time systems scheduling analysis

Task scheduling defines how the sequence of jobs has to be orchestrated in order to access a resource which could be a calculation unit or a storage medium. We can classify real-time scheduling algorithms according to the following char-acteristics: Preemptive vs. Non-Preemptive.
• Preemptive algorithms can suspend a running task on a processor, by saving its context, and replace it with another higher priority task.
• In non-preemptive algorithms, once a task has started its execution, it executes until completion with no suspensions, even when higher priority tasks arrive.
• In static schedulers, all decisions are based on fixed parameters and as-signed to tasks before their activation. They are also denoted by compile-time scheduling decisions.
• In dynamic schedulers, decisions are based on dynamic parameters that may change during the system execution. They are also denoted by run-time decisions.
• A scheduling algorithm is used off-line if it is executed on the task set before tasks activation. The schedule generated from this operation is stored and later executed by a dispatcher.
• A scheduling algorithm is used on-line if the scheduling decisions are taken at run-time and the schedule may change anytime a new task enters the system.

Partitioned Scheduling

The partitioned scheduling doesn’t allow task migration. At this fact, it has the advantages of:
• There is no migration cost to consider or any interference related to the migration process, which renders the schedulability analysis easy.
• If a task exceeds its worst-case execution time budget, it impacts only the tasks on the same processor.
• The partitioned approaches use a separate run-queue per processor, rather than a single global queue. Therefore, the overheads of queue manipulating are less important onto large systems.
Determining an optimal task partitioning is known to be an NP-Complete prob-lem in the strong sense, where it is usually described as a bin-packing problem. Since each partitioning heuristic has its own strategy, several of them consist of two phases according to S. Baruah and Fisher, 2005:
• The pre-processing phase where the tasks are pre-sorted according to a task parameter, e.g., according to the shortest deadline in DM or the task utilization.
• The assignment phase, which consists of placing tasks into processors.
There are usual assignment strategies commonly used for task partitioning:
• First-Fit (FF): the task is always allocated to the first processor that fits its demand, while processors are increasingly sorted based on their ID.
• Best-Fit (BF): the task is allocated to the first processor while the set of processors is sorted in decreasing order with respect to their utilization.
• Worst-Fit (WF): similar to BF, but the processors are sorted in increasing order with respect to their utilization.
• Arbitrary-Fit (AF): each task is assigned to a processor in a random order.

Global Scheduling

Unlike the partitioned scheduling, this class of algorithms allows task migra-tion from one processor to another. Global scheduling presents the following advantages compared to partitioned scheduling:
• It experiences only fewer context switches/preemptions when it is de-ployed, as the scheduler preempts tasks when there are no idle processors Andersson and Jonsson, 2000.
• The processor availability (space capacity) when a task executes less than its worst-case execution time, which can be used by all other tasks, not just those on the same processor.
• It is typically more appropriate to open systems, as there is no need to run load balancing or task re-scheduling when the set of tasks changes.
Global-EDF and Global-DM are examples of the adaptation of uniprocessor schedulers to global scheduling. The priority assignment logic doesn’t change but it allows the m highest tasks/jobs to run in parallel and at the same time. Usually, when a task arrives and there is no idle processor, it preempts a lower priority task and executes on the freed processor.

Semi-partitioned Scheduling

This class aims at addressing the fragmentation of spare capacity in partitioned systems is to split a small number of tasks between processors Davis and Alan Burns, 2011. Andersson and Tovar, 2006 have proposed an approach to scheduling periodic task sets with implicit deadlines, based on partitioned scheduling, but splitting some tasks into two components that execute at different times on different processors. Besides that, Andersson, Bletsas, and Sanjoy Baruah, 2008 developed the idea of job splitting to cater for sporadic task sets with implicit deadlines. In this case, each processor p executes at most two split tasks, one executed by processor p − 1 and one executed by processor p + 1.

Table of contents :

Abstract
Acknowledgment
Introduction
I. Background, Context and Related work
1. Introduction to Real-Time Systems
1.1. Introduction
1.2. Task Model
1.2.1. Task Dependency
1.3. Real-time systems scheduling analysis
1.3.1. Scheduling algorithms classification
1.3.2. Scheduling characteristics
1.3.3. Scheduling analysis
1.4. Uniprocessor Scheduling
1.4.1. Rate Monotonic Scheduling
1.4.2. Deadline Monotonic Scheduling
1.4.3. Earliest Deadline First
1.5. Multiprocessor scheduling
1.5.1. Partitioned Scheduling
1.5.2. Global Scheduling
1.5.3. Semi-partitioned Scheduling
1.6. Introduction to parallel programming
1.6.1. Programming real-time systems
1.6.2. POSIX Thread
1.6.3. Fork-join model
1.6.4. Message-passing interface
1.7. Conclusion
2. On-chip Networks & Manycore architectures
2.1. Introduction
2.2. Network-on-Chip classes
2.2.1. Preliminaries
2.2.2. Circuit-Switched NoCs
2.2.3. Packet-Switching NoCs
2.3. Network-on-Chip elements
2.3.1. Topology
2.3.2. Routing Algorithms
2.3.3. Flow Control techniques
2.4. Network-on-Chip routers
2.4.1. Virtual Channels
2.4.2. Allocators and arbiters
2.5. Industrial Network-on-Chip
2.6. Simulation tools
2.7. Conclusion
3. Task Mapping: Related work
3.1. Introduction
3.2. Problem definition
3.3. NoC Resource allocation for GP-tasks
3.3.1. Dynamic Mapping
3.3.2. Static Mapping
3.4. Safety-critical real-time tasks allocation
3.5. Conclusion
II. Contributions
4. Comparative study: Priority Preemptive and Round-Robin Arbitration
4.1. Introduction
4.2. NoC switching and routing mechanisms
4.3. System model
4.3.1. Architecture model
4.3.2. Communication model
4.4. Real-time Communication simulator
4.4.1. Packages
4.4.2. NoC & Simulation Engines
4.5. Analysis
4.5.1. Fixed priority
4.5.2. Time division multiple access
4.6. Experiments
4.6.1. Conflicting communications generation
4.6.2. Simulation
4.7. Conclusion
5. DAG tasks allocation on NoC
5.1. Introduction
5.2. Related Work
5.3. System Model
5.3.1. Architecture Model
5.3.2. The DAG task model
5.4. Real-time allocation and schedulability
5.4.1. Task allocation
5.4.2. Communication latency
5.4.3. Deadlines and offsets assignment
5.4.4. Single core schedulability analysis
5.5. Results and discussions
5.6. Conclusion
6. Processor-Memory co-scheduling on NoC
6.1. Introduction
6.2. Related work
6.2.1. Real-time tasks allocation
6.2.2. Off-chip memory sharing
6.3. System Model
6.3.1. Network-on-Chip design
6.3.2. DRAM organization
6.3.3. AER DAG task model
6.4. MO-SA DAG Task Allocation
6.4.1. Exploring the neighbor solutions
6.5. AER tasks scheduling analysis and memory latency
6.5.1. Virtual sub-tasks and memory latency cost
6.5.2. On-chip communication latency analysis
6.5.3. Preemption-cost and AER tasks response time
6.5.4. Offsets and jitters assignment
6.6. Experimental Results
6.6.1. Task set generation
6.6.2. Bin-packing heuristics
6.6.3. Platform specifications & experiments protocol
6.6.4. Simulation results and discussions
6.7. Conclusion
Conclusion & Future Works
Personal publications
Bibliography