Scheduling mixed-criticality data-dependent tasks on multi-core architectures


The increasing complexity of safety-critical systems

Nowadays, safety-critical systems are composed of many software and hardware components necessary for their correct execution within their physical environments. If we consider an airborne system for example, its services can be decomposed into the following categories: cockpit, flight control, cabin, fuel and propellers. Each of these categories is in turn decomposed into various software functionalities: camera systems, audio control, displays, monitoring and data recording are some examples of functionalities included in the cockpit. The design of safety-critical systems is therefore very complex. Safety standards have defined criticality or assurance levels for the software and hardware components deployed in safety-critical systems. The consequences of missing a deadline vary based on the criticality level of the software component. Standards such as DO-254, DO-178B and DO-178C (used for airborne systems) define five assurance levels, each characterized by a maximum failure rate (level A: 10⁻⁹ errors/h, level B: 10⁻⁷, and so on). A deadline miss on a level A task might lead to catastrophic consequences, while a deadline miss on a level E task has no impact on the system’s safety.
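As an illustration, this mapping from assurance levels to failure-rate budgets can be sketched in a few lines of Python. The rates for levels C and D are assumed here to continue the decreasing pattern stated above, and the helper `meets_assurance_level` is a hypothetical convenience, not part of any standard:

```python
# Illustrative mapping of assurance levels to maximum tolerated
# failure rates (errors per hour). Levels A and B follow the text;
# C and D are assumed to continue the pattern; level E carries no
# quantitative requirement since it has no safety impact.
FAILURE_RATE_BUDGET = {
    "A": 1e-9,   # catastrophic consequences on failure
    "B": 1e-7,   # hazardous
    "C": 1e-5,   # major (assumed)
    "D": 1e-3,   # minor (assumed)
    "E": None,   # no safety impact
}

def meets_assurance_level(observed_rate, level):
    """Check whether a component's observed failure rate (errors/h)
    stays within the budget of the given assurance level."""
    budget = FAILURE_RATE_BUDGET[level]
    return budget is None or observed_rate <= budget
```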
Traditionally, services of different criticalities have been separated at the hardware level: in federated avionics for example, Line Replaceable Units (LRUs) are used within the airborne system to deliver one specific functionality each. Such spatial isolation avoids failure propagation. Modularity is also a key property achieved thanks to LRUs: a failing LRU can be replaced when needed. Nevertheless, over the past few years the safety-critical industry has been putting effort into reducing size, weight and power consumption. This trend essentially advocates for the integration of multiple functionalities on a common computing platform. Some examples of this trend are the Integrated Modular Avionics (IMA) [9; 10] and the AUTomotive Open System ARchitecture (AUTOSAR) [11] initiatives, which aim at the integration of functionalities onto common hardware while maintaining compliance with safety standards. The main focus of these initiatives is to retain the benefits of the isolation offered by separate processing units, but at the software level. In other words, a software component like a hypervisor [12] or a real-time operating system [13] is responsible for executing and isolating the different functionalities required by the safety-critical system. This integration has gained additional interest due to the constant innovation in integrated chips as well, in particular thanks to multi-core processors.

Adoption of multi-core architectures

Multi-core architectures were proposed to solve problems related to power consumption, heat dissipation, design costs and hardware bugs that mono-core processors were facing due to the constant miniaturization of transistors. Computational power is improved by exploiting parallelism instead of increasing the clock frequency of processors. This prevents power consumption from growing and limits heat dissipation. Architectural design errors are also limited since, in most multi-core architectures, all cores are identical. Since manufacturers have adopted multi-core architectures as the dominant platform for embedded devices (for cost reasons, mainly due to scale factors in the mobile phone industry), the adoption of multi-core architectures is a necessity for the safety-critical domain. Nevertheless, while multi-core architectures improve average performance, shared hardware resources like caches and memory buses make these architectures hard to predict. To ensure temporal correctness, this limitation has been addressed by overestimating the worst-case behavior of the applications deployed in the multi-core system. Real-time systems allocate processing resources to guarantee the execution of tasks even in their worst case.
In conclusion, to cope with current industrial needs, safety-critical systems need to: (i) execute efficiently on multi-core architectures, where efficiency relates to the number of cores necessary to schedule a system, since most embedded architectures limit the number of processors that can be integrated; and (ii) make good use of processing resources by delivering as many functionalities as possible, even if these functionalities have different criticalities. These two requirements must be met while respecting the temporal and logical correctness necessary in safety-critical systems. To do so, real-time scheduling and the models of computation that assist system designers need to be adapted to these trends.

Temporal correctness for safety-critical systems

As we mentioned in Chapter 1, the correctness of a safety-critical system depends not only on the logical correctness of its outputs but also on their timeliness. To satisfy deadlines imposed by their environments, safety-critical systems use models and theoretical results from real-time scheduling analyses and techniques. Nonetheless, the resource allocation used by real-time schedulers is based on worst-case analysis, which is very difficult to estimate, even more so when multi-core architectures are considered.


Real-time scheduling on multi-core architectures

To support multi-core architectures, existing real-time scheduling algorithms developed for mono-core processors were adapted. This adaptation followed two main principles: the partitioned approach and the global approach. A survey presenting different methods to schedule real-time tasks on multi-core processors is given in [7]. The partitioned approach consists in dividing the task set of the system into several partitions and scheduling each partition on a single core by applying a mono-core scheduling algorithm within that partition. Therefore, existing scheduling algorithms for mono-core processors can be used as-is inside each partition. How partitions are formed and distributed among cores is the main problem this approach needs to solve. It is widely known that such partitioning is equivalent to the bin-packing problem, and is therefore highly intractable: NP-hard in the strong sense [20]. Computing an optimal partitioning is thus impractical, and approximation algorithms are used instead. For example, Partitioned-EDF (P-EDF) using the First-Fit Decreasing, Best-Fit Decreasing and Worst-Fit Decreasing heuristics has been shown to have a speedup factor no larger than 4/3 [20].
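To make the partitioning step concrete, the following is a minimal sketch of First-Fit Decreasing applied to implicit-deadline tasks under P-EDF, assuming each task is summarized by its utilization (WCET divided by period) and that a core is schedulable under EDF as long as its total utilization does not exceed 1.0. The function name and representation are illustrative, not taken from the thesis:

```python
def ffd_partition(utilizations, num_cores):
    """Partition tasks onto cores with First-Fit Decreasing.

    Each task is represented by its utilization (WCET / period).
    A task fits on a core if the core's total utilization stays
    at or below 1.0, the EDF bound on a single core. Returns a
    list of per-core utilization lists, or None on failure.
    """
    cores = [[] for _ in range(num_cores)]
    loads = [0.0] * num_cores
    # Sort by decreasing utilization: the "Decreasing" in FFD.
    for u in sorted(utilizations, reverse=True):
        for i in range(num_cores):
            if loads[i] + u <= 1.0:  # first core where the task fits
                cores[i].append(u)
                loads[i] += u
                break
        else:
            return None  # no core can accommodate this task
    return cores
```

For instance, `ffd_partition([0.6, 0.5, 0.4, 0.3, 0.2], 2)` packs the five tasks onto two fully loaded cores, while a set such as `[0.9, 0.9, 0.9]` on two cores fails, illustrating why such heuristics can reject task sets an optimal partitioning would accept.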

Table of contents :

1 Introduction 
1.1 General context and motivation overview
1.2 Contributions
1.3 Thesis outline
2 Industrial needs and related works 
2.1 Industrial context and motivation
2.1.1 The increasing complexity of safety-critical systems
2.1.2 Adoption of multi-core architectures
2.2 Temporal correctness for safety-critical systems
2.2.1 Real-time scheduling
2.2.2 Off-line vs. on-line scheduling for real-time systems
2.2.3 Real-time scheduling on multi-core architectures
2.2.4 The Worst-Case Execution Time estimation
2.3 The Mixed-Criticality scheduling model
2.3.1 Mixed-Criticality mono-core scheduling
2.3.2 Mixed-Criticality multi-core scheduling
2.3.3 Degradation model
2.4 Logical correctness for safety-critical systems
2.4.1 The data-flow model of computation
2.4.2 Data-flow and real-time scheduling
2.5 Safety-critical systems’ dependability
2.5.1 Threats to dependability
2.5.2 Means to improve dependability
2.6 Conclusion
3 Problem statement 
3.1 Scheduling mixed-criticality data-dependent tasks on multi-core architectures
3.1.1 Data-dependent scheduling on multi-core architectures
3.1.2 Adoption of mixed-criticality aspects: modes of execution and different timing budgets
3.2 Availability computation for safety-critical systems
3.3 Availability enhancements – Delivering an acceptable Quality of Service
3.4 Hypotheses regarding the execution model
3.5 Conclusion
4 Contribution overview 
4.1 Consolidation of the research context: Mixed-Criticality Directed Acyclic Graph (MC-DAG)
4.2 Scheduling approaches for MC-DAGs
4.3 Availability analysis and improvements for Mixed-Criticality systems
4.4 Implementation of our contributions and evaluation suite: the MC-DAG Framework
4.5 Conclusion
5 Scheduling MC-DAGs on Multi-core Architectures 
5.1 Meta-heuristic to schedule MC-DAGs
5.1.1 Mixed-Criticality correctness for MC-DAGs
5.1.2 MH-MCDAG, a meta-heuristic to schedule MC-DAGs
5.2 Scheduling HI-criticality tasks
5.2.1 Limits of existing approaches: as soon as possible scheduling for HI-criticality tasks
5.2.2 Relaxing HI tasks execution: As Late As Possible scheduling in HI-criticality mode
5.2.3 Considering low-to-high criticality communications
5.3 Global implementations to schedule multiple MC-DAGs
5.3.1 Global as late as possible Least-Laxity First – G-ALAP-LLF
5.3.2 Global as late as possible Earliest Deadline First – G-ALAP-EDF
5.4 Generalized N-level scheduling
5.4.1 Generalized MC-correctness
5.4.2 Generalized meta-heuristic: N-MH-MCDAG
5.4.3 Generalized implementations of N-MH-MCDAG
5.5 Conclusion
6 Availability on data-dependent Mixed-Criticality systems 
6.1 Problem overview
6.2 Availability analysis for data-dependent MC systems
6.2.1 Probabilistic fault model
6.2.2 Mode recovery mechanism
6.2.3 Evaluation of the availability of LO-criticality outputs
6.3 Enhancements in availability and simulation analysis
6.3.1 Limits of the discard MC approach
6.3.2 Fault propagation model
6.3.3 Fault tolerance in embedded systems
6.3.4 Translation rules to PRISM automaton
6.4 Conclusion
7 Evaluation suite: the MC-DAG framework 
7.1 Motivation and features overview
7.2 Unbiased MC-DAG generation
7.3 Benchmark suite
7.4 Conclusion
8 Experimental validation 
8.1 Experimental setup
8.2 Acceptance rates on dual-criticality systems
8.2.1 Single MC-DAG scheduling
8.2.2 Multiple MC-DAG scheduling
8.3 A study on generalized MC-DAG systems
8.3.1 Generalized single MC-DAG scheduling
8.3.2 The parameter saturation problem
8.4 Conclusion
9 Conclusion and Research Perspectives 
9.1 Conclusions
9.2 Open problems and research perspectives
9.2.1 Generalization of the data-driven MC execution model
9.2.2 Availability vs. schedulability: a dimensioning problem
9.2.3 MC-DAG scheduling notions in other domains
List of Publications

