Table of Contents
List of Figures
Nomenclature
1 Introduction
1.1 Hybrid Dataflow for Latency Tolerance
1.1.1 Convergence of dataflow and von Neumann
1.1.2 Latency Tolerance
1.1.3 TSTAR Multithreaded Dataflow Architecture
1.2 Task Granularity
1.3 Motivation
1.4 Dissertation Outline
2 Problem Statement
2.1 Explicit token matching shifts the challenges in hardware design to compilation
2.2 Complex data structures should be handled in an efficient way
2.3 Related Work
2.3.1 Compiling imperative programs to data-flow threads
2.3.2 SSA as an intermediate representation for data-flow compilation
2.3.3 Decoupled software pipelining
2.3.4 EARTH thread partitioning
2.3.5 Formalization of the thread partitioning cost model
3 Thread Partitioning I: Advances in PS-DSWP
3.1 Introduction
3.1.1 Decoupled software pipelining
3.1.2 Loop distribution
3.2 Observations
3.2.1 Replacing loops and barriers with a task pipeline
3.2.2 Extending loop distribution to PS-DSWP
3.2.3 Motivating example
3.3 Partitioning Algorithm
3.3.1 Definitions
3.3.2 The algorithm
3.4 Code Generation
3.4.1 Decoupling dependences across tasks belonging to different treegions
3.4.2 SSA representation
3.5 Summary
4 TSTAR Dataflow Architecture
4.1 Dataflow Execution Model
4.1.1 Introduction
4.1.2 Past Data-Flow Architectures
4.2 TSTAR Dataflow Execution Model
4.2.1 TSTAR Multithreading Model
4.2.2 TSTAR Memory Model
4.2.3 TSTAR Synchronization
4.2.4 TSTAR Dataflow Instruction Set
4.3 TSTAR Architecture
4.3.1 Thread Scheduling Unit
5 Thread Partitioning II: Transforming Imperative C Programs to Dataflow Programs
5.1 Revisiting the TSTAR Dataflow Execution Model
5.2 Partitioning Algorithms
5.2.1 Loop Unswitching
5.2.2 Building the Program Dependence Graph under SSA
5.2.3 Merging Strongly Connected Components
5.2.4 Typed Fusion
5.2.5 Data Flow Program Dependence Graph
5.3 Modular Code Generation
5.4 Implementation
5.5 Experimental Validation
5.6 Summary
6 Handling Complex Data Structures
6.1 Streaming Conversion of Memory Dependences (SCMD)
6.1.1 Motivating Example
6.1.2 Single Producer Single Consumer
6.1.3 Single Producer Multiple Consumers
6.1.4 Multiple Producers Single Consumer
6.1.5 Generated Code for Motivating Example
6.1.6 Discussion
6.2 Owner Writable Memory
6.2.1 OWM Protocol
6.2.2 OWM Extension to TSTAR
6.2.3 Expressiveness
6.2.4 Case Study: Matrix Multiplication
6.2.5 Conclusion and perspective about OWM
6.3 Summary
7 Simulation on Many Nodes
7.1 Introduction
7.2 Multiple Nodes Dataflow Simulation
7.3 Resource Usage Optimization
7.3.1 Memory Usage Optimization
7.3.2 Throttling
7.4 Experimental Validation
7.4.1 Experimental Settings
7.4.2 Experimental Results
7.4.2.1 Gauss Seidel
7.4.2.2 Viola Jones
7.4.2.3 Sparse LU
7.5 Summary
8 Conclusions and Future Work
8.1 Contributions
8.2 Future Work
Personal Publications
References