Execution model for MATLAB

Hardware performance counters

Other approaches to performance analysis rely purely on measurements, for example those obtained from hardware performance counters. These counters are special-purpose processor registers that collect information about program execution in the form of performance events, e.g. the number of cache misses or processor stalls [91]. Although the set of performance counters was at first unstable and ever-changing, the available performance events have recently stabilised with the introduction of architectural events, which are available across a whole line of microarchitectures. According to Stéphane Eranian [92], Intel's attitude towards performance counters changed with the Intel Itanium processors. A second change came with the introduction of Top-Down Microarchitecture Analysis (TMA) by Yasin [33]. TMA makes it possible to find bottlenecks during program execution and to attribute them to a specific part of the processor pipeline, such as the front-end, the back-end, or the execution units. Nowadays, performance counters are used in performance and power analyses, adaptive optimisations, and many other applications.
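To make the top-down idea concrete, the following minimal sketch computes the four top-level TMA categories from raw counter values. It assumes a 4-wide pipeline and the event names commonly quoted for TMA on Intel cores (shown in the comments); the exact events differ between microarchitectures, and the function name and the sample counts are purely illustrative, not measurements from this work.

```python
def tma_level1(cycles, uops_not_delivered, uops_issued,
               uops_retired_slots, recovery_cycles, width=4):
    """Split pipeline slots into the four top-level TMA categories.

    Assumed event mapping (varies by microarchitecture):
      cycles              ~ CPU_CLK_UNHALTED.THREAD
      uops_not_delivered  ~ IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued         ~ UOPS_ISSUED.ANY
      uops_retired_slots  ~ UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     ~ INT_MISC.RECOVERY_CYCLES
    """
    slots = width * cycles  # issue slots available during the measurement
    frontend = uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is left is attributed to the back-end.
    backend = 1.0 - (frontend + bad_speculation + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Made-up counter values, for illustration only.
breakdown = tma_level1(cycles=1000, uops_not_delivered=800, uops_issued=2200,
                       uops_retired_slots=2000, recovery_cycles=50)
for category, share in breakdown.items():
    print(f"{category}: {share:.2f}")
```

The category with the largest share indicates which part of the pipeline to investigate first; deeper TMA levels refine that category further.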

Performance of execution environments

Compilers and compiled programs are not the only targets of performance analysis. For many years, researchers have been working on the performance analysis of interpreters, virtual machines, and Just-In-Time compilers (we have already mentioned some of these examples in previous sections).
Branches and jumps. Since their creation, interpreters have been considered slow because they add a layer of abstraction between the program and the hardware. Moreover, by definition, they interpret instructions one by one, which rules out optimisations that work on two or more instructions. Slow branches and jumps were often considered the root cause of these performance problems [6, 113–115]. The notion of branch misprediction as the main problem remained widespread for many years, until the study by Rohou et al. [116], in which the researchers re-examined this concept and compared current and new branch prediction techniques. The results showed that recent microarchitectures have improved to the point that branch misprediction is no longer a problem.
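The one-instruction-at-a-time behaviour described above can be illustrated with a toy stack-machine interpreter. This is a hypothetical sketch (the opcodes and the `run` function are invented for the example); it only shows that every interpreted instruction passes through a dispatch decision, which in an interpreter written in C typically becomes a hard-to-predict indirect branch.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    dispatches = 0
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        dispatches += 1  # one dispatch decision per interpreted instruction
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return stack[-1], dispatches


# (2 + 3) * 4, expressed as five separate instructions.
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
result, dispatches = run(program)
print(result, dispatches)
```

Because each instruction is handled in isolation, the interpreter never sees the `add` and `mul` together and cannot fuse or reorder them, which is the optimisation limit the paragraph refers to.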

Case study: cost of array slicing

In this section, we present one application of performance event profiles (PEP): the cost analysis of the data copy performed during array slicing. In MATLAB, each array slice requires a data copy, as depicted in Figure 3.2 with the vectorised version of the code (vec). However, as with any costly operation, several questions arise: (1) exactly how many cycles does a data copy take? (2) does the cycle cost change with the volume of copied data? An answer to question (1) helps to decide whether or not to vectorise a loop. If the vectorised loop requires a lot of explicit array slicing, then the benefit of using vector operations might be overshadowed by the cost of making data copies. Question (2) asks whether there is a fundamental difference in how MATLAB performs the data copy as the size of the data increases. Differences in cost according to the size of the data could indicate that MATLAB uses various copying mechanisms (e.g. software prefetching or packed vector instructions) or that the machine performs the copy differently (e.g. by using hardware prefetching).
We start by selecting the benchmark codes for the analysis and the performance events used to build the performance event profiles. Next, we repeatedly execute each benchmark with a varying data size and build a performance event profile for each execution. From the profiles, we measure the length of the execution regions that perform the data copy. Finally, we collect the durations of the data copies and compute the per-element cost of a copy.
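The last step can be sketched as follows. The sketch assumes that each run yields the number of copied elements and the length, in cycles, of the copy region. The per-element cost is then the ratio of the two, and a least-squares line (an added assumption, not necessarily the method used in this work) separates a fixed overhead from the marginal cost per element. The function names are hypothetical and the data are synthetic, generated as 500 + 2·n cycles.

```python
def per_element_cost(sizes, cycles):
    """Cycles spent per copied element for each (size, cycles) measurement."""
    return [c / n for n, c in zip(sizes, cycles)]


def fit_linear(sizes, cycles):
    """Least-squares fit of cycles = intercept + slope * size."""
    count = len(sizes)
    mean_x = sum(sizes) / count
    mean_y = sum(cycles) / count
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, cycles))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data, NOT measurements: cycles = 500 + 2 * n.
sizes = [1000, 2000, 4000, 8000]
cycles = [2500, 4500, 8500, 16500]

costs = per_element_cost(sizes, cycles)
intercept, slope = fit_linear(sizes, cycles)
print("per-element cost:", costs)
print("fixed overhead (cycles):", intercept)
print("marginal cost (cycles/element):", slope)
```

A per-element cost that settles towards the fitted slope as the size grows suggests a single copying mechanism with a fixed start-up overhead, whereas a change of slope between size ranges would hint at the different mechanisms raised in question (2).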

Table of contents:

1 Introduction
1.1 Motivation
1.2 Research challenges
1.3 Thesis contributions
1.4 Thesis structure
2 Related work
2.1 Acceleration of MATLAB programs
2.1.1 Compilation of MATLAB programs
2.1.2 Transformation of MATLAB code
2.1.3 Alternative execution environments
2.1.4 Analysis of MATLAB programs
2.2 Performance analysis
2.2.1 Metrics and models
2.2.2 Hardware performance counters
2.2.3 Profiling
2.2.4 Performance of execution environments
2.3 Conclusion
3 Performance event profiles
3.1 Overview
3.2 Motivation
3.3 Building performance profiles
3.3.1 Selecting the sampling event
3.3.2 Selecting the sampling threshold
3.3.3 Performance profiles with mPAPI
3.4 Finding execution regions
3.5 Case study: cost of array slicing
3.6 Conclusion
4 Execution model for MATLAB
4.1 Scope of the execution model
4.2 Instruction blocks in JIT compilation
4.3 Detecting instruction blocks
4.4 JIT compilation of functions
4.4.1 Built-in functions
4.4.2 User-defined function
4.5 Instruction tree
4.5.1 Building minimal instruction tree
4.5.2 Predicting execution from minimal instruction tree
4.6 Conclusion
5 Code transformations for array operations
5.1 Redesigning array slicing
5.1.1 Dynamic array slicing
5.1.2 Eliminating redundant 0-initialisation
5.2 Repacking of array slices
5.3 Range simplification
5.4 Profile-guided loop vectorisation
5.5 Conclusion
6 HU!M compiler
6.1 Overview
6.2 Influences
6.3 Code analysis
6.4 Code transformation
6.4.1 Loop vectorisation
6.4.2 Fast array slicing substitution
6.4.3 Repacking of array slices
6.5 Conclusion
7 Evaluation of the execution model and code transformation
7.1 Evaluation of the execution model
7.1.1 Model precision
7.1.2 Splitting expressions
7.1.3 Reordering operations
7.1.4 Information limit of performance profiles
7.2 Evaluation of the range simplification
7.3 Evaluation of the repacking of arrays
7.4 Conclusion
8 Conclusion
8.1 Summary
8.2 Future work
A Experiment methodology
A.1 Preparation of the environment
A.2 Collecting measurements
A.3 Machine specification
B mPAPI
B.1 mPAPI interface
B.1.1 Enumerating available performance events
B.1.2 Measuring performance events in counting mode
B.1.3 Measuring performance events in sampling mode
C Menchi
C.1 Benchmark preparation
C.2 Experiment specification
C.3 Experiment modes
D Accompanying materials