Optimising Just-in-time compiler architectures


Function-based architecture

The first optimising JIT architecture invented [Hölzle 1994a] was designed to generate optimised n-functions. From a given v-function, the optimising JIT performs a set of optimisations, which includes inlining of other v-functions, and generates an optimised n-function. This section first gives an overview of the architecture and then discusses concrete implementations with references in Section

Architecture overview

In many VMs following this architecture, three tiers are present. The following three paragraphs detail each tier, including how virtual calls are executed in each case.
Tier 1: V-function interpreter. The first tier is a virtual function interpreter. In most VMs, no compilation time is required at all to interpret a v-function, but the execution of the v-function by the interpreter is not very fast. Virtual calls are usually implemented with some sort of look-up cache to avoid computing the function to activate at each call. The interpreter tier does not necessarily collect runtime information.
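The look-up cache used by the interpreter tier can be illustrated with a minimal sketch. It assumes a global cache keyed by receiver class and selector; the names VClass, full_lookup, LookupCache and send are hypothetical and do not come from any of the VMs discussed here.

```python
class VClass:
    """Hypothetical class object: a method dictionary plus an optional superclass."""

    def __init__(self, name, methods, superclass=None):
        self.name = name
        self.methods = methods
        self.superclass = superclass


def full_lookup(cls, selector):
    """Slow path: walk the superclass chain until the selector is found."""
    while cls is not None:
        if selector in cls.methods:
            return cls.methods[selector]
        cls = cls.superclass
    raise AttributeError(selector)


class LookupCache:
    """Assumed global cache keyed by (receiver class, selector)."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, cls, selector):
        key = (cls, selector)
        function = self.entries.get(key)
        if function is None:
            # Cache miss: perform the full look-up once and remember the result.
            self.misses += 1
            function = full_lookup(cls, selector)
            self.entries[key] = function
        else:
            self.hits += 1
        return function


def send(cache, receiver_class, selector, *args):
    """Interpret one virtual call through the cache."""
    return cache.lookup(receiver_class, selector)(*args)


animal = VClass("Animal", {"legs": lambda: 4})
dog = VClass("Dog", {"sound": lambda: "woof"}, superclass=animal)
cache = LookupCache()
results = [send(cache, dog, "legs") for _ in range(3)]
```

In this sketch only the first of the three calls walks the superclass chain; the two following calls are served from the cache.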
Tier 2: Baseline JIT. The second tier is the baseline JIT, which generates an n-function from a single v-function with a very limited number of optimisations. Once compiled, the n-function is used to execute the function instead of interpreting the v-function. A small amount of time is spent generating the n-function, but the execution of the n-function is faster than the interpretation of the v-function. The n-function generated by the baseline JIT is introspected to collect runtime information if the function is executed enough times to be optimised by the next tier. The goal of the baseline JIT is therefore to generate n-functions that provide reliable runtime information with limited performance overhead, not to generate the most efficient n-functions. Virtual calls are usually generated in machine code using inline caches [Deutsch 1984, Hölzle 1991]: each virtual call has a local cache with the functions it has activated, both speeding up the execution and collecting runtime information for the next tier.
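The idea of a per-call-site cache can be sketched as follows. This is only an illustration under stated assumptions: a real baseline JIT patches machine code, whereas here the call site is a plain object, and the names CallSite, python_lookup and observed_types are hypothetical.

```python
class CallSite:
    """One virtual call of a baseline n-function, with its own local cache."""

    def __init__(self, selector, lookup):
        self.selector = selector
        self.lookup = lookup   # slow-path look-up: (type, selector) -> function
        self.cache = {}        # receiver type -> function activated at this site
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        function = self.cache.get(receiver_type)
        if function is None:
            # First time this receiver type reaches the site: look up and record.
            self.misses += 1
            function = self.lookup(receiver_type, self.selector)
            self.cache[receiver_type] = function
        return function(receiver, *args)

    def observed_types(self):
        """Runtime information that the next tier could read from this site."""
        return list(self.cache)


def python_lookup(receiver_type, selector):
    # Stand-in for the VM look-up routine (assumption of this sketch).
    return getattr(receiver_type, selector)


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3 * self.radius * self.radius  # crude approximation, kept integral


site = CallSite("area", python_lookup)
areas = [site.call(shape) for shape in (Square(2), Square(3), Circle(1))]
```

The same dictionary serves both purposes named above: it avoids repeating the look-up, and its keys record which receiver types were met at this particular call.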
Tier 3: Optimising JIT. The last tier is the optimising JIT, which generates an optimised n-function. The optimising JIT uses runtime information such as the inline cache data to speculate on which function is called at each virtual call, which allows it to perform inlining and to generate the optimised n-function from multiple v-functions. Such optimisations greatly speed up the execution but are invalid if one of the compile-time speculations does not hold at runtime. In this case, the VM deoptimises the code and re-optimises it differently [Hölzle 1994b, Hölzle 1992]. The optimising JIT requires more time than the baseline JIT to generate n-functions, but the generated code is much faster. The execution of virtual calls is not really relevant in this tier, as most virtual calls are removed through inlining and most of the remaining ones are transformed into direct calls.
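Speculative inlining and deoptimisation can be sketched in a few lines. The sketch makes strong simplifying assumptions: the optimised code is written by hand rather than generated, and deoptimisation simply restarts the side-effect-free function in its unoptimised form, whereas a real VM resumes execution in the middle of the function. All names (Deoptimise, Function, total_area_optimised, and so on) are hypothetical.

```python
class Deoptimise(Exception):
    """Raised when a compile-time speculation does not hold at runtime."""


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3 * self.radius * self.radius


def total_area_unoptimised(shapes):
    """Stand-in for the lower tiers: every call to area is a virtual call."""
    return sum(shape.area() for shape in shapes)


def total_area_optimised(shapes):
    """Stand-in for an optimised n-function: Square.area is inlined behind a guard."""
    total = 0
    for shape in shapes:
        if type(shape) is not Square:       # guard on the speculated receiver type
            raise Deoptimise(type(shape).__name__)
        total += shape.side * shape.side    # inlined body of Square.area
    return total


class Function:
    """Dispatches to the optimised code while its speculation holds."""

    def __init__(self):
        self.optimised = total_area_optimised
        self.deopt_count = 0

    def run(self, shapes):
        if self.optimised is not None:
            try:
                return self.optimised(shapes)
            except Deoptimise:
                # Speculation failed: discard the optimised code and fall back.
                self.deopt_count += 1
                self.optimised = None
        return total_area_unoptimised(shapes)


function = Function()
first = function.run([Square(2), Square(3)])
second = function.run([Square(2), Circle(1)])
```

The first run stays entirely in the optimised code. The second run meets a Circle, fails the guard and falls back to the unoptimised version; a real VM would later re-optimise the function with the new runtime information.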

Existing virtual machines

The first VM featuring this function-based architecture was the Self VM [Hölzle 1994a]. The Self VM had only two tiers, the baseline JIT and the optimising JIT.
The second VM built with this design was the Animorphic VM for the Strongtalk programming language [Sun Microsystems 2006], a Smalltalk dialect. This VM is the first to feature three tiers. The first tier is a threaded code interpreter, hence interpretation requires a small amount of compilation time to generate threaded code from the v-function. The two other tiers are the same as in the Self VM. The Animorphic VM has never reached production.
The Hotspot VM [Paleczny 2001] was implemented from the Self and Animorphic VM code base and has been the default Java VM provided by Sun, then Oracle, for more than a decade. In the first versions of the Hotspot VM, two executables were distributed. One was called the client VM, which included only the baseline JIT and was distributed for applications where start-up performance matters. The other one was called the server VM, which included both JIT tiers and was distributed for applications where peak performance matters. Later, the optimising JIT was introduced in the client VM, with different optimisation policies than in the server version, to improve the client VM performance without degrading start-up performance too much. In Java 6 and onwards, the server VM became the default VM as new strategies allowed the optimising JIT to improve performance with little impact on start-up performance. Lastly, a single binary is now distributed for the 64-bit release, including only the server VM.
More recently, multiple JavaScript VMs were built with a similar design. A good example is the V8 JavaScript engine [Google 2008], used to execute JavaScript in Google Chrome and Node.js. Other VMs, less popular than the Java and JavaScript VMs, also use similar architectures, such as the Dart VM.
One research project, the Graal compiler [Oracle 2013, Duboscq 2013], is a function-based optimising JIT for Java that can be used, among multiple use-cases, as an alternative optimising JIT in the Hotspot VM.


Just-in-time compiler tiers

Many VMs featuring a function-based architecture in production nowadays have three tiers. The number of tiers may however vary from two to as many as the development team chooses to implement. The following paragraphs discuss the reasons why VM implementors may choose to implement a VM with two tiers, three tiers or more.
Engineering cost. Each new tier needs to be maintained and evolved in accordance with the other tiers. Hence, a VM having more tiers requires more engineering time for maintenance and evolutions. Any bug can come from any tier, and bugs coming from only a single tier can be difficult to track down. Evolutions need to be implemented on each tier. To lower the VM maintenance and evolution cost, a VM needs to have the fewest tiers possible.
Minimum number of tiers. By design, the optimising JIT is the key component for high performance, and it needs runtime information from previous runs to generate optimised code. Hence, a VM with a function-based architecture requires at least two tiers. One tier, the non-optimising tier, is used for the first runs to collect statistical information and is typically implemented as an interpreter tier or a baseline JIT tier. The second tier, the optimising tier, generates optimised n-functions and is implemented as an optimising JIT. To perform well, the optimising tier has to kick in only if the function is used frequently (otherwise the compilation time would not be worth the execution time saved), and the previous tier(s) must have executed the v-function enough times to have collected reliable runtime information. For this reason, the optimising tier usually kicks in after several thousand executions of the v-function by the previous tier(s).
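The interplay between the non-optimising tier, the execution counter and the optimising tier can be sketched as below. The threshold value of 2000 is an arbitrary illustration, not a figure taken from any specific VM, and TieredFunction, double and optimise_double are hypothetical names.

```python
OPTIMISE_THRESHOLD = 2000  # assumed value, chosen only for illustration


class TieredFunction:
    """Runs the non-optimising tier until the function has proven hot."""

    def __init__(self, baseline, optimise):
        self.baseline = baseline    # stand-in for the non-optimising tier
        self.optimise = optimise    # stand-in for the optimising JIT
        self.optimised = None
        self.call_count = 0
        self.profile = {}           # argument type -> number of executions

    def __call__(self, argument):
        if self.optimised is not None:
            return self.optimised(argument)
        # Non-optimising tier: count the execution and record runtime information.
        self.call_count += 1
        argument_type = type(argument)
        self.profile[argument_type] = self.profile.get(argument_type, 0) + 1
        if self.call_count >= OPTIMISE_THRESHOLD:
            # The function is hot and the profile is considered reliable.
            self.optimised = self.optimise(self.profile)
        return self.baseline(argument)


def double(x):
    return x + x


def optimise_double(profile):
    """Hypothetical optimiser: specialises only if integers alone were seen."""
    if set(profile) == {int}:
        return lambda x: x * 2
    return double


tiered = TieredFunction(double, optimise_double)
for i in range(OPTIMISE_THRESHOLD - 1):
    tiered(i)
still_baseline = tiered.optimised is None
tiered(1)
now_optimised = tiered.optimised is not None
answer = tiered(21)
```

A function called fewer than OPTIMISE_THRESHOLD times never pays the cost of the optimiser in this sketch, which mirrors the trade-off between compilation time and execution time saved.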
Two non-optimising tiers. Many VMs feature two non-optimising tiers and one optimising tier. The non-optimising tiers are composed of an interpreter tier and a baseline JIT tier. These two tiers have different pros and cons, and featuring both allows the VM to have the best of both worlds. There are three main differences between the two tiers: execution speed, efficiency of runtime information collection and memory footprint. The three differences are detailed in the next three paragraphs.

Table of contents:

1 Introduction
1.1 Context
1.2 Problem
1.3 Contributions
1.4 Outline
1.5 Thesis and published papers
2 Optimising Just-in-time compiler architectures
2.1 Terminology
2.2 Function-based architecture
2.3 Tracing architecture
2.4 Metacircular optimising Just-in-time compiler
2.5 Runtime state persistence
3 Existing Pharo Runtime
3.1 Virtual machine
3.2 Language-VM interface
3.3 Language relevant features
4 Sista Architecture
4.1 Overview
4.2 Function optimisation
4.3 Function deoptimisation
4.4 Related work
5 Runtime evolutions
5.1 Required language evolutions
5.2 Optional language evolutions
5.3 Work distribution
6 Metacircular optimising JIT
6.1 Scorch optimiser
6.2 Scorch deoptimiser
6.3 Related work
7 Runtime state persistence across start-ups
7.1 Warm-up time problem
7.2 Snapshots and persistence
7.3 Related work
8 Validation
8.1 Benchmarks
8.2 Other validations
9 Future work
9.1 Architecture evolution
9.2 New optimisations
9.3 Application of Sista for quick start-ups
9.4 Energy consumption evaluation
10 Conclusion
10.1 Summary
10.2 Contributions
10.3 Impact of the thesis
Bibliography