Register Allocation using Graph Coloring 


Java Bytecode

One of the major design goals of Java is that programs written in Java should be cross-platform: one should be able to write a Java program on one platform and execute it on another, without modifying the code and without recompiling it. In practice this means that when a Java program is compiled into class files, the class files contain a platform-independent instruction set called Java bytecode. To run a Java program on a specific platform, a compliant JVM must be available for that platform. A JVM is a program that executes programs compiled to the Java bytecode instruction set by translating the Java bytecode into native machine instructions of the platform at hand.
Java bytecode is a more compact representation of Java code, in the form of instructions that the JVM executes. An instruction can be as short as one byte, although many Java bytecode instructions take additional arguments, adding a few bytes to the instruction. The Java bytecode instruction set is stack-oriented, which means that it does not use registers; instead, every value must be pushed onto a stack before it can be used in a calculation. To demonstrate how bytecode is structured, consider the ordinary Java method shown in Figure 2.1.
static int add( int a , int b ) {
return a + b ;
}
Figure 2.1: A simple add method in Java.
Figure 2.2 shows the same method compiled into Java bytecode. The method consists of four Java bytecode instructions, each one byte in size. The two arguments are pushed onto the stack by the iload_0 and iload_1 instructions (in a static method, the arguments occupy local variable slots 0 and 1). The iadd instruction pops the two operands and pushes the result of the addition onto the stack, where it is then popped by the ireturn instruction.
0: iload_0
1: iload_1
2: iadd
3: ireturn
Figure 2.2: The compilation of the method in Figure 2.1.
In the Java HotSpot VM, the Java bytecode instructions are processed one by one by a stack-based interpreter. Each time a method is invoked, the bytecode instructions of the method are interpreted and the stack machine is used to store results and variables for later use. If the same method is called again, it has to be interpreted all over again. The interpreter acts as a layer on top of the architecture: it does not need to know anything about the hardware it is running on. Even though this is a key component of Java's platform independence, interpretation makes execution slow.
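To make the stack-machine model concrete, the following is a minimal sketch of an interpreter loop for the four instructions in Figure 2.2. The opcode values match the JVM specification, but the class layout and execute method are invented for illustration; a real JVM interpreter is vastly more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy interpreter for the four instructions in Figure 2.2.
// Illustrative only: a real JVM handles hundreds of opcodes, frames,
// exceptions, and much more.
public class ToyInterpreter {
    static final int ILOAD_0 = 0x1a; // push local variable slot 0
    static final int ILOAD_1 = 0x1b; // push local variable slot 1
    static final int IADD    = 0x60; // pop two ints, push their sum
    static final int IRETURN = 0xac; // pop an int and return it

    static int execute(int[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ILOAD_0: stack.push(locals[0]); break;
                case ILOAD_1: stack.push(locals[1]); break;
                case IADD: {
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case IRETURN: return stack.pop();
                default: throw new IllegalStateException("bad opcode");
            }
        }
        throw new IllegalStateException("fell off the end of the method");
    }

    public static void main(String[] args) {
        // The add method of Figure 2.2, invoked with arguments 2 and 3.
        int[] code = { ILOAD_0, ILOAD_1, IADD, IRETURN };
        System.out.println(execute(code, new int[] { 2, 3 })); // prints 5
    }
}
```

Note how no instruction names a register: operands flow only through the operand stack, which is exactly what makes the encoding compact and platform independent.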

JIT Compilation

To improve execution time, the Java bytecode is compiled to native machine code, avoiding the performance overhead of interpretation. Instead of being interpreted, the Java bytecode is compiled into instructions that the hardware understands directly. To preserve the cross-platform requirement, this is not done prior to execution, since that would make the distributed code platform-dependent. Instead, JIT compilation is used, meaning that the Java bytecode is compiled to native code at runtime, while the Java HotSpot VM is running. The downside to dynamic compilation is that the time it takes to compile a method could otherwise be spent interpreting code. JIT compilation is a time-consuming task, so for it to be beneficial, the compiled code has to be executed frequently.
In other words, the cost C of compiling a method M to machine code can be amortized over all N invocations of the method. If the time it takes to interpret M is IM and the time it takes to execute the compiled version of M is CM, then the goal is to satisfy C + N * CM <= N * IM; otherwise, the JIT compilation has made the system slower.
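Rearranging the inequality gives the break-even point N >= C / (IM - CM): the number of invocations after which compiling pays off. The sketch below computes it; the timing numbers in main are made up purely for illustration.

```java
// Break-even point for JIT compilation: compiling pays off once
// C + N*CM <= N*IM, i.e. N >= C / (IM - CM).
public class JitBreakEven {
    static long breakEvenInvocations(double c, double im, double cm) {
        // If compiled code is no faster than interpretation, compilation
        // can never pay for itself.
        if (cm >= im) return Long.MAX_VALUE;
        return (long) Math.ceil(c / (im - cm));
    }

    public static void main(String[] args) {
        double c  = 1000.0; // cost of compiling M (hypothetical time units)
        double im = 5.0;    // cost of interpreting one invocation of M
        double cm = 0.5;    // cost of one compiled invocation of M
        // 1000 / (5.0 - 0.5) = 222.2..., so 223 invocations are needed.
        System.out.println(breakEvenInvocations(c, im, cm)); // prints 223
    }
}
```

This is why the HotSpot VM only compiles "hot" methods: for rarely executed code, N never reaches the break-even point.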
As mentioned earlier, there are two JIT compilers for the Java HotSpot VM: the Java HotSpot VM client compiler and the Java HotSpot VM server compiler. While the client compiler is faster than the server compiler, it does not produce code of the same quality. The client compiler is suitable when low startup times and responsiveness are important; the server compiler, on the other hand, is suited for long-running, high-performance applications, such as typical server applications.

Intermediate Representations

When a compiler translates source code written in one computer language into another one, in this case from Java bytecode to machine code, the compiler uses one or more intermediate representations of the source code during the compilation process.
There are several reasons to use intermediate representations. One is that a compiler can compile programs written in different languages, as long as the programs have been converted into an intermediate representation supported by the compiler. An example of this is the LLVM infrastructure with its intermediate representation, LLVM IR [1].
Intermediate representations can be divided into different levels of abstraction. Low-level intermediate representations are closer to machine code, which facilitates platform-specific optimizations. High-level intermediate representations are usually platform independent and give a better view of the control flow of methods, since the platform-dependent details are omitted. High-level intermediate representations also enable more general optimizations such as constant folding, common subexpression elimination and loop optimizations [2].
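As an illustration of one such optimization, the following sketch performs constant folding over a tiny expression tree. The Expr classes are invented for this example and bear no relation to any real compiler's IR.

```java
// A minimal constant-folding pass over a toy expression tree:
// subtrees whose operands are all constants are evaluated at
// compile time and replaced by a single constant node.
public class ConstantFolding {
    interface Expr {}
    record Const(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.left()), r = fold(a.right());
            if (l instanceof Const cl && r instanceof Const cr) {
                // Both operands are known: do the addition now.
                return new Const(cl.value() + cr.value());
            }
            return new Add(l, r);
        }
        return e; // constants (and anything unknown) are left as-is
    }

    public static void main(String[] args) {
        // (1 + 2) + 3 folds to the single constant 6.
        Expr e = new Add(new Add(new Const(1), new Const(2)), new Const(3));
        System.out.println(fold(e)); // prints Const[value=6]
    }
}
```

Because the fold happens on the IR rather than on machine code, the same pass benefits every target platform, which is exactly the appeal of doing such optimizations at a high level.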


Control Flow Graph

A control flow defines the order in which statements and instructions are executed or evaluated.
A very common intermediate representation is the Control Flow Graph (CFG). A CFG consists of basic blocks. A basic block is a contiguous sequence of instructions with no jumps or jump targets in between. This means that only the last instruction in a block can be a jump, which may transfer control to one or more other basic blocks.
In Figure 2.3 a Java method is shown and illustrated using a CFG.
The CFG is an important representation that is used throughout the compilation process, in particular in the liveness analysis (which will be explained later in this chapter) used for register allocation.
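Basic blocks are conventionally found with the classic "leader" rule: an instruction starts a block if it is the first instruction of the method, the target of a jump, or the instruction immediately after a jump. The sketch below applies that rule to a toy encoding (invented here for illustration) where each instruction either jumps to a target index or does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Finding basic-block boundaries with the classic leader rule.
public class BasicBlocks {
    // targets[i] == -1 means instruction i is not a jump; otherwise
    // instruction i jumps to instruction targets[i]. This encoding is
    // a stand-in for a real instruction stream.
    static List<Integer> leaders(int[] targets) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(0); // the first instruction always starts a block
        for (int i = 0; i < targets.length; i++) {
            if (targets[i] >= 0) {
                set.add(targets[i]);                        // a jump target starts a block
                if (i + 1 < targets.length) set.add(i + 1); // so does the fall-through
            }
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // Six instructions; instruction 2 jumps to 5, instruction 4 jumps to 1.
        int[] targets = { -1, -1, 5, -1, 1, -1 };
        System.out.println(leaders(targets)); // prints [0, 1, 3, 5]
    }
}
```

Each basic block then runs from one leader up to (but not including) the next, and the jumps between them become the edges of the CFG.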

Static Single Assignment Form

A property often used in intermediate representations is Static Single Assignment (SSA) form. In SSA form each variable is defined exactly once, which simplifies many of the optimizations the compiler performs, including liveness analysis. Figure 2.4 illustrates how a simple Java code snippet is converted into SSA form: when a variable is assigned more than once, a new variable is created, replacing all later uses of the original variable.
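For straight-line code, this renaming is mechanical: give each assignment a fresh version number and rewrite later uses to the latest version. The sketch below does exactly that over a deliberately simple statement encoding (a token array whose first element is the defined variable and whose remaining elements are the used operands); the representation is invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// SSA renaming for straight-line code, as in Figure 2.4.
public class SsaRename {
    // Each statement is {target, operand, operand, ...}; operators are
    // omitted from the encoding since only names matter for renaming.
    static List<String[]> rename(List<String[]> stmts) {
        Map<String, Integer> version = new HashMap<>();
        List<String[]> out = new ArrayList<>();
        for (String[] s : stmts) {
            String[] r = s.clone();
            // Rewrite each use to the current (latest) version.
            for (int i = 1; i < r.length; i++) {
                if (version.containsKey(r[i])) r[i] = r[i] + version.get(r[i]);
            }
            // The definition gets a fresh version number.
            int v = version.merge(s[0], 1, Integer::sum);
            r[0] = s[0] + v;
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> stmts = List.of(
            new String[]{"x", "1"},      // x = 1
            new String[]{"x", "x", "2"}, // x = x + 2
            new String[]{"y", "x"}       // y = x
        );
        for (String[] s : rename(stmts)) System.out.println(String.join(" ", s));
        // prints:
        // x1 1
        // x2 x1 2
        // y1 x2
    }
}
```

Note that uses are rewritten before the definition is renumbered, so `x = x + 2` correctly reads the old version x1 while defining the new version x2.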
However, there are situations where this notation is not sufficient. Consider the example in Figure 2.5. Both assignments of y have been replaced with the unique variables y1 and y2, but which y should be used in the assignment g = f + y?
The problem arises when control flow merges after assignments to the same variable in multiple branches. This is solved by adding a special statement, a φ (phi)-function. The φ-function takes all the different definitions of a variable (in this case y) as input. As shown in Figure 2.6, the φ-function is added at the beginning of the last block. It generates a new variable, y3, that represents either y1 or y2. This way, the intermediate representation remains in SSA form.

Liveness Analysis and Live Ranges

Liveness analysis determines which variables are live at each program point in the CFG. If a variable V is used after a program point P, then V is live at P; otherwise V is dead at P. For each variable, a live range is calculated, representing the program points at which the variable is live. In its simplest representation, a live range starts at a variable's first definition and ends at its last use.
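For straight-line code, this simplest representation can be computed in a single forward pass: open a range at the first definition and extend it at every use. The instruction encoding below (defined variable first, used variables after) is invented for illustration; real liveness analysis must also propagate across CFG edges.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Live ranges in their simplest form: [first definition, last use].
public class LiveRanges {
    record Range(int start, int end) {}

    // instrs.get(i)[0] is the variable defined at instruction i; the
    // remaining tokens are the variables used there.
    static Map<String, Range> liveRanges(List<String[]> instrs) {
        Map<String, Range> ranges = new HashMap<>();
        for (int i = 0; i < instrs.size(); i++) {
            String[] ins = instrs.get(i);
            final int at = i;
            // The first definition opens the range.
            ranges.putIfAbsent(ins[0], new Range(at, at));
            // Every later use extends the range to this point.
            for (int j = 1; j < ins.length; j++) {
                ranges.computeIfPresent(ins[j], (k, r) -> new Range(r.start(), at));
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String[]> instrs = List.of(
            new String[]{"a"},           // 0: a = ...
            new String[]{"b"},           // 1: b = ...
            new String[]{"c", "a", "b"}, // 2: c = a + b
            new String[]{"d", "a"}       // 3: d = a
        );
        System.out.println(liveRanges(instrs).get("a")); // prints Range[start=0, end=3]
    }
}
```

Here a and b are simultaneously live at instruction 2, so they could not share a register, which is precisely the information a register allocator needs.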
Liveness analysis is needed for certain optimizations such as dead code elimination and, more importantly in the context of this thesis, for register allocation. This will become clear in later chapters.
Liveness can be calculated with varying precision. A less precise liveness analysis is faster to compute, but it may conservatively report variables as live at points where they are not. As a consequence, more variables may be considered live at the same time, which can reduce the quality of the produced code.

Table of contents:

1 Introduction 
1.1 The Java Execution and Runtime
1.2 Two JIT Compilers in the HotSpot VM
1.3 The Goal of This Thesis
1.4 Outline
2 Background 
2.1 Java Bytecode
2.2 JIT Compilation
2.3 Intermediate Representations
2.4 Control Flow Graph
2.5 Static Single Assignment Form
2.6 Liveness Analysis and Live Ranges
3 Register Allocation 
3.1 Local Register Allocation
3.2 Global Register Allocation
4 Register Allocation using Graph Coloring
4.1 Pruning the Interference Graph
4.2 Reconstructing the Interference Graph
5 The Linear Scan Algorithm
5.1 Block Order, Row Numbering and Intervals
5.2 Assigning Physical Registers to Virtual Registers
5.3 Resolve Stack Intervals
5.4 Lifetime Holes
5.5 Interval Splitting
6 Implementing the Linear Scan Algorithm in the Server Compiler
6.1 The Intermediate Representation in the Server Compiler
6.2 Mapping Nodes to Machine Instructions
6.2.1 Fixed Registers
6.3 Building Intervals
6.3.1 The Interval Class
6.3.2 Create Live Ranges
6.4 Assigning Registers and Memory Positions
6.5 Resolving the Intervals that are Assigned Memory Positions
6.6 Resolving Registers at Calls
6.6.1 Removing the SSA Form
7 Performance Testing and Results 
7.1 Performance
7.2 Compilation Time
7.2.1 Register Allocation Time
7.2.2 Time Distribution inside the Linear Scan Register Allocator in the Server Compiler
7.2.3 Total Compilation Time
8 Future work 
8.1 Improving the Runtime Performance
8.1.1 Interval Splitting
8.1.2 Use Profile Data for Spilling Heuristics
8.1.3 Improving the Phi Resolver
8.2 Reducing the Compilation Time
8.2.1 Reducing the Total Time being Spent in the Server Compiler
8.2.2 Replacing the Liveness Analysis
9 Conclusions 
