an Application Runtime Virtualization Infrastructure


Concerns of Application Runtime Manipulation

We identify three main concerns when we aim at changing or replacing an application and its language runtime. First, how flexibly the language runtime can be changed and adapted without running into inconsistent states. Second, how clearly VM and language concerns are separated, so that the language concerns can be extracted. Finally, what abstraction level and tool support are needed to manipulate an application runtime.
The VM is often the component in charge of initializing the application runtime, and particularly the language runtime. This decision is practical: the VM can safely initialize the language structures, solve the language bootstrapping issues while avoiding recursion [KdRB91] (e.g., create the first class, or the first string without a class), and ensure that the created language runtime complies with its execution model. This coupling is necessary to run a program, but it often remains hidden in hardcoded assumptions. A consequence is that the VM fixes the initial structures of the language runtime. Figure 6.1 illustrates this problem: Ruby’s initial class hierarchy is imprinted inside the VM code, fixing BasicObject as the top superclass, followed by Object, Module and Class respectively.
A second problem arises because the VM includes code to manipulate objects that belong to the application runtime, introducing a duplication between the VM code and the language code. To illustrate this second problem, consider the code on the left of Figure 2.2, which creates a Dictionary object (a hash map object) in Smalltalk. If our language runtime is defined by a Dictionary object keeping, e.g., a map of global objects, we must execute that code to create the corresponding instance during the language initialization. However, since the language runtime and the VM are in the middle of their initialization, the VM cannot execute this code as it is, and thus it cannot enforce its own invariants. Production VMs provide an alternative low-level representation of the same code respecting the same invariants, exemplified on the right of the same figure. This introduces a redundancy: the VM and the language have different code to honor the same invariants.
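The bootstrapping recursion mentioned above (creating the first class without a class) can be made concrete with a small sketch. The object model below is hypothetical: `Obj`, `bootstrap` and the slot names are assumptions introduced for illustration, not the structures of any real VM.

```python
# Hypothetical miniature object model, for illustration only.
class Obj:
    """A raw object: a reference to its class plus a dictionary of slots."""

    def __init__(self, cls, **slots):
        self.cls = cls
        self.slots = dict(slots)


def bootstrap():
    """Create the first two classes of a tiny language runtime."""
    # Step 1: allocate the first class *without* a class, because no class
    # exists yet that could instantiate it.
    class_class = Obj(None, name="Class", superclass=None)
    # Step 2: close the cycle by hand: "Class" becomes an instance of itself.
    class_class.cls = class_class
    # Step 3: from here on, classes can be created in the regular way.
    object_class = Obj(class_class, name="Object", superclass=None)
    class_class.slots["superclass"] = object_class
    return object_class, class_class


object_class, class_class = bootstrap()
```

Steps 1 and 2 are the kind of special-case code that typically ends up hardcoded in a VM: they bypass the regular instantiation path and must still produce objects that satisfy the invariant that every object has a class.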

Evaluation Criteria

This section presents the features we consider relevant to include in a runtime manipulation solution. These features serve as criteria to compare the state of the art we selected in the following sections.
Manipulate Objects. An ideal application runtime manipulation solution must provide access to objects. It should support class instantiation and the manipulation of the resulting objects, including access to their fields.
Manipulate Language Elements. An application runtime manipulation solution must support the creation and modification of elements that are part of the language runtime, e.g., the creation and modification of classes and methods.
Manipulate Execution Elements. An application runtime manipulation solution must support the manipulation of elements related to a program’s execution, e.g., the creation, pausing and resumption of threads, and the introspection of such threads to understand a program’s execution.
Safety. An ideal application runtime manipulation solution must guarantee both that incorrect modifications cannot be applied and that correct modifications can be applied safely, without leaving the system in an inconsistent state.
User abstraction level. An ideal application runtime manipulation solution must let its users express their manipulations in a high-level language, so that they benefit from the abstractions and expressiveness of such a language.
API abstraction level. The API of an ideal application runtime manipulation solution must provide the abstractions to manipulate the application runtime in terms of the runtime’s constructions.
Separation of concerns. The solution should clearly separate VM and language concerns, to avoid exposing the language developer to the complexities of VM technology such as the GC or the JIT compiler.
In the following sections we explore and compare three different categories of solutions that pursue different kinds of application runtime manipulation. This comparison is based on the criteria defined in this section.

Reflection and Metaprogramming Models

Reflection is the capability of a program to reason about and act upon itself [Mae87]. Typically we distinguish two forms of reflective access: structural and behavioral [MJD96]. Structural reflection is concerned with the static structure of a program, while behavioral reflection focuses on the dynamic part of a running program. Orthogonally to this categorization, we distinguish between introspection and intercession. Introspection refers to accessing a particular reified representation of a program element, whereas intercession means altering that reified representation.
Structural Reflection. Structural reflection refers to the access to the static structure of a program. A typical example is to access the class of an object at runtime.
    'a string' class.
An example of structural intercession is to reflectively modify an instance variable of an object.
    aCar instVarNamed: #driver put: Person new.
Behavioral Reflection. Behavioral reflection means to interact directly with the running program. For instance, this includes reflectively activating a method. Another, more complex example is to dynamically switch the execution context and resend the current method with another receiver.
    thisContext restartWithNewReceiver: Object new.
Accessing the receiver of the current method through the execution context is an example of behavioral introspection.
    thisContext receiver.
There is not always a clear separation between the two types of reflection. For instance, adding new methods requires structural reflection. At the same time, adding a new method alters the future program execution, implying that it is also behavioral reflection. Typically, behavioral reflection stops at the granularity of a method. For instance, in Pharo it is not possible by default to directly alter the execution at a sub-method level [DDT06].
Reflection provides by default most of the characteristics that we need for runtime manipulation, such as the ability to manipulate the execution of a program or the objects themselves. In the following sections we present several existing models implementing reflective behavior, and for each of them we emphasize its weaknesses and strengths.
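For readers less familiar with Smalltalk, the same categories can be sketched in Python. This is only an approximate analogue, assuming CPython (where `inspect.currentframe` returns the current stack frame); the `Car` and `Person` classes and the `current_receiver` method are hypothetical names introduced for illustration.

```python
import inspect


class Person:
    pass


class Car:
    def __init__(self):
        self.driver = None

    def current_receiver(self):
        # Behavioral introspection: read the receiver ("self") out of the
        # reified execution context, here the current stack frame.
        frame = inspect.currentframe()
        try:
            return frame.f_locals["self"]
        finally:
            del frame  # avoid keeping a reference cycle through the frame


# Structural introspection: ask an object for its class at run time.
string_class = type("a string")

# Structural intercession: reflectively set an instance variable by name.
a_car = Car()
setattr(a_car, "driver", Person())

# Behavioral reflection: reflectively activate a method selected by name.
receiver = getattr(a_car, "current_receiver")()
```

Python offers no direct counterpart to restarting the current context with a new receiver, so that example is not reproduced here.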

Metacircular Runtimes and VMs

The increasing complexity of VMs has led to novel approaches to building VMs and, therefore, their runtimes. Metacircular VMs are VMs programmed in the same language they ultimately support, e.g., a Java VM written in Java or a Smalltalk VM written in Smalltalk. This approach is based on the principles of high-level low-level programming, i.e., expressing low-level concerns using high-level languages [FBC+09]. Metacircular VMs benefit from the abstraction power and tooling of a high-level language to manipulate their own VMs. This also means that, during the build time of a metacircular VM, we can express the manipulations of its application runtime in terms of the high-level language. However, these projects are biased towards VM building techniques rather than the manipulation of the application runtimes that run on top of them. We can see this in the fact that most of the high-level manipulations inside a metacircular VM do not survive VM generation, i.e., once the VM is built we can no longer access its high-level representations. Even though we do not focus on the modification of VMs, in this thesis we briefly study metacircular runtimes and VMs to understand more concretely the benefits of their high-level low-level approach.
The Squeak VM [IKM+97] is an early open-source metacircular VM for the Smalltalk language. Its core building system is still in active use for the Cog VM, which introduces a JIT compiler. The Cog VM is used by default by the Pharo programming language. The Squeak VM is developed using a Smalltalk subset called Slang that is exported to C and then compiled to the final VM binary. Slang is limited to functionality that can be expressed with standard C code; in this respect, Slang is mostly a high-level C preprocessor. Even though Slang has basically the same syntax as Smalltalk, it is semantically constrained to expressions that can be resolved statically at compilation or code generation time and that are compatible with C. Hence Slang’s semantics are closer to C’s than to Smalltalk’s. Unlike later metacircular frameworks, the Squeak VM uses little or no compile-time reflection to simplify VM designs. However, class composition helps to structure the source code. Besides the Slang source code, which accounts for the biggest part of the interpreter code, some operating system related code and plugins are written in C. To facilitate the interaction with the pure C part, Slang supports inline C expressions and type annotations.
A great achievement of the Squeak VM is a simulator environment that enables programmers to interact dynamically with a simulated version of the running VM. The simulator is capable of running a complete Squeak Smalltalk image, including its graphical user interface. This means that programmers can change the sources of the running VM and see the immediate effects in the simulator. The VM developer has complete access to and control over the VM internals and the application runtime it contains. They can, for example, change any object and class inside the simulated application runtime. However, applying such a change depends on a memory-oriented interface, i.e., object modification is achieved through a Smalltalk interface that provides C-like abstractions such as pointer arithmetic. Additionally, once the VM is generated, this low-level interface disappears and is no longer accessible to the developer.
Jikes (formerly Jalapeño) is an early metacircular research VM for Java written in Java [AAB+00]. The Jikes VM features several different garbage collectors and does not execute bytecodes but directly compiles to native code. With metacircularity in mind, Jikes does not resort to a low-level programming language such as C for these typically low-level VM components. Instead, they are also written in Java, using a high-level low-level programming framework. The Jikes VM had performance as a major goal, hence it needs direct, unobstructed interaction with the low-level world through a specialized framework. Frampton et al. present a high-level low-level framework packaged as org.vmmagic, which is used as a system interface for Jikes. This framework introduces highly controlled low-level interaction in a statically typed context. It provides a memory-oriented API to manipulate runtime entities at VM generation time, which is used to implement VM concerns. Once the VM is compiled to native code, the interface exposed by the org.vmmagic framework is also compiled into native code and is not accessible from Java programs executed on top of the Jikes VM.
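To give an intuition of what a memory-oriented interface implies for its user, the following sketch models a heap as a flat array of words. The layout (a two-word header followed by the slots), the class identifiers and all function names are assumptions made for illustration; they are not the layout or API of the Squeak simulator or of org.vmmagic.

```python
# Hypothetical heap layout, for illustration only: each object is stored as
# [class_id, slot_count, slot_0, slot_1, ...] in a flat array of words.
HEADER_WORDS = 2
CAR_CLASS_ID = 7      # hypothetical class identifiers
PERSON_CLASS_ID = 9


def allocate(heap, free_pointer, class_id, slot_count):
    """Write an object header; return (object address, new free pointer)."""
    heap[free_pointer] = class_id
    heap[free_pointer + 1] = slot_count
    return free_pointer, free_pointer + HEADER_WORDS + slot_count


def store_slot(heap, address, index, value):
    # Pointer arithmetic: the caller must know the header size, and only
    # this check protects against writing into a neighbouring object.
    assert 0 <= index < heap[address + 1], "slot index out of bounds"
    heap[address + HEADER_WORDS + index] = value


def fetch_slot(heap, address, index):
    assert 0 <= index < heap[address + 1], "slot index out of bounds"
    return heap[address + HEADER_WORDS + index]


heap = [0] * 16
car, free = allocate(heap, 0, CAR_CLASS_ID, 1)
person, free = allocate(heap, free, PERSON_CLASS_ID, 0)
store_slot(heap, car, 0, person)  # "driver" is slot 0 of the car
```

At this level the user reasons in addresses, header sizes and slot indexes. The same modification in language-level terms is the single expression `aCar instVarNamed: #driver put: Person new`.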

Language Virtualization Techniques

The most closely related family of work in virtualization consists of approaches such as Xen [Chi07]. Xen is a Virtual Machine Monitor (VMM) that allows one to control and manage VMs in a high-performance and resource-managed way. This approach targets the virtualization of full and unmodified operating systems (OSs), to facilitate their adoption in industrial and production environments. Such approaches rely on support from the hardware platform, and in some cases from the guest OS, and concentrate on performance and production features.
Operating system virtualization technology is characterized by the existence of a hypervisor (named after the operating system supervisor that controls the OS processes). The hypervisor is the VM component that allows one to observe or control the internals of one or many VMs. A VM hypervisor gives us, amongst others, the following services:
Co-location. Co-location is the ability to have co-existing applications on top of the same virtual machine. Co-located applications can use shared memory to communicate efficiently, as they reside in the same operating system process.
Resource control. VMs should control how the different resources of their applications are used. However, state-of-the-art VMs only control their consumed memory, through a memory manager. In general, they do not control other kinds of resources such as CPU or energy consumption.
Security. VMs should control how applications access sensitive information such as files and network connections, or execute potentially dangerous operations such as system calls.
Application mobility. As applications are portable, they should be easily migrated between different VMs, also at runtime. Application mobility provides support for resource re-allocation.

Problems of Deployment on Constrained Devices

Deployed object-oriented applications often contain code units (e.g., packages, classes, methods) that the running application never uses. This problem is more evident and harder to control with third-party software. Third-party libraries and frameworks are designed in a generic fashion that allows multiple usages and functionalities, while applications use only a few of them. Examples are logging libraries, web application frameworks or object-relational mappers.
Unused deployed code units have an undesired impact when targeting a constrained infrastructure. Some devices may constrain applications due to restrictive hardware such as low primary or secondary memory [Mar12], or even software impositions such as the Android Dalvik VM restriction to deploy only 65536 methods. Big JavaScript mashup applications have an impact on loading time due to network speed and parsing time on the client. These limitations may forbid the deployment of applications that contain many code units, or limit the number of applications and the amount of content users can have on their device.
Existing solutions to this problem eliminate dead code by extracting the used code units of an application, and thus reduce application size in secondary memory and primary memory footprint. The majority of the solutions in the field automatically detect and extract used code units, so-called tailoring, with static call graph construction as the dominant technique [GDDC97]. These static approaches present limitations in the presence of dynamic features such as reflection [LWL05], or in the absence of static type annotations. Additionally, they do not allow the user to customize the selection process to cover different levels of an application’s code, i.e., if third-party or base-language libraries are shared amongst several applications, a developer may want to extract only the used application-specific code and leave the shared libraries untouched; another developer may want to apply the process to the whole application.
To clearly show the problem, consider the application using a logging library in Figure 3.1. In this figure, we emphasize in gray the unused code units that can safely be removed. Figure 3.2 shows the code of this application, written in the Pharo language. This application contains a MainApp class with a start method, which is the entry point of our application. The start method creates an instance of StdoutLogger and logs the application’s start and end. In turn, the StdoutLogger uses the stdout global instance to log the current time and the message to the standard output. To print the time, the StdoutLogger makes use of the Time class from the base libraries of the language. Note that, for the sake of clarity, we did not include all base libraries in the example, though in modern programming languages they represent a large codebase with features ranging from networking to multithreading. For example, Java 8 SE contains 4240 classes, and the development edition of Pharo 3.0 [BDN+09] contains 4115 classes and traits. In this example, the following code units are unused:
1. The logger library includes two logging classes (StdoutLogger and RemoteLogger). Only the StdoutLogger is used and thus the RemoteLogger class can be discarded.
2. Since the MainApp class uses neither the Socket class nor the RemoteLogger class (the only user of the Socket class), the Socket class can be discarded.
3. No class in the application makes use of the Date class, and we assume for this example that it is not used in the base libraries either. This class can therefore be safely removed.
4. The method newLine (lines 7-8 of Figure 3.2) of the StdoutLogger class is not used and can also be removed.
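The reasoning behind these four observations is a reachability computation over a call graph. The sketch below reconstructs a plausible graph from the prose description of the example. Since Figure 3.2 is not reproduced here, the method name `log` and the exact edges are assumptions, and whole classes are treated as single code units for brevity.

```python
from collections import deque

# Hypothetical call graph derived from the textual description of the
# example; "log" is an assumed method name, not taken from Figure 3.2.
CALL_GRAPH = {
    "MainApp.start": ["StdoutLogger.log"],
    "StdoutLogger.log": ["stdout", "Time"],
    "StdoutLogger.newLine": ["stdout"],
    "RemoteLogger.log": ["Socket", "Time"],
    "Socket": [],
    "Time": [],
    "Date": [],
    "stdout": [],
}


def reachable_units(graph, entry_points):
    """Return every code unit reachable from the entry points (breadth-first)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        unit = queue.popleft()
        for callee in graph.get(unit, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


used = reachable_units(CALL_GRAPH, ["MainApp.start"])
unused = sorted(set(CALL_GRAPH) - used)
# unused == ['Date', 'RemoteLogger.log', 'Socket', 'StdoutLogger.newLine']
```

A graph like this only contains the edges that can be found statically. A call performed through reflection would be missing from it, and its target would be wrongly reported as unused, which is the limitation of static approaches noted above.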

Table of contents :

1 Introduction
1.1 Motivation
1.2 Application Runtimes: Concepts
1.2.1 Language Runtime
1.2.2 Application Specific Runtime
1.3 Problem Statement
1.4 Contributions
1.5 Thesis Outline
1.5.1 Part I: State of the Art
1.5.2 Part II: Espell
1.5.3 Part III: Bootstrapping
1.5.4 Part IV: Tailoring
1.5.5 Part V: Conclusion
I State of the Art
2 Application Runtime Manipulation
2.1 Concerns of Application Runtime Manipulation
2.2 Evaluation Criteria
2.3 Reflection and Metaprogramming Models
2.4 Metacircular Runtimes and VMs
2.5 Language Virtualization Techniques
2.6 Conclusion and Summary
3 Application Runtime Tailoring
3.1 Problems of Deployment on Constrained Devices
3.2 Challenges of Application Tailoring
3.3 Evaluation Criteria
3.4 Dedicated platforms
3.5 Static Analysis-Based Techniques
3.6 Dynamic Analysis-Based Techniques
3.7 Hybrid Analysis-Based Techniques
3.8 Conclusion and Summary
II Espell: an Application Runtime Virtualization Infrastructure
4 Espell: Virtualized Application Runtimes
4.1 Controlling Virtualized Runtimes in Espell
4.2 Object Spaces: First-class Application Runtimes
4.2.1 VM-Setup Interface
4.2.2 Runtime Manipulation Interface
4.3 First-Class Hypervisors
4.4 Cross-Runtime Communication
4.4.1 Process Injection
4.4.2 Virtual Interpretation
4.5 Conclusion and Summary
5 The Espell Prototype
5.1 Pharo Execution Model in a Nutshell
5.2 The Special Objects Array as VM-Setup Interface
5.3 Cycle Execution and Context Switch
5.4 Espell Mirror Implementation
5.5 Espell Virtual Interpreter
5.6 Espell Memory Layout
5.7 Benchmarks
5.7.1 Mirrors Micro-Benchmarks
5.7.2 Process Injection Overhead
5.7.3 Execution Cycle Overhead
5.8 Non Implemented Aspects
5.8.1 JIT Compilation
5.8.2 Plugin and Native Libraries State
5.8.3 Finalization of External Resources
5.9 Conclusion and Summary
III Bootstrapping: Explicit Runtime Generation
6 Bootstrapping Object-Oriented Languages
6.1 Bootstrapping
6.2 Bootstrapping Through an Example
6.3 Bootstrapping with Espell
6.4 The Circular Language Definition
6.4.1 Language base-level entities
6.4.2 Language meta-level entities
6.5 The Bootstrapping Interpreter
6.6 Continuous Bootstrapping
6.7 Conclusion and Summary
7 Bootstrapping Validation
7.1 Languages Used for Experimentation
7.1.1 Language I: Pharo
7.1.2 Languages II and III: Metatalk with and without Mirrors
7.1.3 Language IV: Candle
7.2 Measurements
7.3 Optimizations
7.4 Conclusion and Summary
IV Tailoring: Automatic Extraction of Application Runtimes
8 Run-Fail-Grow: Dynamic Tailoring
8.1 Run Fail Grow: Dynamic dead code elimination
8.2 Run-Fail-Grow through an example
8.3 Detecting Missing Code Units
8.4 Customizing Dead Code Elimination with Seeds
8.5 Tornado: RFG using Espell
8.5.1 Execution Traps with Ghost Proxies
8.5.2 Object Installation and Propagation Rules
8.5.3 Object Identity and Proxies
8.5.4 Implementing Seeds in Tornado
8.5.5 Preparing the Application for Deployment
8.6 Conclusion and Summary
9 Run-Fail-Grow Validation
9.1 Experiments
9.2 Results
9.3 Comparison with a Dedicated Platform
9.4 Evaluation of Tornado
9.5 Discussions on the run-fail-grow approach
9.5.1 Ensuring Completeness
9.5.2 Application Designs that get along with Tornado
9.6 Conclusion and Summary
V Conclusion
10 Conclusion
10.1 Contributions
10.1.1 Espell
10.1.2 Bootstrapping
10.1.3 RFG Tailoring
10.2 Future Work
10.2.1 Security
10.2.2 Resource Control
10.2.3 Application Distribution and Migration
10.2.4 Dynamic Adaptation
10.2.5 VM-Language Co-Evolution
Bibliography