Virtual machine showdown: stack versus registers

Virtual machines (VMs) enable the distribution of programs in an architecture-neutral format, which can easily be interpreted or compiled. The most popular VMs, such as the Java virtual machine (JVM), use a virtual stack architecture, rather than the register architecture that is most popular in real processors. A long-running question in the design of VMs is whether a stack architecture or a register architecture can be implemented more efficiently with an interpreter. On the one hand, stack architectures allow smaller VM code, so less code must be fetched per VM instruction executed. On the other hand, stack machines require more VM instructions for a given computation, each of which requires an expensive (usually unpredictable) indirect branch for VM instruction dispatch. This dissertation extends existing work on comparing virtual stack and virtual register architectures in three ways. Firstly, we generate very high-quality register code. The result is that our register code has 46% fewer executed VM instructions than optimized JVM stack code, with the bytecode size of the register machine being only 26% larger than that of the corresponding stack code. Secondly, we present a fully functional virtual register implementation of the Java virtual machine (JVM), which supports Intel, AMD64, PowerPC and Alpha processors. This register VM supports inline-threaded, direct-threaded, token-threaded, and switch dispatch. Thirdly, we present experimental results on a range of additional optimizations such as register allocation and elimination of redundant heap loads. On the AMD64 architecture, the register machine using switch dispatch achieves an average speedup of 1.48 over the corresponding stack machine. Even using the more efficient inline-threaded dispatch, the register VM achieves a speedup of 1.15 over the equivalent stack-based VM.
The performance of VM interpreters is strongly affected by indirect branches, and during the course of the work on VM interpreters we identified a strong interaction between the indirect branch predictor and the trace cache. The dissertation investigates this phenomenon and shows that the interaction between the two components results in significant improvements in indirect branch prediction. This is particularly true for code with many indirect branches, such as VM interpreters.
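
As a rough illustration of the instruction-count trade-off described in the abstract, the following sketch interprets the statement a = b + c on two toy VMs and counts how many VM instructions each one dispatches. The opcode names, the flat list encoding, and the functions run_stack and run_register are assumptions made for this example only; they are not the instruction set or implementation used in the dissertation, and the toy encoding says nothing reliable about relative code size.

```python
# Hypothetical toy VMs: opcodes and encoding are illustrative assumptions,
# not the dissertation's instruction set.

def run_stack(code, local_vars):
    """Interpret stack code and return the number of dispatched VM instructions."""
    stack = []
    dispatches = 0
    pc = 0
    while pc < len(code):
        op = code[pc]
        dispatches += 1  # each VM instruction costs one dispatch
        if op == "load":  # push a local variable onto the operand stack
            stack.append(local_vars[code[pc + 1]])
            pc += 2
        elif op == "store":  # pop the top of stack into a local variable
            local_vars[code[pc + 1]] = stack.pop()
            pc += 2
        elif op == "add":  # pop two operands, push their sum
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
            pc += 1
        else:
            raise ValueError(f"unknown stack opcode: {op!r}")
    return dispatches


def run_register(code, regs):
    """Interpret register code and return the number of dispatched VM instructions."""
    dispatches = 0
    pc = 0
    while pc < len(code):
        op = code[pc]
        dispatches += 1  # each VM instruction costs one dispatch
        if op == "add":  # three-address form: regs[dst] = regs[src1] + regs[src2]
            dst, src1, src2 = code[pc + 1:pc + 4]
            regs[dst] = regs[src1] + regs[src2]
            pc += 4
        else:
            raise ValueError(f"unknown register opcode: {op!r}")
    return dispatches


# a = b + c, with a, b, c held in slots 0, 1, 2.
STACK_CODE = ["load", 1, "load", 2, "add", "store", 0]
REGISTER_CODE = ["add", 0, 1, 2]

stack_locals = [0, 3, 4]
register_file = [0, 3, 4]

stack_dispatches = run_stack(STACK_CODE, stack_locals)
register_dispatches = run_register(REGISTER_CODE, register_file)

print("stack VM dispatches:   ", stack_dispatches)
print("register VM dispatches:", register_dispatches)
```

In this toy case the stack VM dispatches four instructions (two loads, an add, a store) where the register VM dispatches one. Real programs show a much smaller gap than this single statement suggests; the abstract reports 46% fewer executed VM instructions on average.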
