Virtual machine showdown: stack versus registers
Citation:
Yunhe Shi, 'Virtual machine showdown: stack versus registers', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2007, pp 200
Download Item:
Shi, Yunhe_TCD-SCSS-PHD-2007-04.pdf (PDF) 1.047Mb
Abstract:
Virtual machines (VMs) enable the distribution of programs in an architecture-neutral format, which can easily be interpreted or compiled. The most popular VMs, such as the Java virtual machine (JVM), use a virtual stack architecture, rather than the register architecture that is most popular in real processors. A long-running question in the design of VMs is whether a stack architecture or register architecture can be implemented more efficiently with an interpreter. On the one hand, stack architectures allow smaller VM code, so less code must be fetched per VM instruction executed. On the other hand, stack machines require more VM instructions for a given computation, each of which requires an expensive (usually unpredictable) indirect branch for VM instruction dispatch. This dissertation extends existing work on comparing virtual stack and virtual register architectures in three ways. Firstly, we generate very high quality register code. The result is that our register code has 46% fewer executed VM instructions compared to optimized JVM stack code, with the bytecode size of the register machine being only 26% larger than that of the corresponding stack code. Secondly, we present a fully functional virtual register implementation of the Java virtual machine (JVM), which supports Intel, AMD64, PowerPC and Alpha processors. This register VM supports inline-threaded, direct-threaded, token-threaded, and switch dispatch. Thirdly, we present experimental results on a range of additional optimizations such as register allocation and elimination of redundant heap loads. On the AMD64 architecture, the register machine using switch dispatch achieves an average speedup of 1.48 over the corresponding stack machine. Even using the more efficient inline-threaded dispatch, the register VM achieves a speedup of 1.15 over the equivalent stack-based VM.
The performance of VM interpreters is strongly affected by indirect branches, and during the course of the work on VM interpreters we identified a strong interaction between the indirect branch predictor and the trace cache. The dissertation investigates this phenomenon and shows that the interaction between the two components results in significant improvements in indirect branch prediction. This is particularly true for code with many indirect branches, such as VM interpreters.
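As an illustration of the trade-off the abstract describes, the following is a minimal sketch of two toy interpreters that evaluate a = b + c and count how many VM instructions are dispatched. The instruction names, tuple encoding, and function names are hypothetical and chosen for this example; they are not the JVM bytecode format or the register code generated in the thesis. The if/elif chain stands in loosely for switch dispatch, and the sketch does not model code size or branch prediction.

```python
# Toy comparison of stack and register VM code for a = b + c.
# All opcodes and encodings here are assumptions made for illustration.

def run_stack(code, local_vars):
    """Interpret toy stack code; return (locals, dispatched instruction count)."""
    local_vars = list(local_vars)  # work on a copy so the caller's list is untouched
    stack = []
    dispatched = 0
    for op, *args in code:
        dispatched += 1  # one dispatch per VM instruction
        if op == "load":
            stack.append(local_vars[args[0]])
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "store":
            local_vars[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack opcode: {op}")
    return local_vars, dispatched


def run_register(code, registers):
    """Interpret toy register code; return (registers, dispatched instruction count)."""
    registers = list(registers)
    dispatched = 0
    for op, *args in code:
        dispatched += 1
        if op == "add":
            dst, src1, src2 = args
            registers[dst] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown register opcode: {op}")
    return registers, dispatched


# Slot 0 holds a, slot 1 holds b, slot 2 holds c.
STACK_CODE = [("load", 1), ("load", 2), ("add",), ("store", 0)]
REGISTER_CODE = [("add", 0, 1, 2)]

if __name__ == "__main__":
    print(run_stack(STACK_CODE, [0, 3, 4]))
    print(run_register(REGISTER_CODE, [0, 3, 4]))
```

In this sketch the stack version dispatches four instructions and the register version dispatches one for the same result. Real programs mix many instruction kinds, so the reduction measured in the thesis (46% fewer executed VM instructions) is much smaller than this single-statement example suggests.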
Author: Shi, Yunhe
Advisor:
Gregg, David
Qualification name:
Doctor of Philosophy (Ph.D.)
Publisher:
Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Note:
TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
Type of material:
thesis
Availability:
Full text available