Program generation for Intel AES new instructions
Citation:
Raymond Keith Scott Manley, 'Program generation for Intel AES new instructions', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2011, pp.196Download Item:
Abstract:
High-performance primitive libraries are used to replace parts of sub-optimal code with optimized implementations. These libraries often come in the form of highly-optimized assembly routines, which raises several issues. Small changes to assembly routines can require significant rewrites. New versions of microarchitectures will often require changes in the assembly to keep code both efficient and functional. Maintaining multiple versions of the same basic piece of assembly code is a costly software engineering problem. One approach to solving this problem is using a program generator. AES-NI is an instruction-set extension on Intel processors that implement a full round of AES encryption in a single instruction. Existing libraries use hand-tuned assembly language to overlap the execution of multiple AES instructions to extract maximum performance. In this dissertation, we argue that using a program generator is suitable substitute for writing highly-optimized assembly routines that use AES-NI. We present a program generation system that seamlessly integrates high-level algorithmic choices with scheduling strategies that exploit instruction-level parallelism. This program generation system returns AES implementations that achieve near optimal performance. We also show that the generator is dynamic enough to take exploratory approaches when optimizing code. As a result, this dissertation also contributes two novel encryption modifications. For CTR mode, we present a “mixed-mode” operation that combines traditional lookup table optimizations with AES-NI instructions. In cyclic modes, such as CBC, we show how manipulating the xor instructions can shorten the chain of dependent operations. These optimized implementations are found using an adapted simulated annealing algorithm. We show these implementations can achieve similar or superior cycle per byte times compared to the high-performance library versions provided by Intel. The end result is a program generation technique that could potentially be adapted to optimize other algorithms that rely on instruction-set extensions.
Sponsor
Grant Number
IRCSET (The Irish Research Council for Science, Engineering and Technology) and Intel Ireland
Author: Manley, Raymond Keith Scott
Advisor:
Gregg, DavidQualification name:
Doctor of Philosophy (Ph.D.)Publisher:
Trinity College (Dublin, Ireland). School of Computer Science & StatisticsNote:
TARA (Trinity’s Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ieType of material:
thesisAvailability:
Full text availableMetadata
Show full item recordLicences: