Automatic vectorization through superword level parallelism with associative chain re-ordering and loop shifting
Citation: ROGERS, STEPHEN, Automatic vectorization through superword level parallelism with associative chain re-ordering and loop shifting, Trinity College Dublin. School of Computer Science & Statistics, 2018
Single instruction, multiple data (SIMD) is a class of parallel computing in which a single operation is executed across multiple pieces of data. A common form of SIMD is vector processing, which executes a single instruction across one-dimensional arrays of data called vectors. Since the introduction of vector processing, a category of compiler optimization called automatic vectorization has been developed to allow 'vectorizing compilers' to target such processor capabilities without direct intervention from application programmers.

Convolution is a fundamental concept in image processing. It applies a matrix called a kernel to compute a weighted sum of each pixel and its adjacent pixels, for all pixels in an image. This process is used to perform tasks such as image blurring, edge detection, and noise reduction.

In this thesis, we explore the challenges of automatically vectorizing image convolutions implemented in C and C++. We describe the fundamentals of vectorization and image convolutions and propose an approach for the effective vectorization of these convolutions. Our approach combines vectorization through Superword Level Parallelism with tentative loop unrolling, loop shifting, and the reordering of associative and commutative chains of instructions. Most modern optimizing compilers are capable of vectorizing 3x3 image convolutions, but tend to fail at vectorizing larger convolutions, such as 5x5. The vectorizer we describe in this thesis, with the aid of its combined optimizations, is designed to vectorize such larger convolutions. Through this combination of optimizations, we have measured performance improvements for 5x5, 7x7, and 9x9 image convolutions: between 2.01x and 6.97x for convolutions operating on integer data types, and between 2.19x and 5.34x for floating-point types.
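To illustrate the kind of loop nest the abstract refers to, the following is a minimal, hypothetical sketch (not code from the thesis) of a scalar 5x5 integer convolution in C. The function name `convolve5x5`, the row-major image layout, and the choice to skip border pixels are all assumptions made for brevity; the inner reduction over kernel taps is the associative and commutative chain of additions that the described vectorizer reorders.

```c
#include <assert.h>

#define K 5          /* kernel width/height */
#define HALF (K / 2) /* kernel radius */

/* Naive scalar 5x5 convolution over a w x h row-major image.
 * Border pixels (within HALF of an edge) are left untouched
 * for simplicity. */
void convolve5x5(const int *in, int *out, int w, int h,
                 int kernel[K][K])
{
    for (int y = HALF; y < h - HALF; y++) {
        for (int x = HALF; x < w - HALF; x++) {
            int sum = 0;
            /* This reduction is a chain of associative adds; an SLP
             * vectorizer can reorder it into partial vector sums. */
            for (int ky = -HALF; ky <= HALF; ky++)
                for (int kx = -HALF; kx <= HALF; kx++)
                    sum += kernel[ky + HALF][kx + HALF]
                         * in[(y + ky) * w + (x + kx)];
            out[y * w + x] = sum;
        }
    }
}
```

A compiler may fail to vectorize this nest directly because consecutive output pixels read overlapping, misaligned input windows; the thesis's combination of unrolling, loop shifting, and chain reordering is aimed at exactly this pattern.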
Author: ROGERS, STEPHEN
Publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material: Thesis
Availability: Full text available