Show simple item record

dc.contributor.advisor: Gregg, David
dc.contributor.author: Garland, James Philip
dc.date.accessioned: 2021-12-02T15:24:29Z
dc.date.available: 2021-12-02T15:24:29Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.citation: Garland, James Philip, Arbitrary Precision and Low Complexity Micro-Architectural Arithmetic Optimisations of Machine Learning Algorithms for Compute Bound and High-Performance Systems, Trinity College Dublin. School of Computer Science & Statistics, 2021
dc.identifier.other: Y
dc.identifier.uri: http://hdl.handle.net/2262/97651
dc.description: APPROVED
dc.description.abstract: Artificial intelligence is becoming ubiquitous and pervasive in our daily lives. Machine learning (ML), a subset of artificial intelligence (AI), supplies more accurate internet searches, voice recognition in home appliances, tagging of people in photos, object detection in videos, and driver assistance systems in vehicles. Convolutional neural networks (CNNs), a subset of ML, process these images, videos and sometimes audio data. Captured and preprocessed by embedded internet of things (IoT) devices, CNN data are often processed in internet data centres or on local PCs with high-performance processors and acceleration cards, due to CNNs' enormous energy, bandwidth, and processing requirements. There is a need to move more of this CNN processing to IoT edge and embedded devices for low-power and potentially offline processing. The CNN convolution layer consists of millions of multiply-accumulates (MACs), the arithmetic of which can be in fixed-point, integer or floating-point format. The CNN can operate in training mode or inference mode. During inference, the convolution layer occupies up to 90% of the computation time and energy of the CNN, convolving the input feature map (IFM) with the kernel weight data. The storage and movement of weight data, and the acceleration of the convolution computation, are often beyond the energy, storage and compute bounds of embedded devices. We investigate opportunities for optimising the hardware energy efficiency, gate-level area, and execution time of the CNN convolution layer's MAC arithmetic, while maintaining the inference classification accuracy of the CNN accelerator implementation. Our first contribution investigates reducing energy consumption and application-specific integrated circuit (ASIC) die area while maintaining the classification accuracy of CNNs. We also investigate latency and resource efficiency when implemented in a field programmable gate array (FPGA).
Our second contribution focuses on decreasing the software execution time of low-precision floating-point (FP) CNNs by exploiting hardware optimisation of central processing unit (CPU) vector register packing and the single instruction multiple data (SIMD) bitwise instructions used in the CNN MAC.
dc.publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
dc.rights: Y
dc.subject: CNN, power efficiency, multiply accumulate, arithmetic hardware circuits, ASIC, FPGA, bitslice parallel arithmetic, datapath circuits, hardware accelerators, reduced floating-point precision arithmetic, convolutional neural networks, approximate computing
dc.title: Arbitrary Precision and Low Complexity Micro-Architectural Arithmetic Optimisations of Machine Learning Algorithms for Compute Bound and High-Performance Systems
dc.type: Thesis
dc.contributor.sponsor: SFI stipend
dc.type.supercollection: thesis_dissertations
dc.type.supercollection: refereed_publications
dc.type.qualificationlevel: Doctoral
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:JGARLAND
dc.identifier.rssinternalid: 235249
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsorGrantNumber: 12/IA/1381
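The abstract above describes the convolution layer as millions of multiply-accumulate (MAC) operations convolving the input feature map (IFM) with kernel weight data. A minimal sketch of that inner loop is shown below; it is purely illustrative (naive valid-mode 2D convolution with integer MACs), not the accelerator design or the optimised arithmetic described in the thesis.

```python
# Illustrative only: each output element is an accumulation of MACs
# between an IFM window and the kernel weights. Names and shapes are
# assumptions for this sketch, not taken from the thesis.

def conv2d(ifm, kernel):
    """Valid-mode 2D convolution using integer multiply-accumulates."""
    ih, iw = len(ifm), len(ifm[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    # the MAC: multiply IFM pixel by weight, accumulate
                    acc += ifm[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

For a real CNN these four nested loops are repeated across channels and layers, which is why the MAC dominates inference time and energy and is the target of the hardware and SIMD optimisations the abstract summarises.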

