Show simple item record

dc.contributor.advisor: Gregg, David
dc.contributor.author: Garland, James Philip
dc.date.accessioned: 2021-12-02T15:24:29Z
dc.date.available: 2021-12-02T15:24:29Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.citation: Garland, James Philip, Arbitrary Precision and Low Complexity Micro-Architectural Arithmetic Optimisations of Machine Learning Algorithms for Compute Bound and High-Performance Systems, Trinity College Dublin. School of Computer Science & Statistics, 2021
dc.identifier.other: Y
dc.identifier.uri: http://hdl.handle.net/2262/97651
dc.description: APPROVED
dc.description.abstract: Artificial intelligence is becoming ubiquitous and pervasive in our daily lives. Machine learning (ML), a subset of artificial intelligence (AI), supplies more accurate internet searches, voice recognition in home appliances, tagging of people in photos, object detection in videos, and driver assistance systems in vehicles. Convolutional neural networks (CNNs), a subset of ML, process these images, videos and sometimes audio data. Captured and preprocessed by embedded internet of things (IoT) devices, CNN data are often processed in internet data centres or on local PCs with high-performance processors and acceleration cards, due to CNNs' enormous energy, bandwidth, and processing requirements. There is a need to move more of this CNN processing to IoT edge and embedded devices for low-power and potentially offline processing. The CNN convolution layer consists of millions of multiply-accumulates (MACs), the arithmetic of which can be in fixed-point, integer or floating-point format. The CNN can operate in training mode or inference mode. During inference, the convolution layer occupies up to 90% of the computation time and energy of the CNN, convolving the input feature map (IFM) with the kernel weight data. The storage and movement of weight data, and the acceleration of the convolution computation, are often beyond the energy, storage and compute bounds of embedded devices. We investigate opportunities for optimising the hardware energy efficiency, gate-level area, and execution time of the CNN convolution layer's MAC arithmetic, while maintaining the inference classification accuracy of the CNN accelerator implementation. Our first contribution investigates reducing energy consumption and application-specific integrated circuit (ASIC) die area while maintaining the classification accuracy of CNNs. We also investigate latency and resource efficiency when implemented in a field programmable gate array (FPGA).
Our second contribution focuses on decreasing the software execution time of low-precision floating-point (FP) CNNs by exploiting hardware optimisation of central processing unit (CPU) vector register packing and the single instruction multiple data (SIMD) bitwise instructions used in the CNN MAC.
dc.publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
dc.rights: Y
dc.subject: CNN, power efficiency, multiply accumulate, arithmetic hardware circuits, ASIC, FPGA, bitslice parallel arithmetic, datapath circuits, hardware accelerators, reduced floating-point precision arithmetic, convolutional neural networks, approximate computing
dc.title: Arbitrary Precision and Low Complexity Micro-Architectural Arithmetic Optimisations of Machine Learning Algorithms for Compute Bound and High-Performance Systems
dc.type: Thesis
dc.contributor.sponsor: SFI stipend
dc.type.supercollection: thesis_dissertations
dc.type.supercollection: refereed_publications
dc.type.qualificationlevel: Doctoral
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:JGARLAND
dc.identifier.rssinternalid: 235249
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsorGrantNumber: 12/IA/1381
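The abstract above describes the convolution layer as millions of multiply-accumulate (MAC) operations convolving the input feature map (IFM) with kernel weight data. A minimal sketch of that inner loop is shown below; it is purely illustrative (naive valid-mode 2D convolution with integer MACs), not the accelerator design or the optimised arithmetic described in the thesis.

```python
# Illustrative only: each output element is an accumulation of MACs
# between an IFM window and the kernel weights. Names and shapes are
# assumptions for this sketch, not taken from the thesis.

def conv2d(ifm, kernel):
    """Valid-mode 2D convolution using integer multiply-accumulates."""
    ih, iw = len(ifm), len(ifm[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    # the MAC: multiply IFM pixel by weight, accumulate
                    acc += ifm[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

For a real CNN these four nested loops are repeated across channels and layers, which is why the MAC dominates inference time and energy and is the target of the hardware and SIMD optimisations the abstract summarises.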

