Automatic program generation for convolutional neural networks on resource constrained devices
Citation:
Keane, Cormac David, Automatic program generation for convolutional neural networks on resource constrained devices, Trinity College Dublin. School of Computer Science & Statistics, 2022
Download Item:
genvolution-thesis-2022.pdf (PDF) 6.775Mb
Abstract:
Convolutional Neural Networks (CNNs) are both arithmetically and memory intensive when performing inference. This is a problem when executing CNNs on resource constrained machines, such as small embedded devices. This thesis proposes domain-specific program generators (DSPG) and automatic program optimizers (APO) to improve the resource usage (execution time, memory usage, energy usage) of CNN convolution on ARM devices.
We extend previous work on a DSPG and APO for direct CNN convolution to create Genvolution. Genvolution can automatically generate optimized implementations for CNN convolution on Intel and ARM NEON devices. Genvolution implementations outperform vendor library im2col implementations for 33% of tested CNN convolutions. Genvolution was also used to investigate the use of Flyte, a reduced precision floating-point storage datatype, for CNN convolution on ARM devices. We demonstrate that generated code using the Flyte datatype improves energy usage while maintaining execution speed for 60% of tested CNN convolutions.
We also propose Winogen, a second DSPG and APO created to produce Winograd CNN convolution implementations. Winogen implementations outperform vendor library Winograd implementations for 90% of tested CNN convolutions. Winogen is also used to investigate a novel Winograd CNN convolution algorithm. Our proposed algorithm reduces the memory overhead of standard Winograd CNN convolution, while still leveraging the problem complexity reduction Winograd convolution allows. We found our new algorithm outperforms standard Winograd convolution for 33% of tested CNN convolutions.
We demonstrate that automatic program generation can be used to improve the resource usage of CNNs on ARM devices. Any reduction in CNN resource usage is significant when embedded devices will run the same CNN countless times over their lifespan.
Sponsor
Grant Number
Science Foundation Ireland, project 12/IA/1381
Description:
APPROVED
Author: Keane, Cormac David
Advisor:
Gregg, David
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material:
Thesis
Collections:
Availability:
Full text available
Licences: