Benchmarking Neural Networks on Heterogeneous Hardware
Citation:
Blott, Michaela. Benchmarking Neural Networks on Heterogeneous Hardware. Trinity College Dublin, School of Computer Science & Statistics, 2021.
Abstract:
Neural Networks have become one of the most successful machine learning algorithms and play a key role in enabling machine vision and speech recognition. Their computational complexity and memory demands are challenging, which limits deployment, particularly within energy-constrained, embedded environments. To address these challenges, a broad spectrum of customized and heterogeneous hardware architectures has emerged, often accompanied by co-designed algorithms that extract maximum benefit from the hardware. Furthermore, numerous optimization techniques are being explored to reduce compute and memory requirements while maintaining accuracy. This results in an abundance of algorithmic and architectural choices, some of which fit specific use cases better than others, and it is not obvious which approach benefits from which optimization and to what degree. Finally, there is a vast number of published results that were measured under different deployment settings, such as power and operating modes, batch sizes, thread counts, and stream sizes, and not always with the same measurement methodologies, which obfuscates this already complex design space even further.
For system-level designers and computer architects, there is currently no good way to systematically compare the variety of hardware, algorithm, and optimization options. While a number of benchmarking efforts have emerged in this field, they do not address the particular demands of heterogeneous hardware architectures and cover only subsections of the embedded design space. None of the existing benchmarks inherently support essential algorithmic optimizations such as quantization. We propose a novel benchmark suite that addresses this need. QuTiBench is a novel multi-tiered benchmarking methodology, including microbenchmarks and theoretical baselines, that supports algorithmic optimizations and helps system developers understand the benefits and limitations of these novel compute architectures. The theoretical level of the benchmark is unique: it can predict performance and track compute efficiency. Finally, QuTiBench is systematic, with a clear measurement methodology. As such, we hope it can form a basis for driving future innovation in this field.
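A theoretical baseline of the kind described above can be illustrated with a roofline-style bound, where attainable performance is capped either by a platform's peak compute rate or by its memory bandwidth multiplied by the workload's arithmetic intensity. The sketch below is illustrative only; the function name and the accelerator figures are assumptions for the example, not values from QuTiBench.

```python
def attainable_gops(peak_gops, mem_bw_gbs, ops_per_byte):
    """Roofline-style upper bound on performance.

    peak_gops    -- platform's peak compute rate (GOPS)
    mem_bw_gbs   -- platform's memory bandwidth (GB/s)
    ops_per_byte -- workload's arithmetic intensity (ops per byte moved)

    The workload is memory-bound when bandwidth * intensity falls below
    the compute peak, and compute-bound otherwise.
    """
    return min(peak_gops, mem_bw_gbs * ops_per_byte)


# Hypothetical accelerator: 100 GOPS peak, 10 GB/s memory bandwidth.
print(attainable_gops(100, 10, 4))   # low-intensity layer: bandwidth-bound at 40 GOPS
print(attainable_gops(100, 10, 50))  # high-intensity layer: compute-bound at 100 GOPS
```

A per-layer sweep of such bounds is one way a theoretical tier can predict where a given network will be memory- or compute-limited on a given platform before any measurement is taken.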
We evaluate our benchmarking methodology systematically, initially in the context of inference, with different types of CNN topologies, leveraging both pruning and quantization as the most promising optimization techniques. We test across a spectrum of FPGA implementations, GPUs, a TPU, and a VLIW processor, for a selection of systematically pruned and quantized neural networks (including ResNet50, GoogleNetv1, MobileNetv1, a VGG derivative, and a multilayer perceptron). We take the full design space into account, including batch sizes, thread counts, stream sizes, and operating modes, and consider power, latency, and throughput at a specific accuracy as figures of merit.
These results validate our approach. We show that the benchmark adequately represents the potential of this broad spectrum of solutions and provides sufficient coverage to drive clarity within the complexity of this design space. The theoretical analysis was shown to be highly effective in predicting performance and optimal design solutions; as a result, it has the potential to save significant experimentation time. The microbenchmarks provided interesting system-level insights, although we encountered many practical constraints that limited the amount of experimentation that could be conducted. Additionally, the systematic measurements exposed typical behaviour for the different types of hardware architectures. Finally, we have provided experimental proof that the measurement methodology, with its distinction between system-level and compute-level performance, illustrates the individual data movement characteristics of the various hardware platforms.
There is a critical need for community support, as well as truly open data access, to generate meaningful research impact. As such, we have put significant effort into a web portal that supports third-party contributions and offers downloadable and indexed access to all measured and theoretical data points. We expect that, through this web portal, our benchmarking efforts can contribute to collective research insights within the community. Alternatively, some of the novel concepts, such as the theoretical baselines and the systematic measurement aspects, could be adopted in other large-scale benchmarking efforts that already have wider industry support.
Author: Blott, Michaela
Advisor: Doyle, Linda
Publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material: Thesis
Availability: Full text available