PiQi: Partially Quantized DNN Inference on HMPSoCs

Open Access
Authors
Publication date 2024
Book title Proceedings of the 29th International Symposium on Low Power Electronics and Design
Book subtitle August 5-7, 2024, Newport Beach, CA, USA
ISBN (electronic)
  • 9798400706882
Event 29th International Symposium on Low Power Electronics and Design
Number of pages 6
Publisher New York, New York: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Deep Neural Network (DNN) inference is now ubiquitous in embedded applications at the edge. State-of-the-art Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) powering these applications come equipped with powerful Neural Processing Units (NPUs) that significantly outperform the other inference-capable HMPSoC components, namely the CPUs and GPUs, in terms of power consumption and performance. However, CPUs and GPUs can perform full-precision inference, whereas NPUs can often only perform quantized inference. Consequently, low-latency, low-power inference on the NPU comes at the cost of an accuracy loss due to quantization.
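As a minimal illustration of the accuracy loss the abstract refers to, the sketch below shows symmetric affine int8 quantization, a common scheme for NPU execution, and the rounding error it introduces. The weights and parameters are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Illustrative float32 weights; not taken from the paper.
weights = np.random.default_rng(0).normal(0.0, 0.5, 8).astype(np.float32)

# Symmetric affine int8 quantization: q = round(x / scale), clipped to [-128, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

# Dequantizing recovers only an approximation; this per-value rounding error
# accumulates across layers into the accuracy loss of NPU-only inference.
dequantized = q.astype(np.float32) * scale
print("max round-trip error:", float(np.abs(weights - dequantized).max()))
```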
DNNs consist of several heterogeneous layers. Here, we introduce the PiQi framework, which allows DNN inference to switch layer-wise between the three inference-capable HMPSoC components (CPU, GPU, and NPU) mid-inference with minimal overhead. PiQi thereby realizes the novel idea of partially quantized DNN inference on HMPSoCs. However, different DNN layers experience different power-performance gains and incur different accuracy losses under quantization. Therefore, we provide within PiQi a multi-objective Genetic Algorithm (GA) that produces a power-performance Pareto front under an accuracy constraint through selective multi-layer quantization during inference. Additionally, PiQi employs a neural network that predicts accuracy when assigning DNN layers to the appropriate cores, which expedites the search.
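For illustration only, here is a minimal sketch, not the authors' implementation, of the kind of search the abstract describes: each GA chromosome assigns one of {CPU, GPU, NPU} to each layer (NPU implying quantized execution), a toy stand-in for the neural-network accuracy predictor gates candidates against an accuracy constraint, and non-dominated filtering yields a latency/power Pareto front. All names, cost tables, and numbers are hypothetical.

```python
import random

random.seed(0)
CORES = ("CPU", "GPU", "NPU")        # NPU assignment implies quantized execution
NUM_LAYERS = 12
ACCURACY_FLOOR = 0.70                # hypothetical accuracy constraint

# Hypothetical per-(layer, core) latency (ms) and power (W) tables; in a real
# system these would come from on-device profiling.
LAT = {(l, c): random.uniform(1.0, 8.0) / (i + 1)
       for l in range(NUM_LAYERS) for i, c in enumerate(CORES)}
PWR = {(l, c): random.uniform(0.5, 2.0) * (1.5 if c != "NPU" else 1.0)
       for l in range(NUM_LAYERS) for c in CORES}

def predict_accuracy(mapping):
    """Toy stand-in for the paper's neural-network accuracy predictor:
    each quantized (NPU) layer costs a small, fixed accuracy penalty."""
    return 0.80 - 0.01 * sum(c == "NPU" for c in mapping)

def evaluate(mapping):
    """Objectives to minimize: end-to-end latency and average power."""
    latency = sum(LAT[(l, c)] for l, c in enumerate(mapping))
    power = sum(PWR[(l, c)] for l, c in enumerate(mapping)) / NUM_LAYERS
    return latency, power

def dominates(a, b):
    """True if objective vector a is no worse than b and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def pareto_front(population):
    """Non-dominated, accuracy-feasible mappings with their objective values."""
    scored = [(m, evaluate(m)) for m in set(population)
              if predict_accuracy(m) >= ACCURACY_FLOOR]
    return [(m, s) for m, s in scored
            if not any(dominates(t, s) for _, t in scored)]

def crossover(p1, p2):
    cut = random.randrange(1, NUM_LAYERS)
    return p1[:cut] + p2[cut:]

def mutate(mapping, rate=0.1):
    return tuple(random.choice(CORES) if random.random() < rate else c
                 for c in mapping)

population = [tuple(random.choice(CORES) for _ in range(NUM_LAYERS))
              for _ in range(40)]
for _ in range(50):                  # generations
    parents = [m for m, _ in pareto_front(population)] or population
    population = [mutate(crossover(random.choice(parents),
                                   random.choice(parents)))
                  for _ in range(40)]

for mapping, (lat, pwr) in sorted(pareto_front(population), key=lambda x: x[1]):
    print(f"lat={lat:6.2f} ms  power={pwr:4.2f} W  "
          f"acc={predict_accuracy(mapping):.2f}  {'/'.join(mapping)}")
```

Each surviving mapping is one point on the power-performance Pareto front; mappings that quantize too many layers are filtered out by the accuracy floor before domination checks.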
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3665314.3670841