PiQi: Partially Quantized DNN Inference on HMPSoCs
| Authors | |
|---|---|
| Publication date | 2024 |
| Book title | Proceedings of the 29th International Symposium on Low Power Electronics and Design |
| Book subtitle | August 5-7, 2024, Newport Beach, CA, USA |
| ISBN (electronic) | |
| Event | 29th International Symposium on Low Power Electronics and Design |
| Number of pages | 6 |
| Publisher | New York, New York: Association for Computing Machinery |
| Organisations | |
| Abstract | Deep Neural Network (DNN) inference is now ubiquitous in embedded applications at the edge. State-of-the-art Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) powering these applications come equipped with powerful Neural Processing Units (NPUs) that significantly outperform the other inference-capable HMPSoC components, namely the CPUs and GPUs, in both power consumption and performance. However, CPUs and GPUs can perform full-precision inference, whereas NPUs can often only perform quantized inference. Consequently, the NPU's low-latency, low-power inference comes at an accuracy loss due to quantization. DNNs consist of several heterogeneous layers. We introduce the PiQi framework, which allows DNN inference to switch layer-wise between the three inference-capable HMPSoC components (CPU, GPU, and NPU) mid-inference with minimal overhead. PiQi thereby enables the novel idea of partially quantized DNN inference on HMPSoCs. Different DNN layers gain different power-performance benefits and incur different accuracy losses from quantization. Therefore, PiQi includes a multi-objective Genetic Algorithm (GA) that produces a power-performance Pareto front under an accuracy constraint through selective multi-layer quantization during inference. Additionally, PiQi employs a neural network that expedites the search by predicting the accuracy of candidate layer-to-core assignments. |
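The selective-quantization search the abstract describes can be caricatured as follows. This is a minimal sketch under invented assumptions: the per-layer costs, GA parameters, and additive accuracy model are all made up for illustration, and the paper's actual GA and its neural-network accuracy predictor are not reproduced. A simple GA evolves per-layer core assignments, treats quantized NPU execution as an accuracy cost, and retains the latency/power Pareto front among mappings whose total accuracy loss stays within a budget.

```python
# Toy sketch (not the paper's implementation) of a GA that searches
# layer-to-core mappings for a latency/power Pareto front under an
# accuracy-loss budget. All cost numbers are invented for illustration.
import random

random.seed(0)

N_LAYERS = 6
CORES = ("cpu", "gpu", "npu")  # "npu" implies quantized execution

# Assumed per-layer latency (ms) and power (W): NPU cheapest in both,
# GPU faster than CPU, CPU lower-power than GPU.
LAT = [{"cpu": random.uniform(4, 8),
        "gpu": random.uniform(2, 4),
        "npu": random.uniform(0.5, 1.5)} for _ in range(N_LAYERS)]
PWR = [{"cpu": random.uniform(1, 2),
        "gpu": random.uniform(2, 4),
        "npu": random.uniform(0.3, 0.8)} for _ in range(N_LAYERS)]
ACC_LOSS = [random.uniform(0.05, 0.5) for _ in range(N_LAYERS)]  # % if on NPU
ACC_BUDGET = 1.0  # total tolerated accuracy drop (%)

def evaluate(mapping):
    """Return (total latency, total power, total accuracy loss)."""
    lat = sum(LAT[i][c] for i, c in enumerate(mapping))
    pwr = sum(PWR[i][c] for i, c in enumerate(mapping))
    loss = sum(ACC_LOSS[i] for i, c in enumerate(mapping) if c == "npu")
    return lat, pwr, loss

def dominates(a, b):
    """True if objective pair a Pareto-dominates b (minimisation)."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto(pop):
    """Non-dominated mappings (latency and power only) within pop."""
    objs = [evaluate(m)[:2] for m in pop]
    return [m for m, e in zip(pop, objs)
            if not any(dominates(o, e) for o in objs)]

def search(pop_size=60, gens=40, p_mut=0.2):
    pop = [[random.choice(CORES) for _ in range(N_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(gens):
        children = []
        for _ in range(pop_size):
            p1, p2 = random.sample(pop, 2)
            cut = random.randrange(1, N_LAYERS)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < p_mut:              # point mutation
                child[random.randrange(N_LAYERS)] = random.choice(CORES)
            children.append(child)
        feasible = [m for m in pop + children
                    if evaluate(m)[2] <= ACC_BUDGET]  # accuracy constraint
        front = pareto(feasible)
        rest = [m for m in feasible if m not in front]
        pop = (front + rest)[:pop_size]              # elitist survival
        while len(pop) < 2:                          # keep the GA operable
            pop.append([random.choice(CORES) for _ in range(N_LAYERS)])
    return pareto([m for m in pop if evaluate(m)[2] <= ACC_BUDGET])

front = search()
```

Here the accuracy term is a simple additive per-layer penalty; PiQi instead predicts the accuracy of candidate assignments with a trained neural network, which would replace the loss term inside `evaluate` in this sketch.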
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3665314.3670841 |
