Model and system robustness in distributed CNN inference at the edge

Open Access
Authors
Publication date 01-2025
Journal Integration
Article number 102299
Volume | Issue number 100
Number of pages 9
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Prevalent large CNN models pose a significant challenge in terms of computing resources for resource-constrained devices at the Edge. Distributing the computations and coefficients over multiple edge devices collaboratively has been well studied but these works generally do not consider the presence of device failures (e.g., due to temporary connectivity issues, overload, discharged battery of edge devices). Such unpredictable failures can compromise the reliability of edge devices, inhibiting the proper execution of distributed CNN inference. In this paper, we present a novel partitioning method, called RobustDiCE, for robust distribution and inference of CNN models over multiple edge devices. Our method can tolerate intermittent and permanent device failures in a distributed system at the Edge, offering a tunable trade-off between robustness (i.e., retaining model accuracy after failures) and resource utilization. We verify the system’s robustness by validating the overall end-to-end latency under failures. We evaluate RobustDiCE using the ImageNet-1K dataset on several representative CNN models under various device failure scenarios and compare it with several state-of-the-art partitioning methods as well as an optimal robustness approach (i.e., full neuron replication). In addition, we demonstrate RobustDiCE’s advantages in terms of memory usage and energy consumption per device, and system throughput for various system setups with different device counts.
Document type Article
Language English
Published at https://doi.org/10.1016/j.vlsi.2024.102299
Downloads
Permalink to this page
Back