EASTER: Learning to Split Transformers at the Edge Robustly

Open Access
Authors
  • T. Stefanov
Publication date: November 2024
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume: 43, Issue: 11
Pages (from-to): 3626-3637
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Prevalent large transformer models present significant computational challenges for resource-constrained devices at the Edge. While distributing the workload of deep learning models across multiple edge devices has been studied extensively, existing works typically overlook the impact of edge-device failures. Unpredictable failures, due to, e.g., connectivity issues or discharged batteries, can compromise the reliability of inference serving at the Edge. In this article, we introduce a novel methodology, called EASTER, designed to learn distribution strategies for transformer models that are robust against device failures, considering the tradeoff between robustness (i.e., maintaining model functionality despite failures) and resource utilization (i.e., memory usage and computation). We evaluate EASTER with three representative transformers—ViT, GPT-2, and Vicuna—under device failures. Our results demonstrate EASTER's efficiency in memory usage and its potential for end-to-end latency improvement for inference across multiple edge devices, while preserving model accuracy as much as possible under device failures.
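To make the robustness notion in the abstract concrete, the sketch below is a minimal, hypothetical illustration (not EASTER's actual learning algorithm): transformer blocks are assigned to edge devices, critical blocks may be replicated, and a placement's robustness is scored as the fraction of single-device failures under which every block still survives on some device. All names (`placement`, `robustness`, the device labels) are invented for this example.

```python
def surviving_blocks(placement, failed):
    """Blocks still reachable when the devices in `failed` are down.

    `placement` maps block index -> list of devices holding a replica.
    """
    return {b for b, devs in placement.items()
            if any(d not in failed for d in devs)}

def is_functional(placement, n_blocks, failed):
    """Inference succeeds only if every block survives on some live device."""
    return surviving_blocks(placement, failed) == set(range(n_blocks))

def robustness(placement, n_blocks, devices):
    """Fraction of single-device failures the placement tolerates."""
    ok = sum(is_functional(placement, n_blocks, {d}) for d in devices)
    return ok / len(devices)

# Hypothetical 4-block model on 3 devices; blocks 0 and 3 are replicated,
# so only the failure of d0 is tolerated (d1 and d2 each hold a sole copy).
placement = {0: ["d0", "d1"], 1: ["d1"], 2: ["d2"], 3: ["d2", "d0"]}
print(robustness(placement, 4, ["d0", "d1", "d2"]))  # 1 of 3 failures tolerated
```

A learned strategy in this spirit would search over placements to trade such a robustness score against the extra memory that replication costs on each device.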
Document type Article
Language English
Published at https://doi.org/10.1109/TCAD.2024.3438995