SwiftSNNI: Optimized Scheduling for Secure Neural Network Inference (SNNI) on Multi-Core Systems

K. Batool; Saleem Anwar; F. Regazzoni; A. Pimentel; Zoltán Ádám Mann

doi:https://doi.org/10.1145/3777884.3797005

Authors	K. Batool; Saleem Anwar; F. Regazzoni; A. Pimentel; Zoltán Ádám Mann
Publication date	2026
Book title	ICPE '26
Book subtitle	Proceedings of the 17th ACM/SPEC International Conference on Performance Engineering : May 4-8, 2026, Florence, Italy
ISBN (electronic)	9798400723254
Event	17th ACM/SPEC International Conference on Performance Engineering
Pages (from-to)	120-134
Number of pages	15
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Secure Neural Network Inference (SNNI) enables privacy-preserving inference on encrypted data with strong cryptographic guarantees. However, practical deployments suffer from high preprocessing overhead, significant communication costs, and sequential execution. These limitations lead to low throughput, underutilized system resources, long queueing delays, and poor scalability. This work introduces SwiftSNNI, a unified, resource-aware scheduling framework for SNNI. It implements a hybrid offline–online strategy that orchestrates offline preprocessing (T_pre,i) and online inference (T_on,i) jobs to maximize parallelism. By formulating SNNI scheduling as a constrained optimization problem, SwiftSNNI overlaps the T_pre,i phase of future requests with active T_on,j jobs. SwiftSNNI also incorporates optional advance notices to enable proactive T_pre,i execution, which further reduces average input delay (D). Evaluations using five benchmark neural networks (M1, M2, HiNet, AlexNet, VGG-16) under diverse workloads and stochastic arrival rates confirm substantial performance gains. Compared to a parallelized sequential baseline (MS-SHARK), SwiftSNNI achieves up to 97% lower average input delay (D), an 81% reduction in makespan (≈ 5.4× speedup), and a 5.6× increase in throughput. Furthermore, SwiftSNNI reduces average waiting time (W) by over 99%, demonstrating robust starvation prevention for high-concurrency workloads. SwiftSNNI supports concurrent execution, scales to larger neural networks, and provides an efficient runtime for SNNI deployments. The SwiftSNNI implementation is available online.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3777884.3797005
Downloads	SwiftSNNI_ACM-vF2 (Accepted author manuscript); SwiftSNNI (Final published version)