TY - GEN
T1 - Towards Low-Cost and Energy-Aware Inference for EdgeAI Services via Model Swapping
AU - Trihinas, Demetris
AU - Michael, Panagiotis
AU - Symeonides, Moysis
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Over the past decade, key advancements in Artificial Intelligence (AI) and Edge Computing (EC) have led to the development of EdgeAI services to provide intelligent and low-latency responses essential for mission-critical applications. However, the expansion of EdgeAI services to the network extremes can face challenges such as load fluctuations causing delays in AI inference and concerns over energy efficiency. This paper proposes 'model swapping', where the model employed by the EdgeAI service is swapped on-the-fly with another readily available model so that cost and energy savings are achieved during runtime inference tasks. The ModelSwapper achieves this by employing a low-cost algorithmic technique that explores meaningful trade-offs between computational overhead and model accuracy. By doing so, edge nodes adapt to load fluctuations by substituting complex models with simpler ones, thus meeting desired latency requirements, albeit with potentially higher uncertainty. Our evaluation with two EdgeAI services (object detection, NLU) demonstrates that ModelSwapper can significantly reduce energy usage and inference delays by at least 27% and 68% respectively, with only a 1% reduction in accuracy.
AB - Over the past decade, key advancements in Artificial Intelligence (AI) and Edge Computing (EC) have led to the development of EdgeAI services to provide intelligent and low-latency responses essential for mission-critical applications. However, the expansion of EdgeAI services to the network extremes can face challenges such as load fluctuations causing delays in AI inference and concerns over energy efficiency. This paper proposes 'model swapping', where the model employed by the EdgeAI service is swapped on-the-fly with another readily available model so that cost and energy savings are achieved during runtime inference tasks. The ModelSwapper achieves this by employing a low-cost algorithmic technique that explores meaningful trade-offs between computational overhead and model accuracy. By doing so, edge nodes adapt to load fluctuations by substituting complex models with simpler ones, thus meeting desired latency requirements, albeit with potentially higher uncertainty. Our evaluation with two EdgeAI services (object detection, NLU) demonstrates that ModelSwapper can significantly reduce energy usage and inference delays by at least 27% and 68% respectively, with only a 1% reduction in accuracy.
KW - Edge Computing
KW - Machine Learning
UR - https://www.scopus.com/pages/publications/85212222768
U2 - 10.1109/IC2E61754.2024.00026
DO - 10.1109/IC2E61754.2024.00026
M3 - Conference contribution
AN - SCOPUS:85212222768
T3 - Proceedings - 2024 IEEE International Conference on Cloud Engineering, IC2E 2024
SP - 168
EP - 177
BT - Proceedings - 2024 IEEE International Conference on Cloud Engineering, IC2E 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Conference on Cloud Engineering, IC2E 2024
Y2 - 24 September 2024 through 27 September 2024
ER -