TY - GEN
T1 - FakeInf
T2 - 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025
AU - Trihinas, Demetris
AU - Symeonides, Moysis
AU - Cleju, Nicolae
AU - Pallis, George
AU - Dikaiakos, Marios
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/12/31
Y1 - 2025/12/31
AB - Recent advances in 5G networks and edge computing are enabling low-latency, AI-powered services in close proximity to end users. However, the growing complexity of Deep Learning (DL) models threatens the vision of EdgeAI, as real-time inference demands substantial computational power, consumes excessive energy, and imposes heavy model-update traffic that overwhelms resource-constrained multi-access edge computing (MEC) nodes. In this paper, we present FakeInf, a framework that supports EdgeAI applications delivering DL-based video stream inference. FakeInf adds a lightweight decision module to DL model-serving pipelines that tracks data volatility and, using probabilistic reasoning, decides, for each streamed input, whether to run the full model or "fake it" by relying on low-cost statistical estimations. This selective execution reduces network traffic, latency, and energy usage while maintaining Quality-of-Service (QoS) within user-defined limits. To demonstrate the efficacy of FakeInf, we integrate it with a real-world smart traffic system hosted on a MEC node. FakeInf reduces application latency by 59%, network traffic by 71%, computational overhead by 66%, and energy consumption by 72%, while incurring only a modest 4-6% reduction in the accuracy of the emitted analytic insights. FakeInf also enables the pipeline to process 2x more video streams than the baseline without creating inference bottlenecks.
KW - B5G networks
KW - Deep Learning
KW - Edge Computing
UR - https://www.scopus.com/pages/publications/105027186459
U2 - 10.1145/3773274.3774270
DO - 10.1145/3773274.3774270
M3 - Conference contribution
AN - SCOPUS:105027186459
T3 - Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025
BT - Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025
PB - Association for Computing Machinery, Inc
Y2 - 1 December 2025 through 4 December 2025
ER -