FakeInf: Selective Deep Neural Network Inference for Latency and Energy-Aware Model Serving Pipelines

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent advances in 5G networks and edge computing are enabling low-latency, AI-powered services in close proximity to end users. However, the growing complexity of Deep Learning (DL) models threatens the vision of EdgeAI: real-time inference demands substantial computational power, consumes excessive energy, and imposes heavy model-update traffic that overwhelms resource-constrained multi-access edge computing (MEC) nodes. In this paper, we present FakeInf, a framework that supports EdgeAI applications delivering DL-based video stream inference. FakeInf adds a lightweight decision module to DL model-serving pipelines that tracks data volatility and, using probabilistic reasoning, decides for each streamed input whether to run the full model or "fake it" by relying on low-cost statistical estimations. This selective execution reduces network traffic, latency, and energy usage while maintaining Quality-of-Service (QoS) within user-defined limits. To demonstrate the efficacy of FakeInf, we integrate it with a real-world smart traffic system hosted on a MEC node. FakeInf reduces application latency by 59%, network traffic by 71%, computational overhead by 66%, and energy by 72%, while incurring only a modest reduction of 4-6% in the accuracy of the analytic insights emitted. FakeInf also allowed the pipeline to process 2x more video streams compared to the baseline without creating inference bottlenecks.
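The selective-execution idea the abstract describes can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the class name, the EWMA volatility signal, and the threshold knob are all hypothetical stand-ins for FakeInf's probabilistic decision module, shown only to make the run-or-"fake" decision concrete.

```python
class SelectiveInference:
    """Illustrative sketch (not FakeInf's published algorithm): per input,
    decide whether to run the full DL model or reuse a cheap estimate,
    based on a smoothed measure of recent input volatility."""

    def __init__(self, volatility_threshold=0.1, alpha=0.5):
        self.threshold = volatility_threshold  # hypothetical QoS knob
        self.alpha = alpha                     # EWMA smoothing factor
        self.volatility = 0.0                  # smoothed change signal
        self.last_signal = None
        self.last_result = None

    def infer(self, frame_signal, run_model):
        # Update an exponentially weighted estimate of input change.
        if self.last_signal is not None:
            change = abs(frame_signal - self.last_signal)
            self.volatility = (self.alpha * change
                               + (1 - self.alpha) * self.volatility)
        self.last_signal = frame_signal

        # Low volatility: "fake" the inference by reusing the last result.
        if self.last_result is not None and self.volatility < self.threshold:
            return self.last_result, False  # (result, ran_full_model)

        # High volatility (or cold start): pay for the full model.
        self.last_result = run_model(frame_signal)
        return self.last_result, True
```

On a stable stream only the first frame reaches the full model; every subsequent frame is served from the cached result, which is the source of the latency, traffic, and energy savings the abstract reports.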

Original language: English
Title of host publication: Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400722851
DOIs
Publication status: Published - 31 Dec 2025
Event: 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025 - Nantes, France
Duration: 1 Dec 2025 – 4 Dec 2025

Publication series

Name: Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025

Conference

Conference: 18th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2025
Country/Territory: France
City: Nantes
Period: 1/12/25 – 4/12/25

Keywords

  • B5G networks
  • Deep Learning
  • Edge Computing
