FedMon: A Federated Learning Monitoring Toolkit

Moysis Symeonides, Demetris Trihinas, Fotis Nikolaidis

Research output: Contribution to journal › Article › peer-review

Abstract

Federated learning (FL) is rapidly becoming a key enabler for large-scale Artificial Intelligence (AI), where models are trained in a distributed fashion by several clients without sharing local and possibly sensitive data. For edge computing, sharing the computational load across multiple clients is ideal, especially when the underlying IoT and edge nodes have limited resource capacity. Despite its wide applicability, monitoring FL deployments comes with significant challenges. AI practitioners are required to invest a vast amount of time (and labor) in manually configuring state-of-the-art monitoring tools. This entails addressing the unique characteristics of the FL training process, including extracting FL-specific and system-level metrics, aligning metrics to training rounds, pinpointing performance inefficiencies, and comparing current to previous deployments. This work introduces FedMon, a toolkit designed to ease the burden of monitoring FL deployments by seamlessly integrating the probing interface with the FL deployment, automating metric extraction, providing a rich set of system, dataset, model, and experiment-level metrics, and offering the analytic means to assess trade-offs and compare different model and training configurations.

Original language: English
Pages (from-to): 227-249
Number of pages: 23
Journal: Internet of Things
Volume: 5
Issue number: 2
Publication status: Published - Jun 2024

Keywords

  • edge computing
  • federated learning
  • internet of things
  • machine learning
