TY - JOUR
T1 - EQUALITY
T2 - Quality-aware intensive analytics on the edge
AU - Michailidou, Anna Valentini
AU - Gounaris, Anastasios
AU - Symeonides, Moysis
AU - Trihinas, Demetris
N1 - Publisher Copyright:
© 2021
PY - 2022/3
Y1 - 2022/3
N2 - Our work is motivated by the fact that there is an increasing need to perform complex analytics jobs over streaming data as close to the edge devices as possible and, in parallel, it is important that data quality is considered as an optimization objective along with performance metrics. In this work, we develop a solution that trades latency for an increased fraction of incoming data, for which data quality-related measurements and operations are performed, in jobs running over geo-distributed heterogeneous and constrained resources. Our solution is hybrid: on the one hand, we perform search heuristics over locally optimal partial solutions to yield an enhanced global solution regarding task allocations; on the other hand, we employ a spring relaxation algorithm to avoid unnecessarily increased degree of partitioned parallelism. Through thorough experiments, we show that we can improve upon state-of-the-art solutions in terms of our objective function that combines latency and extent of quality checks by up to 2.56X. Moreover, we implement our solution within Apache Storm, and we perform experiments in an emulated setting. The results show that we can reduce the latency in 86.9% of the cases examined, while latency is up to 8 times lower compared to the built-in Storm scheduler, with the average latency reduction being 52.5%.
AB - Our work is motivated by the fact that there is an increasing need to perform complex analytics jobs over streaming data as close to the edge devices as possible and, in parallel, it is important that data quality is considered as an optimization objective along with performance metrics. In this work, we develop a solution that trades latency for an increased fraction of incoming data, for which data quality-related measurements and operations are performed, in jobs running over geo-distributed heterogeneous and constrained resources. Our solution is hybrid: on the one hand, we perform search heuristics over locally optimal partial solutions to yield an enhanced global solution regarding task allocations; on the other hand, we employ a spring relaxation algorithm to avoid unnecessarily increased degree of partitioned parallelism. Through thorough experiments, we show that we can improve upon state-of-the-art solutions in terms of our objective function that combines latency and extent of quality checks by up to 2.56X. Moreover, we implement our solution within Apache Storm, and we perform experiments in an emulated setting. The results show that we can reduce the latency in 86.9% of the cases examined, while latency is up to 8 times lower compared to the built-in Storm scheduler, with the average latency reduction being 52.5%.
KW - Data quality
KW - Fog computing
KW - Optimization
KW - Sensors
UR - http://www.scopus.com/inward/record.url?scp=85120944813&partnerID=8YFLogxK
U2 - 10.1016/j.is.2021.101953
DO - 10.1016/j.is.2021.101953
M3 - Article
AN - SCOPUS:85120944813
SN - 0306-4379
VL - 105
JO - Information Systems
JF - Information Systems
M1 - 101953
ER -