Doctoral defence (information and communication technology): MSc Zain Taufique

Time

10 June 2026, 13:00 – 17:00

MSc Zain Taufique will present the dissertation "Efficient Run-time Systems for Edge AI Inference" for public examination at the University of Turku on Wednesday, 10 June 2026 at 13:00 (University of Turku, Main Building, Säästöpankki-sali, Turku).

The opponent will be Professor Mario Di Francesco (Aalto University) and the custos Professor Pasi Liljeberg (University of Turku). The event will be held in English. The field of the dissertation is information and communication technology.

Summary of the dissertation:

Artificial intelligence is increasingly deployed on edge devices, such as smartphones, smart glasses, wearables, and autonomous systems, to enable fast on-device responses while preserving user data privacy. These edge platforms integrate heterogeneous compute clusters, including CPUs, GPUs, and AI accelerators, each exhibiting asymmetric energy and performance characteristics. Modern AI applications require inference across diverse machine learning models spanning vision and language modalities, ranging from lightweight neural networks to encoder transformers and large language models, to perform tasks such as image classification, object detection, semantic segmentation, text summarization, and reasoning. However, edge platforms are fundamentally resource-constrained, with limited computational capacity and tight energy budgets. This dissertation addresses the challenge of running inference for multiple AI workloads efficiently on resource-constrained, heterogeneous edge platforms while meeting their latency, energy, and accuracy requirements.

A key finding of this dissertation is that existing edge AI frameworks are primarily conservative: they fail to leverage heterogeneous resources and to adapt to changing system dynamics. Inference requests arrive stochastically and are unknown to the scheduler in advance, leading to variations in the run-time workload. Therefore, this dissertation develops adaptive run-time frameworks that dynamically decide where and how to execute each inference request, jointly orchestrating cluster selection, processor frequency, and, where applicable, model accuracy. The frameworks employ both heuristics and reinforcement learning to explore the complex design space and determine suitable scheduling configurations at run-time. The proposed frameworks were implemented on heterogeneous edge hardware and evaluated using diverse workloads. Compared with state-of-the-art techniques, the results demonstrate consistent reductions in inference latency and energy consumption, frequently exceeding 50%, with minimal impact on model accuracy.
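
As an illustration only, the idea of choosing a cluster and processor frequency per request can be sketched as a simple heuristic: among the configurations that meet a latency deadline, take the one with the lowest energy, and fall back to the fastest configuration when none meets it. This is not the dissertation's method. The `Config` type, the `pick_config` function, and every number in `PROFILE` are hypothetical placeholders invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical (cluster, frequency) operating point for a model."""
    cluster: str
    freq_mhz: int
    latency_ms: float  # assumed profiled latency, made up for illustration
    energy_mj: float   # assumed profiled energy, made up for illustration


def pick_config(configs, deadline_ms):
    """Return the lowest-energy config that meets the deadline.

    If no config meets the deadline, return the fastest one instead.
    """
    if not configs:
        raise ValueError("configs must not be empty")
    feasible = [c for c in configs if c.latency_ms <= deadline_ms]
    if feasible:
        return min(feasible, key=lambda c: c.energy_mj)
    return min(configs, key=lambda c: c.latency_ms)


# Hypothetical profile table; the values are not measurements.
PROFILE = [
    Config("cpu_little", 1400, 120.0, 90.0),
    Config("cpu_big", 2000, 60.0, 150.0),
    Config("gpu", 800, 35.0, 110.0),
    Config("npu", 600, 20.0, 140.0),
]

if __name__ == "__main__":
    for deadline in (200.0, 50.0, 10.0):
        chosen = pick_config(PROFILE, deadline)
        print(deadline, chosen.cluster, chosen.freq_mhz)
```

A static table like this cannot react to stochastic request arrivals or contention between workloads, which is the gap the adaptive frameworks described above are meant to address.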

A further finding is that a single edge device is often insufficient for compute-intensive workloads. In such cases, the workload can be distributed across nearby trusted devices, forming a collaborative edge cluster. However, these devices have heterogeneous computational capacities, and distributing the workload into equal-sized partitions leads to performance imbalance. Therefore, this dissertation presents methods for partitioning AI workloads in proportion to each device's computational capacity, thereby reducing latency while preserving accuracy.
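
A minimal sketch of capacity-proportional partitioning, assuming the workload is a count of indivisible units (for example, layers or input tiles) and that each device's capacity is summarized by a single positive number. The function name `partition_proportional` and the largest-remainder rounding are assumptions made for illustration, not the dissertation's partitioning method.

```python
def partition_proportional(total_units, capacities):
    """Split total_units across devices in proportion to their capacities.

    Uses largest-remainder rounding so the integer shares sum exactly
    to total_units.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not capacities or any(c <= 0 for c in capacities):
        raise ValueError("capacities must be a non-empty list of positive numbers")

    total_capacity = sum(capacities)
    exact = [total_units * c / total_capacity for c in capacities]
    shares = [int(x) for x in exact]  # round down first

    # Hand out the remaining units to the largest fractional parts.
    leftover = total_units - sum(shares)
    by_remainder = sorted(
        range(len(capacities)),
        key=lambda i: exact[i] - shares[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Three hypothetical devices with relative capacities 1 : 2 : 4.
    print(partition_proportional(10, [1, 2, 4]))
```

With equal-sized partitions, the slowest device in this example would receive about a third of the work and dominate the overall latency; the proportional split gives it the smallest share instead.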

The broader impact of this research concerns where AI computation is performed. On-device inference, rather than offloading to remote cloud servers, provides faster responses, lower communication overhead, and stronger data privacy, since sensitive information need not leave the user's own device. Overall, this dissertation demonstrates that the advancement of edge AI depends not only on designing larger models and faster accelerators but also on using available resources more effectively through adaptive run-time orchestration, thereby laying the foundation for the next generation of efficient, adaptive, and scalable edge AI systems.