Multi-objective resource optimization for large scale machine learning systems
Guo, Hongpeng
Permalink
https://hdl.handle.net/2142/124238
Description
- Title
- Multi-objective resource optimization for large scale machine learning systems
- Author(s)
- Guo, Hongpeng
- Issue Date
- 2024-04-10
- Director of Research (if dissertation) or Advisor (if thesis)
- Nahrstedt, Klara
- Doctoral Committee Chair(s)
- Nahrstedt, Klara
- Committee Member(s)
- Chen, Deming
- Xu, Tianyin
- Li, Baochun
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Machine learning
- Distributed systems
- Resource efficient computing
- Performance optimization
- Abstract
- The recent advancements in machine learning (ML) have marked significant progress across various fields, delivering high-quality solutions in computer vision, natural language processing, and virtual reality, among others. This leap forward is largely due to innovations in deep learning and neural networks, which have opened new avenues in data analysis and decision making, profoundly affecting people's lives and how society functions. However, ML systems face considerable challenges in resource efficiency, grappling with complex issues such as network bandwidth exhaustion, computational intensity, energy consumption, and hardware diversity. These challenges are interconnected, making the task of managing resources efficiently even more daunting. To enhance the effectiveness of machine learning models, it is crucial that these systems be optimized across multiple dimensions to strike a balance among performance, efficiency, and scalability, thereby ensuring sustainable operation at larger scales. In this thesis, we introduce a multi-objective resource optimization framework aimed at addressing the overarching resource challenges in large-scale machine learning systems. Leveraging the optimization opportunities presented by data redundancy and hardware configurability, we detail three systems that demonstrate optimizations under the resource constraints of large-scale ML systems. Specifically, CrossRoI tackles network bandwidth and computational intensity by leveraging data redundancy in video streams, significantly reducing the amount of data required for processing and transmission. BoFL targets energy consumption and the timeliness of learning tasks, employing dynamic hardware configuration to enhance the power efficiency of devices involved in time-sensitive federated learning, which in turn prolongs battery life and lowers operational expenses. FedCore addresses the straggler effect in federated learning through distributed coresets, minimizing the data processed by slower devices and thus boosting overall system efficiency without sacrificing accuracy. Collectively, these frameworks embody a comprehensive approach to multi-objective resource optimization, demonstrating their effectiveness through significant improvements across multiple resource dimensions. Moreover, our experiments confirm that a holistic design that leverages both data and hardware opportunities can substantially elevate the efficiency of resource usage in machine learning systems.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Hongpeng Guo
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)