Auto-parallelization of machine-learning dataflow graphs for CPU multicores
Das, Srinjoy
Permalink
https://hdl.handle.net/2142/121343
Description
- Title
- Auto-parallelization of machine-learning dataflow graphs for CPU multicores
- Author(s)
- Das, Srinjoy
- Issue Date
- 2023-07-17
- Director of Research (if dissertation) or Advisor (if thesis)
- Rauchwerger, Lawrence
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Parallelization
- Clustering
- Machine Learning
- Graph optimization
- Compiler optimization
- Dataflow Graph
- Inference
- Multicores
- PyTorch
- Abstract
- Several methods exist today to accelerate Machine Learning (ML) and Deep Learning (DL) model performance for training and inference. However, modern techniques built on graph and operator parallelism depend on search-space optimizations that are costly in terms of power and hardware usage. Especially for inference, when the batch size is 1 and execution is on Central Processing Units (CPUs) or at the edge, current techniques can become costly, complicated, or inapplicable. To address this, we present a Critical-Path-based Linear Clustering approach that exploits inherent parallel paths in ML dataflow graphs. We augment it with a new hyperclustering mechanism for small batch sizes greater than 1, which may be typical in inference scenarios. Our task parallelization approach further optimizes the structure of graphs via cloning and simplifies them via dead-code elimination. Unlike other work, we generate readable and executable parallel PyTorch+Python code from input ONNX models using Ramiel, a new tool that we have built; this lets us benefit from other downstream acceleration techniques such as intra-op parallelism and, potentially, pipeline parallelism. Our preliminary results on several ML graphs demonstrate up to 1.9× speedup over serial execution and outperform some current mechanisms in both compile time and run time. Lastly, our methods are lightweight and fast enough to be used effectively for Artificial Intelligence (AI) at the edge.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/121343
- Copyright and License Information
- Copyright 2023 Srinjoy Das
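Illustrative note on the abstract: the general idea behind critical-path-based linear clustering can be sketched as follows. This is not Ramiel's implementation and does not reflect the thesis's actual cost model, cloning, or hyperclustering. It is a minimal sketch that assumes each operator has a single scalar cost and ignores communication cost. The function name `linear_clusters`, the node names, and the costs are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def linear_clusters(cost, succ):
    """Split a DAG into linear clusters by repeatedly peeling off the
    most expensive remaining path (a simplified critical path).

    cost: dict mapping node -> assumed execution cost
    succ: dict mapping node -> list of successor nodes
    """
    # graphlib expects a node -> predecessors mapping
    preds = {n: [] for n in cost}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)
    order = list(TopologicalSorter(preds).static_order())

    remaining = set(cost)
    clusters = []
    while remaining:
        best = {}  # cost of the heaviest path starting at each node
        nxt = {}   # next node along that path, or None at the end
        for n in reversed(order):
            if n not in remaining:
                continue
            best[n], nxt[n] = cost[n], None
            for s in succ.get(n, ()):
                if s in remaining and cost[n] + best[s] > best[n]:
                    best[n], nxt[n] = cost[n] + best[s], s
        # Start from the node whose downstream path is heaviest
        candidates = [n for n in order if n in remaining]
        node = max(candidates, key=best.__getitem__)
        path = []
        while node is not None:
            path.append(node)
            node = nxt[node]
        clusters.append(path)
        remaining.difference_update(path)
    return clusters


# Hypothetical two-branch graph: inp feeds conv1 and conv2, joined at add
cost = {"inp": 1, "conv1": 5, "conv2": 3, "relu": 1, "add": 1, "out": 1}
succ = {
    "inp": ["conv1", "conv2"],
    "conv1": ["add"],
    "conv2": ["relu"],
    "relu": ["add"],
    "add": ["out"],
}
clusters = linear_clusters(cost, succ)
```

In this sketch each resulting cluster is a chain of dependent operators that could run sequentially on one core, while separate clusters are candidates for running in parallel, subject to the data dependencies between them.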
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois