Auto-parallelization of machine-learning dataflow graphs for CPU multicores

Das, Srinjoy

This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.

Permalink

https://hdl.handle.net/2142/121343

Description

Title

Auto-parallelization of machine-learning dataflow graphs for CPU multicores

Author(s)

Das, Srinjoy

Issue Date

2023-07-17

Director of Research (if dissertation) or Advisor (if thesis)

Rauchwerger, Lawrence

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Parallelization
Clustering
Machine Learning
Graph optimization
Compiler optimization
Dataflow Graph
Inference
Multicores
PyTorch

Abstract

Several methods exist today to accelerate Machine Learning (ML) and Deep Learning (DL) model performance for training and inference. However, modern techniques built on graph and operator parallelism depend on search-space optimizations that are costly in terms of power and hardware usage. Especially for inference, when the batch size is 1 and execution is on Central Processing Units (CPUs) or at the edge, current techniques can become costly, complicated, or inapplicable. To ameliorate this, we present a Critical-Path-based Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs. We augment this with a new hyperclustering mechanism for small batch sizes greater than 1, which may be typical in inference scenarios. Our task parallelization approach further optimizes the structure of graphs via cloning and simplifies them via dead-code elimination. Contrary to other work, we generate readable and executable parallel PyTorch+Python code from input ONNX models via a new tool we have built called Ramiel, which allows us to benefit from other downstream acceleration techniques such as intra-op parallelism and, potentially, pipeline parallelism. Our preliminary results on several ML graphs demonstrate up to 1.9× speedup over serial execution and outperform some current mechanisms in both compile time and run time. Lastly, our methods are lightweight and fast enough to be used effectively for Artificial Intelligence (AI) at the edge.
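To make the idea of critical-path-based linear clustering concrete, the following is a minimal Python sketch of the generic textbook technique: repeatedly take the highest-cost path in the remaining dataflow graph as one cluster, remove its nodes, and repeat. It is not the thesis's implementation and does not reflect Ramiel's code; the function names, the node names, and the per-node costs are all hypothetical, and edge (communication) costs are ignored for simplicity.

```python
from collections import defaultdict


def topological_order(nodes, edges):
    """Kahn's algorithm over the given node set and edge list."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order, successors


def critical_path(nodes, edges, cost):
    """Return the highest-cost path in the DAG restricted to `nodes`."""
    order, successors = topological_order(nodes, edges)
    best = {}      # best[n]: cost of the heaviest path starting at n
    next_hop = {}  # next_hop[n]: successor of n on that path, or None
    for node in reversed(order):
        best[node], next_hop[node] = cost[node], None
        for nxt in successors[node]:
            if cost[node] + best[nxt] > best[node]:
                best[node], next_hop[node] = cost[node] + best[nxt], nxt
    current = max(order, key=lambda n: best[n])
    path = []
    while current is not None:
        path.append(current)
        current = next_hop[current]
    return path


def linear_clusters(nodes, edges, cost):
    """Peel off critical paths until every node belongs to a cluster."""
    remaining = set(nodes)
    clusters = []
    while remaining:
        live = [(s, d) for s, d in edges if s in remaining and d in remaining]
        path = critical_path(remaining, live, cost)
        clusters.append(path)
        remaining -= set(path)
    return clusters


# Hypothetical two-branch graph; costs are made-up relative operator times.
nodes = ["input", "conv_a", "conv_b", "add", "relu"]
edges = [("input", "conv_a"), ("input", "conv_b"),
         ("conv_a", "add"), ("conv_b", "add"), ("add", "relu")]
cost = {"input": 1, "conv_a": 5, "conv_b": 3, "add": 1, "relu": 1}

clusters = linear_clusters(nodes, edges, cost)
print(clusters)  # [['input', 'conv_a', 'add', 'relu'], ['conv_b']]
```

In this toy graph the heavier branch forms the first cluster and the lighter branch `conv_b` becomes a second cluster that could run concurrently with `conv_a`. Each cluster is a linear chain, so it could be assigned to a single worker without further scheduling inside it.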

Graduation Semester

2023-08

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/121343

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois