Accelerating graph attention network inference on CPUs with layer fusion
Yao, Yao
Permalink
https://hdl.handle.net/2142/124697
Description
- Title
- Accelerating graph attention network inference on CPUs with layer fusion
- Author(s)
- Yao, Yao
- Issue Date
- 2024-04-29
- Director of Research (if dissertation) or Advisor (if thesis)
- Torrellas, Josep
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Graph Attention Networks
- CPU
- Layer Fusion
- Abstract
- Graphs are an increasingly popular data structure used in many fields. Recently, the Graph Attention Network (GAT), a special type of Graph Neural Network (GNN), has emerged as a powerful tool for processing graph-structured data, offering state-of-the-art performance on graph-related tasks such as node classification. Existing work mostly focuses on domain-specific accelerators to optimize GAT inference. However, CPUs are also an attractive choice for GAT inference because they are widely available and offer large memory capacity. Layer Fusion is a technique introduced in Graphite [35] that combines the memory-intensive aggregation and compute-intensive update phases of a GNN layer to overlap memory accesses with computation, thereby reducing memory stress when executing GNN workloads on CPUs. While this technique benefits general GNN models, it does not directly apply to GATs because of their additional attention-calculation phase. We posit that this added complexity in GAT presents an opportunity for optimization with Layer Fusion: by fusing the attention-calculation and aggregation phases, we can overlap memory accesses with computation, reducing both DRAM traffic and execution time. Hence, this thesis explores how Layer Fusion can optimize GAT inference on CPUs. The thesis begins with an overview of the research context, providing a historical perspective on the evolution of GATs and the rationale for accelerating GAT inference on CPUs with Layer Fusion. It then covers the theoretical background of GATs and details their implementation in the DGL framework, which serves as the baseline for comparison. Methodologies for incorporating Layer Fusion into GATs are discussed, including three variations that differ in the placement of the attention-head iteration.
Experimental results comparing the Layer Fusion approach against the DGL baseline show significant improvements in execution times across various datasets, with up to a 2.81x speedup. Sensitivity analyses explore the impact of factors like the number of attention heads and graph characteristics on performance, providing insights into the performance improvement achieved by the Layer Fusion approach.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Yao Yao
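The abstract describes a GAT layer in terms of three phases (feature update, attention calculation, and aggregation) and proposes fusing the latter two. The following is a minimal, hypothetical NumPy sketch of that idea; the function names, single-head formulation, and edge-list representation are illustrative assumptions, not code from the thesis or from DGL. The unfused variant materializes every edge's attention score before aggregating; the fused variant scores and aggregates each node's neighborhood in one pass, so the projected neighbor features are reused while still hot in cache.

```python
import numpy as np

def gat_layer_unfused(X, W, a_src, a_dst, edges):
    """Phase-by-phase single-head GAT layer (illustrative sketch):
    all edge attention scores are materialized before aggregation,
    so projected features H are read from memory twice."""
    H = X @ W  # update phase: linear projection of node features
    # attention-calculation phase: one LeakyReLU'd score per edge
    scores = {}
    for (u, v) in edges:
        e = a_src @ H[u] + a_dst @ H[v]
        scores[(u, v)] = e if e > 0 else 0.2 * e  # LeakyReLU, slope 0.2
    # aggregation phase: softmax-normalize per destination, then sum
    out = np.zeros_like(H)
    for v in range(X.shape[0]):
        nbrs = [u for (u, w) in edges if w == v]
        if not nbrs:
            continue
        e = np.array([scores[(u, v)] for u in nbrs])
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()
        for a_uv, u in zip(alpha, nbrs):
            out[v] += a_uv * H[u]
    return out

def gat_layer_fused(X, W, a_src, a_dst, edges):
    """Fused attention-calculation + aggregation (illustrative sketch):
    each destination's scores are computed and immediately consumed,
    reusing the rows of H while they are still cache-resident."""
    H = X @ W  # update phase unchanged
    out = np.zeros_like(H)
    for v in range(X.shape[0]):
        nbrs = [u for (u, w) in edges if w == v]
        if not nbrs:
            continue
        e = np.array([a_src @ H[u] + a_dst @ H[v] for u in nbrs])
        e = np.where(e > 0, e, 0.2 * e)           # LeakyReLU
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()
        out[v] = alpha @ H[nbrs]  # H[nbrs] reused right after scoring
    return out
```

Both variants compute the same result; the fused one simply restructures the loops so attention scoring and weighted aggregation share one traversal of each neighborhood, which is the kind of memory-access/computation overlap the abstract attributes to Layer Fusion.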
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)