Withdraw
Loading…
Accelerating graph pattern mining algorithms on modern graphics processing units
AlMasri, Mohammad Abed Alnaser
Loading…
Permalink
https://hdl.handle.net/2142/115439
Description
- Title
- Accelerating graph pattern mining algorithms on modern graphics processing units
- Author(s)
- AlMasri, Mohammad Abed Alnaser
- Issue Date
- 2022-04-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Hwu, Wen-mei
- Doctoral Committee Chair(s)
- Hwu, Wen-mei
- Committee Member(s)
- Chen, Deming
- Xiong, Jinjun
- Nagu, Rakesh
- Patel, Sanjay
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- GPU
- CUDA
- Graph
- Graph Pattern Mining
- Graph Pattern Matching
- Clique
- Maximal Clique
- Triangle Counting
- Subgraph Matching
- Graph Analysis
- Abstract
- Graphical processing units (GPUs) have been widely used as a general-purpose processing unit. More recently, there has been fast growing interest in accelerating graph pattern mining tasks on GPUs. At the core of graph pattern mining, graph pattern matching (GPM) is a fundamental operation that aims to find patterns of interest in large data graphs. GPM is typically performed by traversing search trees starting at each vertex in the graph and looking for neighborhood subgraph patterns of interest. A search tree in GPM has a depth equal to the number of vertices in the pattern graph. Each path from the root of a search tree to one of the leaves defines a potential mapping from the pattern graph vertices to a group of neighboring vertices in the data graph. Patterns can be of different sizes and shapes, such as triangles, cliques, wheels, cycles, etc. The larger the pattern size is, the deeper the search tree is. Searching for patterns in graphs is known to be hard due to the exponential growth of intermediate results and the number of instances of patterns. Therefore, execution time and memory requirements grow exponentially with the pattern size. State-of-the-art GPU solutions for GPM traverse the search trees using the Breath-First Search (BFS) approach that exhibits high degrees of parallelism at the cost of high memory requirement, and presents a great challenge for processing large graphs and patterns since the memory capacity of GPUs is significantly smaller than that of CPUs. This work presents novel techniques to implement GPM tree traversal efficiently on GPUs. The tree traversal presented herein uses a hybrid breadth-first and depth-first approach where the top level(s) of the search trees are traversed in a fully parallel, breadth-first manner while each subtree is traversed in a more space-efficient, depth-first manner. The depth-first traversal of subtrees requires less memory but presents more challenges for parallel execution. The techniques presented in this work overcome the less parallel nature of depth-first traversal by exploring vertex-centric and edge-centric parallelization schemes, using multi-level hierarchical organization of threads, improving to the set intersection operation that is fundamental to graph pattern mining algorithms, improving fine-grain parallelism using sub-warp partitioning, and replacing the depth-first recursive traversal with an iterative traversal based on an explicitly-managed shared stack. Moreover, various optimizations are applied to reduce memory consumption and improve data structure reuse. The effectiveness of the proposed techniques is demonstrated using four fundamental graph pattern mining algorithms: (a) subgraph matching for exact GPM for arbitrary patterns, (b) k-clique counting, (c) maximal clique enumeration, and (d) triangle counting. The detailed evaluation results show that the proposed GPU solutions substantially improve performance compared to existing state-of-the-art CPU and GPU solutions. The implementations presented in this thesis can efficiently process the graphs that previously cannot be processed by the state-of-the-art GPU solutions due to their excessive memory requirement. Detailed evaluations of the trade-offs involved in the choice of parallelization scheme, and the incremental speedup of each optimization to provide an in-depth understanding of the optimization space, are provided. The insights from the proposed optimization flow in this work can be useful for optimizing graph mining solutions on GPUs.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Mohammad Almasri
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…