Fine-grained memory access over I/O interconnect for efficient remote sparse data access
Min, Seung Won
Permalink
https://hdl.handle.net/2142/115489
Description
- Title
- Fine-grained memory access over I/O interconnect for efficient remote sparse data access
- Author(s)
- Min, Seung Won
- Issue Date
- 2022-04-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Hwu, Wen-mei
- Doctoral Committee Chair(s)
- Hwu, Wen-mei
- Committee Member(s)
- Chen, Deming
- Chung, I-hsin
- Huang, Jian
- Patel, Sanjay
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- data access efficiency
- large-scale dataset
- sparse memory access
- Abstract
- The combination of the growing size and complexity of application datasets is introducing a new challenge to accelerators. The growing size of datasets forces us to place them in the larger CPU memory, and the growing complexity of datasets introduces more irregularity into data access patterns. However, the existing data transfer mechanisms are optimized for transferring regular, densely accessed datasets, not complex and sparse ones. With the existing methods, accelerators often spend more time accessing complex datasets stored in CPU memory than actually performing computation. To overcome the limitations of the existing data transfer mechanisms, this dissertation proposes to use many fine-grained memory accesses over the I/O interconnect, such as the industry-standard PCIe, instead of the traditional coarse-grained block data transfer method. While fine-grained memory access over the I/O interconnect risks introducing high per-packet overhead, the benefit of its flexibility in accessing complex data structures can outweigh that overhead. To accurately evaluate the overhead and benefit of fine-grained memory access over the I/O interconnect, we begin by developing a methodology to directly analyze I/O traffic. While direct I/O-level analysis is not a prevalent approach in current academic research, the existing indirect application-level analysis is insufficient to fully capture the intricacy of I/O behaviors. We fill this gap by designing our own custom I/O analyzer using a field-programmable gate array (FPGA) and demonstrate how the potential overhead of fine-grained access over the I/O interconnect can be identified and avoided. Based on the insights gained from the analysis, we redesign and optimize several real-world applications using fine-grained memory access over the I/O interconnect and show that we can speed up the applications by several times over the existing methods.
Next, this dissertation addresses the question of integrating fine-grained memory access over the I/O interconnect into the existing software development environment. Since most programmers may not have a deep hardware-level understanding of fine-grained memory access over the I/O interconnect, it is necessary to abstract away any optimizations that require such understanding and provide them in libraries and frameworks. To achieve this goal, we abstract away the hardware-level optimizations behind our custom array class called UnifiedTensor and transparently apply the optimizations whenever remote memory accesses are done through this class. With the help of this abstraction, modifying only about 2-3 lines of code is sufficient to fully utilize our method in most existing programs. At the same time, we also provide a flexible development environment that lets framework developers quickly deploy new hardware optimizations.
Finally, we conclude this dissertation by proposing a flexible data tiering strategy for modern systems with fine-grained memory access over the I/O interconnect. While there are multiple tiers of memory in modern computer systems, partitioning sparse datasets over multiple tiers of memory and seamlessly accessing them in applications currently requires substantial programming effort. To overcome this programming difficulty, we unify all data access methods to the different tiers of memory with fine-grained memory access over the I/O interconnect. This not only keeps the overall application structure concise but also allows programmers to quickly try out different data partitioning strategies to achieve better data locality.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Seung Won Min
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois