Analysis of Data Streaming Accelerator in Intel Sapphire Rapids Xeon Scalable Processors
Kuper, Reese
Permalink
https://hdl.handle.net/2142/120387
Description
Title
Analysis of Data Streaming Accelerator in Intel Sapphire Rapids Xeon Scalable Processors
Author(s)
Kuper, Reese
Issue Date
2023-05-03
Director of Research (if dissertation) or Advisor (if thesis)
Kim, Nam Sung
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Intel DSA
On-Chip Accelerators
SoC
DMA
Data Streaming
Hardware Accelerators
Abstract
As semiconductor power density no longer stays constant as process technology scales down, modern CPUs integrate capable on-chip data accelerators to improve performance and efficiency across a wide range of applications. One such accelerator is the Intel Data Streaming Accelerator (Intel DSA), introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). Intel DSA targets data movement operations in memory, a common source of overhead in datacenter workloads and infrastructure. It is also far more versatile than a conventional DMA engine, supporting a wider range of operations on streaming data such as CRC32 calculation, delta record creation and merging, and data integrity field (DIF) operations. Several architectural innovations further facilitate the practical use of Intel DSA, for example shared virtual memory (SVM) and new x86 instructions for lock-free work descriptor submission and synchronization.
This thesis introduces the latest features supported by Intel DSA, takes a deep dive into its versatility, and analyzes its throughput and performance benefits through a comprehensive evaluation. Our analysis demonstrates that Intel DSA saves 37.3% and 71.3% of CPU cycles when synchronously offloading 1 KB memory copy operations with batch sizes of 1 and 4, respectively, compared to the software counterpart (i.e., memcpy() running on a core). This lets cores spend their precious cycles on more complex, latency-sensitive tasks rather than on such simple but repetitive operations. When offloading the same operations asynchronously with batch sizes of 1 and 4, Intel DSA provides 2.3x and 6.4x higher throughput than the software counterpart, respectively. Beyond these inherent benefits, we also demonstrate that Intel DSA effectively avoids polluting performance-critical resources (i.e., on-chip caches), and thus eliminates performance interference with co-running memory-intensive and latency-sensitive applications. Along with this characterization, we explore use cases that can benefit from Intel DSA - DPDK-based VirtIO, SPDK-based NVMe-oF, cloud data caching services, and HPC/ML frameworks - and describe other potential use cases. Finally, we provide several guidelines to help users employ the Intel DSA accelerator device effectively.