Analyzing bottlenecks in large recommendation systems
Zhang, Jialiang
Permalink
https://hdl.handle.net/2142/110285
Description
Title
Analyzing bottlenecks in large recommendation systems
Author(s)
Zhang, Jialiang
Contributor(s)
Hwu, Wen-Mei
Issue Date
2021-05
Keyword(s)
Recommendation Systems
CUDA
GPU
Machine Learning
Embedding
Abstract
Training and inference in recommendation systems often require analysis and computation over a large number of unstructured, user-specific data blobs. One of the state-of-the-art recommendation models is the Deep Learning Recommendation Model (DLRM) from Facebook. DLRM consumes a large amount of memory, terabytes in size, to store embedding features during training and inference. Aside from the memory cost, DLRM's long training time is another issue. In this work, we investigate the potential bottlenecks of DLRM and discuss in detail two recent improvements proposed in the literature: PipeDLRM and TT-Rec. PipeDLRM applies pipeline parallelism, splitting the whole model across several GPUs to reduce compute time without compromising accuracy, while TT-Rec proposes a new compression method that reduces embedding memory consumption while keeping the accuracy loss within an acceptable range. Our analysis of these two models shows that, regardless of implementation, each still has issues to address. For instance, the embedding memory bottleneck remains in the lookup operation of the embedding tables in PipeDLRM, because each partition sits on a single GPU, which impedes further scaling. On the other hand, even though TT-Rec succeeds in reducing the memory complexity of the model, it requires significant reuse of the compressed information to retain accuracy. These observations suggest that there is no single solution that fully addresses the memory capacity problem in DLRM.
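To make the embedding-memory argument concrete, here is a minimal sketch (not from the thesis) of a DLRM-style pooled embedding lookup in PyTorch; the table sizes, embedding dimension, and feature IDs are illustrative assumptions.

```python
# Minimal sketch (not from the thesis): why embedding tables dominate DLRM's
# memory footprint. Table sizes are illustrative assumptions.
import torch
import torch.nn as nn

# A production-scale table would be far too large to allocate casually; the
# arithmetic below shows what a realistic one costs without materializing it.
rows, dim = 10_000_000, 128                      # hypothetical categorical table
print(f"~{rows * dim * 4 / 1e9:.1f} GB of float32 for one table")  # ~5.1 GB

# A toy table demonstrates the lookup itself: gather a few rows per sample
# and pool them, as DLRM does for its sparse features.
table = nn.EmbeddingBag(1000, dim, mode="sum")
ids = torch.tensor([3, 17, 42, 7, 9])            # flattened sparse feature IDs
offsets = torch.tensor([0, 3])                   # sample 0 -> ids[0:3], sample 1 -> ids[3:]
pooled = table(ids, offsets)                     # shape: (2, 128)
```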
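The pipeline-parallel idea PipeDLRM builds on can likewise be sketched in a few lines: split the model into stages on different GPUs and feed a batch through as micro-batches so stages can work concurrently. This sketch is not the thesis's implementation; it assumes two CUDA devices, and all layer sizes are illustrative.

```python
# Minimal sketch (not from the thesis) of pipeline parallelism: two stages on
# two GPUs, with the batch split into micro-batches. Assumes two CUDA devices.
import torch
import torch.nn as nn

stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to("cuda:0")
stage2 = nn.Sequential(nn.Linear(512, 1), nn.Sigmoid()).to("cuda:1")

batch = torch.randn(256, 512)
outputs = []
for micro in batch.chunk(4):                 # 4 micro-batches of 64
    h = stage1(micro.to("cuda:0"))           # stage 1 runs on GPU 0
    outputs.append(stage2(h.to("cuda:1")))   # stage 2 runs on GPU 1
    # Because CUDA kernel launches are asynchronous, stage 1 can begin the
    # next micro-batch while stage 2 is still processing the previous one.
out = torch.cat(outputs)                     # (256, 1) predictions
```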
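Finally, the memory savings TT-Rec targets can be estimated with back-of-the-envelope arithmetic. The factorized shapes, TT ranks, and the `tt_params` helper below are hypothetical, chosen only to illustrate how tensor-train cores shrink the parameter count.

```python
# Back-of-the-envelope sketch (not from the thesis) of the tensor-train (TT)
# compression idea behind TT-Rec. Shapes, ranks, and the helper name
# tt_params are all hypothetical.

def tt_params(ns, ds, ranks):
    """Parameters in TT cores for a table with prod(ns) rows and prod(ds)
    columns; core k has shape (ranks[k], ns[k] * ds[k], ranks[k + 1])."""
    return sum(ranks[k] * ns[k] * ds[k] * ranks[k + 1] for k in range(len(ns)))

ns = (200, 220, 250)        # 200 * 220 * 250 = 11,000,000 rows
ds = (4, 4, 8)              # 4 * 4 * 8 = 128-dim embeddings
ranks = (1, 32, 32, 1)      # TT ranks; boundary ranks are always 1

full = 200 * 220 * 250 * 128              # uncompressed parameter count
tt = tt_params(ns, ds, ranks)
print(f"full: {full:,}  tt: {tt:,}  ~{full // tt}x smaller")
```

The catch, as the abstract notes, is that each looked-up row must now be reassembled from the cores, so the memory saved is paid for with extra computation over the compressed representation.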