Analyzing bottlenecks in large recommendation systems
Zhang, Jialiang
Permalink
https://hdl.handle.net/2142/110285
Description
Title
Analyzing bottlenecks in large recommendation systems
Author(s)
Zhang, Jialiang
Contributor(s)
Hwu, Wen-Mei
Issue Date
2021-05
Keyword(s)
Recommendation Systems
CUDA
GPU
Machine Learning
Embedding
Abstract
Training and inference in recommendation systems often require analysis and computation over a large number of unstructured, user-specific data blobs. One of the state-of-the-art recommendation models is the Deep Learning Recommendation Model (DLRM) from Facebook. DLRM consumes a large amount of memory, terabytes in size, to store embedding features during training and inference. Aside from the memory cost, DLRM's long training time is another issue. In this work, we investigate the potential bottlenecks of DLRM and discuss in detail two recent improvements proposed in the literature: PipeDLRM and TT-Rec. PipeDLRM applies pipeline parallelism, splitting the whole model across several GPUs to reduce compute time without compromising accuracy, while TT-Rec proposes a new compression method that reduces embedding memory consumption while keeping the accuracy loss within an acceptable range. Our analysis of these two models shows that, regardless of implementation, each still has issues to address. For instance, the embedding memory bottleneck remains in the lookup operation of the embedding tables in PipeDLRM, because each partition sits on a single GPU, which impedes further scaling. On the other hand, even though TT-Rec succeeds in reducing the memory complexity of the model, it requires significant reuse of the compressed information to retain accuracy. These observations suggest that there is no single solution that fully addresses the memory capacity problem in DLRM.
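To make the embedding-memory argument concrete, here is a minimal sketch (not from the thesis) of a DLRM-style pooled embedding lookup in PyTorch; the table sizes, embedding dimension, and feature IDs are illustrative assumptions.

```python
# Minimal sketch (not from the thesis): why embedding tables dominate DLRM's
# memory footprint. Table sizes are illustrative assumptions.
import torch
import torch.nn as nn

# A production-scale table would be far too large to allocate casually; the
# arithmetic below shows what a realistic one costs without materializing it.
rows, dim = 10_000_000, 128                      # hypothetical categorical table
print(f"~{rows * dim * 4 / 1e9:.1f} GB of float32 for one table")  # ~5.1 GB

# A toy table demonstrates the lookup itself: gather a few rows per sample
# and pool them, as DLRM does for its sparse features.
table = nn.EmbeddingBag(1000, dim, mode="sum")
ids = torch.tensor([3, 17, 42, 7, 9])            # flattened sparse feature IDs
offsets = torch.tensor([0, 3])                   # sample 0 -> ids[0:3], sample 1 -> ids[3:]
pooled = table(ids, offsets)                     # shape: (2, 128)
```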
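The pipeline-parallel idea PipeDLRM builds on can likewise be sketched in a few lines: split the model into stages on different GPUs and feed a batch through as micro-batches so stages can work concurrently. This sketch is not the thesis's implementation; it assumes two CUDA devices, and all layer sizes are illustrative.

```python
# Minimal sketch (not from the thesis) of pipeline parallelism: two stages on
# two GPUs, with the batch split into micro-batches. Assumes two CUDA devices.
import torch
import torch.nn as nn

stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to("cuda:0")
stage2 = nn.Sequential(nn.Linear(512, 1), nn.Sigmoid()).to("cuda:1")

batch = torch.randn(256, 512)
outputs = []
for micro in batch.chunk(4):                 # 4 micro-batches of 64
    h = stage1(micro.to("cuda:0"))           # stage 1 runs on GPU 0
    outputs.append(stage2(h.to("cuda:1")))   # stage 2 runs on GPU 1
    # Because CUDA kernel launches are asynchronous, stage 1 can begin the
    # next micro-batch while stage 2 is still processing the previous one.
out = torch.cat(outputs)                     # (256, 1) predictions
```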
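Finally, the memory savings TT-Rec targets can be estimated with back-of-the-envelope arithmetic. The factorized shapes, TT ranks, and the `tt_params` helper below are hypothetical, chosen only to illustrate how tensor-train cores shrink the parameter count.

```python
# Back-of-the-envelope sketch (not from the thesis) of the tensor-train (TT)
# compression idea behind TT-Rec. Shapes, ranks, and the helper name
# tt_params are all hypothetical.

def tt_params(ns, ds, ranks):
    """Parameters in TT cores for a table with prod(ns) rows and prod(ds)
    columns; core k has shape (ranks[k], ns[k] * ds[k], ranks[k + 1])."""
    return sum(ranks[k] * ns[k] * ds[k] * ranks[k + 1] for k in range(len(ns)))

ns = (200, 220, 250)        # 200 * 220 * 250 = 11,000,000 rows
ds = (4, 4, 8)              # 4 * 4 * 8 = 128-dim embeddings
ranks = (1, 32, 32, 1)      # TT ranks; boundary ranks are always 1

full = 200 * 220 * 250 * 128              # uncompressed parameter count
tt = tt_params(ns, ds, ranks)
print(f"full: {full:,}  tt: {tt:,}  ~{full // tt}x smaller")
```

The catch, as the abstract notes, is that each looked-up row must now be reassembled from the cores, so the memory saved is paid for with extra computation over the compressed representation.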