Are language models leaking personal information? Memorization vs. Association
Shao, Hanyin
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/120435
Description
Title
Are language models leaking personal information? Memorization vs. Association
Author(s)
Shao, Hanyin
Issue Date
2023-05-02
Director of Research (if dissertation) or Advisor (if thesis)
Chang, Kevin Chen-Chuan
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Natural Language Processing
NLP
Abstract
Large pre-trained language models (PLMs) have transformed the field of natural language processing (NLP) in recent years. PLMs have become the basis for various state-of-the-art NLP systems. Despite the great success of PLMs solving a wide range of NLP tasks, there is rising concern about privacy risks brought with PLMs. For example, recent studies show that PLMs memorize a great portion of training data, including sensitive information, while the information may be leaked unintentionally and utilized by malicious
adversaries.
In this thesis, we evaluate whether PLMs are prone to leaking personal information and discuss possible reasons behind the privacy leakage. Specifically, we attempt to query PLMs for a target email address with contexts of the email address or prompts containing the owner’s name. We find that PLMs do leak personal information mainly due to memorization. However, the risk of specific personal information being extracted by attackers is low because the models are weak at associating personal identifying information with its owner. We also try to quantify PLMs’ capability of association to help validate the safety of PLM in terms of privacy preserving.
Use this login method if you
don't
have an
@illinois.edu
email address.
(Oops, I do have one)
IDEALS migrated to a new platform on June 23, 2022. If you created
your account prior to this date, you will have to reset your password
using the forgot-password link below.