TextGuard: Provable defense against backdoor attacks on text classification
Pei, Hengzhi
Permalink
https://hdl.handle.net/2142/120348
Description
Issue Date
2023-04-06
Director of Research (if dissertation) or Advisor (if thesis)
Li, Bo
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Backdoor attack
text classification
provable defense
Abstract
Backdoor attacks against text classification have become a major security threat to deploying machine learning models in security-critical applications. Existing research has proposed many defenses against these backdoor attacks. Despite demonstrating some empirical efficacy, none of these techniques provides a formal, provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong or adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. At a high level, TextGuard follows a partition-and-ensemble mechanism: it first divides the (backdoored) training data into disjoint sub-training sets, such that most subsets do not contain the backdoor trigger. It then trains multiple base classifiers on these subsets and ensembles them into the final classifier. TextGuard guarantees that the majority of base models are clean, and thus that its prediction is unaffected by the backdoor trigger in both training and testing inputs. We conduct extensive provable and empirical evaluations of TextGuard on three benchmark datasets and demonstrate its superiority over state-of-the-art defenses under multiple backdoor attacks.
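The partition-and-ensemble mechanism described above can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the hash-based partition, the toy majority-label "base classifier," and all function names are assumptions made for the sketch. The key property it shows is that each training example lands in exactly one subset, so a poisoned example can corrupt at most one base model, and a majority vote over the base models then masks that corruption.

```python
import hashlib
from collections import Counter

def partition(dataset, m):
    """Split (text, label) pairs into m disjoint sub-training sets by hashing.

    Each example is assigned to exactly one group, so a backdoored example
    can influence at most one base classifier. (Illustrative scheme only.)
    """
    groups = [[] for _ in range(m)]
    for text, label in dataset:
        idx = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16) % m
        groups[idx].append((text, label))
    return groups

def train_base(subset):
    """Toy stand-in for a base classifier: always predicts the majority
    label of its own training subset. A real system would train a text
    classifier here."""
    counts = Counter(label for _, label in subset)
    majority = counts.most_common(1)[0][0] if counts else 0
    return lambda text: majority

def ensemble_predict(classifiers, text):
    """Final prediction: majority vote over all base classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset (label 1 = positive, 0 = negative).
data = [("good movie", 1), ("bad film", 0), ("great plot", 1),
        ("terrible acting", 0), ("loved it", 1)]

groups = partition(data, m=3)
models = [train_base(g) for g in groups]
pred = ensemble_predict(models, "fine movie")
```

Because the subsets are disjoint, as long as the trigger-carrying examples fall into a minority of subsets, the majority vote is provably unchanged by them; this is the intuition behind the formal guarantee claimed in the abstract.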