Machine learning for security applications under dynamic and adversarial environments
Yang, Limin
Permalink
https://hdl.handle.net/2142/122023
Description
- Title
- Machine learning for security applications under dynamic and adversarial environments
- Author(s)
- Yang, Limin
- Issue Date
- 2023-11-28
- Director of Research (if dissertation) or Advisor (if thesis)
- Wang, Gang
- Doctoral Committee Chair(s)
- Wang, Gang
- Committee Member(s)
- Gunter, Carl A.
- Iyer, Ravishankar K.
- Kalbarczyk, Zbigniew T.
- Cavallaro, Lorenzo
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- machine learning security
- concept drift
- malware detection
- network intrusion detection
- Abstract
- The security community, including both academia and industry, is increasingly adopting machine learning (ML) for its superior generalizability compared to traditional rule-based systems and its ability to automatically learn patterns that are difficult to explicitly describe. However, deploying and maintaining such learning-based models is challenging as they are susceptible to concept drift. This phenomenon arises when the security domain’s environments undergo dynamic changes over time, causing a misalignment between the testing and training data. Consequently, this mismatch often leads to significant failures of the deployed ML models. To combat concept drift, we focus on two popular security domains: malware detection and network intrusion detection systems (IDS). We first introduce CADE, a novel system to detect and explain concept drift in malware classifiers. CADE leverages a contrastive learning-based autoencoder for drift detection, and a distance-based explanation for semantically meaningful reasoning. Additionally, because concept drift and evolving attacks often require model updates, we leverage a sub-population distribution to develop a new selective backdoor attack (Jigsaw Puzzle) that compromises the model update process by protecting only a subset of malware samples from a specific family. This attack is stealthier than traditional universal backdoors and can evade existing defenses. Furthermore, we investigate the transition from supervised learning to unsupervised learning in network IDS. This transition presents unique challenges due to the sparsity of attack data and the vast diversity of benign data. We analyze incident-level and alert-level dynamics as well as concept drift in a real-world network IDS. We provide insights into current research limitations in network IDS caused by the oversimplification of problem definitions that ignore excessive “attack attempts” and “benign triggers”.
Moreover, we measure the concept drift by monitoring the anomaly detection performance in both a real-world dataset and a benchmark synthesized dataset. Our findings reveal that sparse and dispersed real-world attacks may significantly degrade the model performance compared to highly condensed attacks in existing benchmark datasets. In conclusion, this dissertation highlights the importance of understanding data distribution shifts in ML-based security applications. By enhancing detection and mitigation techniques, ML can better serve security tasks.
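The distance-based drift detection the abstract attributes to CADE can be illustrated with a minimal sketch. This is a simplified assumption-laden illustration, not the dissertation's implementation: it skips the contrastive autoencoder training entirely and assumes samples have already been mapped into a latent space, then flags drift when a sample's distance to every class centroid is anomalously large under a median-absolute-deviation (MAD) normalization. All function names here are hypothetical.

```python
import numpy as np

def fit_class_stats(z_train, y_train):
    """For each training class, compute the latent-space centroid and the
    median / MAD of member distances to that centroid."""
    stats = {}
    for c in np.unique(y_train):
        z_c = z_train[y_train == c]
        centroid = z_c.mean(axis=0)
        d = np.linalg.norm(z_c - centroid, axis=1)
        med = np.median(d)
        mad = np.median(np.abs(d - med))
        stats[c] = (centroid, med, mad)
    return stats

def drift_score(z, stats):
    """MAD-normalized distance to the closest class centroid.
    A large score for every class suggests a drifting (out-of-distribution)
    sample; a small score for some class suggests an in-distribution sample."""
    scores = []
    for centroid, med, mad in stats.values():
        d = np.linalg.norm(z - centroid)
        scores.append(abs(d - med) / (mad + 1e-12))
    return min(scores)
```

In this sketch a sample would be reported as drift when its minimum score exceeds a fixed cutoff (a conventional MAD-based choice is around 3.5); in the actual CADE system the latent space is shaped by contrastive learning so that these distances are semantically meaningful, and the same distances also drive the explanation component.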
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Limin Yang
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)