Closed-loop network anomaly detection

Zhou, Qinghai

Permalink

https://hdl.handle.net/2142/121978

Description

Title

Closed-loop network anomaly detection

Author(s)

Zhou, Qinghai

Issue Date

2023-11-14

Director of Research (if dissertation) or Advisor (if thesis)

Tong, Hanghang

Doctoral Committee Chair(s)

Tong, Hanghang

Committee Member(s)

Sun, Jimeng
Zhai, ChengXiang
Chau, Duen Horng

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

data mining
graph mining
anomaly detection
graph neural networks

Abstract

Anomalies are defined as rare observations that deviate significantly from the majority. In recent years, as networked data have become ubiquitous, network anomaly detection (NAD), which aims to identify rare objects in networks, has attracted considerable attention in a variety of high-impact applications, ranging from social network analysis (e.g., social spammer detection) and online review systems (e.g., opinion spam detection) to financial fraud (e.g., credit card fraud detection). Generally speaking, an NAD algorithm involves three major components: (1) networks, (2) supervision, and (3) users. The vast majority of existing NAD techniques take networks and supervision as input and deliver the detection results (e.g., a top-k list) to the end user. Despite tremendous advances, three key challenges remain. First (rich networks), real-world networks are often sourced from multiple instances or evolve dynamically, whereas the majority of existing NAD approaches are designed for a single network or for multiple static, aligned networks. How to detect anomalies in rich (e.g., multiple, dynamic) networks remains largely unexplored. Second (weak supervision), existing NAD methods are predominantly unsupervised due to the lack of supervision. How to leverage low-cost weak supervision (e.g., a limited number of labels, coarse-grained labels) to design supervised algorithms has not been well studied. Third (user interaction), existing methods primarily regard users as the passive receiving end of an NAD algorithm. It is imperative to bring users into the NAD loop to boost both interpretability and detection accuracy. The close interactions among these key challenges naturally call for four major tasks, namely predicting, auditing, augmenting and interpreting.
First, predicting aims to advance detection performance in complex networks by mining crucial knowledge from weak supervision signals. Second, auditing studies how user-based anomalous activities and the corresponding alterations to graphs affect network systems. Third, augmenting correlates users and networks and explores reinforcing the supervision and network information to improve NAD algorithms. Fourth, interpreting aims to help end users understand the outcome of mining techniques through quantitative uncertainty estimation and intuitive visual explanations. The theme of my Ph.D. research is to collectively address the above key challenges in network anomaly detection through these four tasks. Specifically, for predicting, we have developed GDN, which learns anomalous patterns from limited labeled anomalies, and Meta-GDN, which realizes effective meta-knowledge transfer across multiple networks by equipping GDN with a meta-learning algorithm. In addition, we design a generic framework, Wedge, which is capable of identifying node-level anomalies given coarse-grained subgraph supervision. Second, for auditing, we have designed a family of scalable algorithms, Admiring, to analyze the impact of anomalous activities on multi-network systems and on graph learning results. Furthermore, we develop Attent, a generic influence-based query strategy to actively obtain user feedback. Third, for augmenting, we develop G-ADAM, a mixup-based NAD approach that augments the original limited training data by adaptively interpolating data instances in the embedding space. Moreover, we have studied the problem of dynamically optimizing the user network (e.g., teams) with reinforcement learning. For the interpreting task, we have proposed JuryGCN, the first frequentist-based approach to quantify the node uncertainty of graph convolutional networks without model training.
JuryGCN has demonstrated superiority in both active learning on node classification and semi-supervised node classification, achieving the best effectiveness and the lowest memory usage among the competing methods. We also develop Extra, an interactive visualization tool that provides intuitive visual explanations for results in the team recommendation scenario.

Graduation Semester

2023-12

Type of Resource

Thesis

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
