Combating abuse on social media platforms using natural language processing
Seyler, Dominic
Permalink
https://hdl.handle.net/2142/113025
Description
- Title
- Combating abuse on social media platforms using natural language processing
- Author(s)
- Seyler, Dominic
- Issue Date
- 2021-07-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Doctoral Committee Chair(s)
- Zhai, ChengXiang
- Committee Member(s)
- Han, Jiawei
- Wang, Gang
- Wang, XiaoFeng
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Natural Language Processing
- Social Media
- Machine Learning
- Cybersecurity
- Abstract
- The World Wide Web has significantly changed the way people interact with one another. With the advent of social media, the web has given humans a way to directly connect to billions of others. As with any new technology, there are great opportunities and benefits, but also dangers and risks. This thesis focuses on the many ways that bad actors can have a negative impact on social media platforms, and we explore novel methods that leverage natural language processing (NLP) techniques to combat this abuse. First, we analyse and detect suspended accounts on social media platforms. Platform providers suspend accounts that violate a platform's terms for a number of reasons (e.g., spam or offensive and explicit language). To understand this problem further, we perform a detailed linguistic and statistical analysis of the textual information of suspended accounts and show how insights from our study significantly improve a deep-learning-based detection framework. Since early detection of these high-risk accounts is crucial, we show that this framework can be used to detect suspended accounts earlier than the social media platform does. Additionally, we investigate how suspended account detection can be further improved using domain adaptation of word embeddings. In this context, we show how cybersecurity-related classification can generally be improved by leveraging domain-specific and general unstructured text resources. We then apply this strategy to suspended account detection and further improve the performance of our previous models. Second, we propose novel methods to detect compromised social media accounts, as account compromise is a common way for malicious users to spread misinformation and spam. Since the adversary exploits the already established trust of a compromised account, it is crucial to detect these accounts to limit the damage they can cause.
We propose a novel method for discovering compromised accounts by semantic analysis of the text messages an account posts. In our experiments, we find that the semantic incoherence features we introduce for this classification task outperform general text representations and can be used for compromised account detection without requiring any manual effort in the data labeling process. Third, we investigate the detection and interpretation of dark jargon. Bad actors on social media often obfuscate their malicious intentions (e.g., selling malware) by using dark jargon: benign-looking words that have hidden meanings, especially among communities in underground forums. For example, when a user posts a thread offering "rat", what they actually offer is a "Remote Access Trojan". As such jargon facilitates an enormous underground economy, identifying the real meaning of dark jargon words is essential for understanding cybercrime activities and is an important step toward combating social media abuse. In our work, we propose novel methods that identify dark jargon words automatically and generate interpretable meaning representations. This is achieved by mapping dark jargon words to clean words based on word context distributions that are estimated on separate corpora. We show that our method outperforms a baseline that uses word vectors for context representation. Furthermore, we verify that the results of our method are meaningful and interpretable by performing a manual analysis. Based on this methodology, we further build an online platform that caters to the understanding of online conversation with hidden meaning, which we call DarkJargon.net.
- Graduation Semester
- 2021-08
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113025
- Copyright and License Information
- Copyright 2021 Dominic Seyler
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science