Exploring the trade-offs in web page behavioral abstractions
Murley, Paul
Permalink
https://hdl.handle.net/2142/120264
Description
- Title
- Exploring the trade-offs in web page behavioral abstractions
- Author(s)
- Murley, Paul
- Issue Date
- 2023-04-14
- Director of Research (if dissertation) or Advisor (if thesis)
- Bailey, Michael D
- Doctoral Committee Chair(s)
- Bailey, Michael D
- Committee Member(s)
- Gunter, Carl
- Borisov, Nikita
- Wang, Gang
- Mason, Joshua
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Web Crawling
- Behavioral Measurement
- Abstract
- Modern web pages are highly dynamic, often deriving not just their behavior but also their structure from the execution of JavaScript code. Important page functionality commonly continues well past a page load event. As a result, web pages must be treated as applications running continuously in a browser rather than static entities which are simply downloaded and rendered. A failure to adopt this behavioral approach in web measurement risks overlooking important web page characteristics and oversimplifying pages, sites, and the ecosystem as a whole. Accordingly, studies have increasingly leveraged different forms of browser instrumentation which produce distinct abstractions of web page behavior. These abstractions are not interchangeable. The benefits of a particular abstraction in terms of the level of detail and the semantic value of the resultant dataset must be weighed against costs, including the overhead of instrumentation, computing resources, and analysis effort. When researchers select representations of web page behavior that are poorly suited to answer their research questions, they risk gathering inadequate data, overcomplicating their studies, or both. This thesis outlines and explores a framework for reasoning about trade-offs between web page abstractions in empirical studies. Our framework consists of four distinct categories of behavioral page representations: Inputs and Outputs, Feature Usage, Runtime Behavior, and Execution Traces. In the context of this framework, we present a series of applied web measurement studies, which investigate topics including real-time technology adoption, covert in-browser crypto-mining (or “cryptojacking”), browser fingerprinting, JavaScript code obfuscation, and online scams. For each study, we examine the costs and benefits of our chosen abstractions in the context of our framework and consider how different methodologies might alter study results. 
We generalize our findings, discussing the affordances of each category in our framework and offering insights into the types of research questions each category is best suited to address. We argue that a structured approach to weighing trade-offs between abstractions, such as the one presented here, leads to more efficient and effective studies, and clarifies areas of need for future work in the development of new behavioral web measurement techniques.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Paul Murley
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois