Privacy implications of information leakage from IP addresses - a web fingerprinting approach
Patil, Simran Pramod
Permalink
https://hdl.handle.net/2142/108011
Description
Title
Privacy implications of information leakage from IP addresses - a web fingerprinting approach
Author(s)
Patil, Simran Pramod
Issue Date
2020-05-11
Director of Research (if dissertation) or Advisor (if thesis)
Borisov, Nikita
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Web Fingerprinting, Encrypted DNS, Web Privacy
Abstract
The Internet was not designed with security in mind. Several recent protocols, such as Encrypted DNS and HTTPS, encrypt critical parts of the web architecture that were previously sent in the clear. IP addresses, however, remain visible to on-path observers and can be used for censorship, surveillance, and undermining users' privacy on the web. To gauge how much information IP addresses leak, we perform a measurement study on datasets representative of the state of the Internet: some fetched from the HTTP Archive, others collected over extended periods by crawling Alexa's top websites under configurations such as Adblock enabled versus disabled. We build a page load fingerprint for each crawled website and filter the websites that have uniquely identifying IP addresses mapped to them. We build a neural network classifier to study how accurately websites can be fingerprinted from IP addresses and their respective Autonomous System Numbers (ASNs). Approximately 80% of the IP addresses have an anonymity set consisting of a single website and can therefore identify it. On the remaining data, the classifier achieves an accuracy of about 60%. We observe that the classifier confuses websites hosted on common infrastructure. Manually clustering the data based on these trends can increase the classification accuracy. We identify areas of improvement for the current measurement study and offer suggestions to Content Delivery Networks (CDNs) and other agents fundamental to the Internet infrastructure for increasing user privacy.
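The notion of an IP address's anonymity set can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the input layout (a mapping from website to the set of IP addresses contacted during its page load), and the sample data are all assumptions made for illustration. The addresses come from reserved documentation ranges.

```python
from collections import defaultdict


def anonymity_sets(page_loads):
    """Map each IP address to the set of websites whose page load contacted it.

    `page_loads` is a hypothetical structure: {website: set of IP addresses}.
    """
    sites_by_ip = defaultdict(set)
    for site, ips in page_loads.items():
        for ip in ips:
            sites_by_ip[ip].add(site)
    return dict(sites_by_ip)


def uniquely_identified(page_loads):
    """Return {website: sorted IPs whose anonymity set is that website alone}."""
    result = defaultdict(list)
    for ip, sites in anonymity_sets(page_loads).items():
        if len(sites) == 1:
            (site,) = sites  # unpack the single member of the set
            result[site].append(ip)
    return {site: sorted(ips) for site, ips in result.items()}


# Illustrative data only (assumed, not taken from the thesis datasets).
page_loads = {
    "site-a.example": {"192.0.2.10", "198.51.100.7"},
    "site-b.example": {"192.0.2.20", "198.51.100.7"},
    "site-c.example": {"198.51.100.7"},
}

unique = uniquely_identified(page_loads)
```

In this toy input, the shared address 198.51.100.7 has an anonymity set of three websites and identifies none of them, while the other two addresses each point to exactly one website. Websites left without any unique address, such as site-c.example here, are the kind of case the abstract hands to the classifier.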