Withdraw
Loading…
Topics in efficient and privacy-preserving storage system design
Pan, Chao
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/117655
Description
- Title
- Topics in efficient and privacy-preserving storage system design
- Author(s)
- Pan, Chao
- Issue Date
- 2022-11-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Milenkovic, Olgica
- Doctoral Committee Chair(s)
- Milenkovic, Olgica
- Committee Member(s)
- Do, Minh N.
- Zhao, Zhizhen
- Shomorony, Ilan
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- machine learning
- computer vision
- neural networks
- time series
- machine unlearning
- federated learning
- clustering
- graph neural networks
- graph unlearning
- DNA-based data storage
- molecular data storage
- nanopore sequencing
- Abstract
- Data density and data privacy are two main focal points in modern data storage system design. From the data density perspective, DNA-based data storage techniques have become an emerging field in information theory, computer science, and synthetic biology due to their promise of ultrahigh storage density, recording durability, energy efficiency, environment friendliness, and potential capability of integration with in-memory computing platforms. Analogous to traditional digital data storage systems using bits (0 and 1) to record information, DNA-based data storage systems use natural DNA nucleotides (A, T, C, and G) to record information, which can be effectively retrieved via next-generation (e.g., Illumina) or third-generation (e.g., Oxford Nanopores) sequencing technologies. However, all known DNA-based data storage platforms suffer from high costs, high write-read latency, and high error rates, making them hard to deploy in practice at a large scale. Approaches that can effectively overcome these drawbacks of DNA-based data storage systems are still lacking at this moment. To address the issue related to errors arising in DNA write-read procedure, we develop and experimentally test a hybrid DNA-based data storage system termed "2DDNA", which uses machine learning techniques to recover the information without resorting to worst-case error-correction coding redundancy. Also, in contrast to previous methods that only utilize DNA nucleotides to record archival data, where data removing and rewriting is difficult, we show that 2DDNA allows for recording, removing, and rewriting information accurately and permanently by storing it in the sugar-phosphate backbones of DNA. To address the issue of cost and write-read latency, we introduce a prototype system that uses an extended molecular alphabet combining four natural and seven chemically modified nucleotides. The extended molecular alphabet may potentially offer a nearly 2-fold increase in storage density and potentially the same order of reduction in the recording latency, and experimental results show that MspA and Oxford nanopores can discriminate different combinations and ordered sequences of symbols in this extended alphabet with high accuracy. On the other hand, from the data privacy perspective, as the demand for user privacy grows, controlled data removal is becoming a necessary feature for both data storage systems and machine learning models that are trained upon them. Due to recent advances in techniques such as model inversion attacks, which can reconstruct the training samples from model parameters, only permanently removing user data from data sets is insufficient to guarantee the desired level of privacy. One needs to eliminate the influence of data points that requested to be removed on the corresponding machine learning models as well. Nevertheless, at this point, it is still largely unknown how to perform efficient and provable data removal, especially in federated and structured (graph) learning scenarios. To fill in this gap, we propose a series of algorithms and relevant analytical results in this new line of research on machine unlearning. Specifically, we first propose an exact unlearning approach for the federated clustering problem, with theoretical guarantees on the model performance and unlearning complexity. We design a novel secure compressed multiset aggregation scheme as a component of our approach, which is of independent interest for sparse secure model aggregation in the federated learning community. Next, we introduce two approximate unlearning approaches within a unified framework to solve node classification and graph classification tasks in the context of graph learning problems with provable theoretical guarantees. Our extensive simulation results reveal that all three proposed unlearning approaches achieve good trade-offs among privacy, accuracy, and efficiency for data removal tasks.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Chao Pan
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…