Big data storage workload characterization, modeling and synthetic generation
Abad, Cristina L.
Permalink
https://hdl.handle.net/2142/49497
Description
- Title
- Big data storage workload characterization, modeling and synthetic generation
- Author(s)
- Abad, Cristina L.
- Issue Date
- 2014-05-30T16:47:05Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Campbell, Roy H.
- Doctoral Committee Chair(s)
- Campbell, Roy H.
- Committee Member(s)
- Nahrstedt, Klara
- Gupta, Indranil
- Lu, Yi
- Cherkasova, Ludmila
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Big Data
- Hadoop
- MapReduce
- workload
- Mimesis
- MimesisBench
- Hadoop Distributed File System (HDFS)
- storage
- locality
- popularity
- Abstract
- A huge increase in data storage and processing requirements has led to Big Data, for which next-generation storage systems are being designed and implemented. As Big Data stresses the storage layer in new ways, a better understanding of these workloads and the availability of flexible workload generators are increasingly important to facilitate the proper design and performance tuning of storage subsystems like data replication, metadata management, and caching. Our hypothesis is that the autonomic modeling of Big Data storage system workloads through a combination of measurement, statistical, and machine learning techniques is feasible, novel, and useful. We consider the case of one common type of Big Data storage cluster: a cluster dedicated to supporting a mix of MapReduce jobs. We analyze 6-month traces from two large clusters at Yahoo and identify interesting properties of the workloads. We present a novel model for capturing popularity and short-term temporal correlations in object request streams, and show how unsupervised statistical clustering can be used to enable autonomic type-aware workload generation that is suitable for emerging workloads. We extend this model to include other relevant properties of storage systems (file creation and deletion, pre-existing namespaces, and hierarchical namespaces) and use the extended model to implement MimesisBench, a realistic namespace metadata benchmark for next-generation storage systems. Finally, we demonstrate the usefulness of MimesisBench through a study of the scalability and performance of the Hadoop Distributed File System name node.
- Graduation Semester
- 2014-05
- Permalink
- http://hdl.handle.net/2142/49497
- Copyright and License Information
- Copyright 2014 Cristina Abad
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois