Withdraw
Loading…
Template B+ trees: an index scheme for fast data streams with distributed append-only stores
Mazumdar, Parijat
Loading…
Permalink
https://hdl.handle.net/2142/90645
Description
- Title
- Template B+ trees: an index scheme for fast data streams with distributed append-only stores
- Author(s)
- Mazumdar, Parijat
- Issue Date
- 2016-04-26
- Director of Research (if dissertation) or Advisor (if thesis)
- Winslett, Marianne
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- B+ trees
- Template B+ Trees
- distributed
- append-only datastore
- Abstract
- Distributed systems are now commonly used to manage massive data flooding from the physical world, such as user-generated content from online social media and communication records from mobile phones. The new generation of distributed data management systems, such as HBase, Cassandra and Riak, are designed to perform queries and tuple insertions only. Other database operations such as deletions and updates are simulated by appending the keys associated with the target tuples to operation logs. Such an append-only store architecture maximizes the processing throughput on incoming data, but potentially incurs higher costs during query processing, because additional computation is needed to generate consistent snapshots of the database. Indexing is the key to enable efficient query processing by fast data retrieval and aggregation under such a system architecture. This thesis presents a new in-memory indexing scheme for distributed append-only stores. Our new scheme utilizes traditional index structures based on B+ trees and their variants to create an efficient in-memory template-based tree without the overhead of expensive node splits. We also propose the use of optimized domain partitioning and multi-thread insertion techniques to exploit the advantages of the template B+ tree structure. Our empirical evaluations show that insertion throughput is five times higher with template B+ trees than with HBase, on a variety of real and synthetic workloads.
- Graduation Semester
- 2016-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/90645
- Copyright and License Information
- Copyright 2016 Parijat Mazumdar
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…