DataSpread: scaling spreadsheets using relational databases
Venkataraman, Vipul
Permalink
https://hdl.handle.net/2142/97692
Description
- Title
- DataSpread: scaling spreadsheets using relational databases
- Author(s)
- Venkataraman, Vipul
- Issue Date
- 2017-04-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Parameswaran, Aditya
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Spreadsheets
- Interactivity
- Data models
- Abstract
- Spreadsheet software is the tool of choice for ad-hoc tabular data management, manipulation, querying, and visualization, with adoption by billions of users. However, unlike database systems, spreadsheets are not scalable. We develop DataSpread, a system that holistically unifies databases and spreadsheets with the goal of working with massive spreadsheets. DataSpread retains all of the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while adding the scalability and collaboration abilities of traditional relational databases. We design DataSpread with a spreadsheet front-end and a regular relational database back-end. To integrate spreadsheets and databases, in this thesis we develop a storage and indexing engine for spreadsheet data. We first formalize and study the problem of representing and manipulating spreadsheet data within a relational database. We demonstrate that identifying the optimal representation is NP-Hard via a reduction from partitioning of rectangles; however, under certain reasonable assumptions, it can be solved in PTIME. We develop a collection of mechanisms for representing spreadsheet data, and evaluate these representations on a workload of typical data manipulation operations. We augment our mechanisms with novel positionally-aware indexing structures that further improve performance. DataSpread can scale to billions of cells, returning results for common operations within seconds. Lastly, to motivate our research questions, we perform an extensive survey of spreadsheet use for ad-hoc tabular data management.
- Graduation Semester
- 2017-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/97692
- Copyright and License Information
- Copyright 2017 Vipul Venkataraman
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science