Accelerating queries for structured and unstructured data
Jin, Tengjun
Permalink
https://hdl.handle.net/2142/124567
Description
Title
Accelerating queries for structured and unstructured data
Author(s)
Jin, Tengjun
Issue Date
2024-04-30
Director of Research (if dissertation) or Advisor (if thesis)
Kang, Daniel
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Database Systems
Machine Learning
Approximate Query Processing
Abstract
Data analytics is important for making data-driven decisions. As data volumes grow, the efficiency and cost of executing queries become critical concerns for analysts. Traditionally, analytics systems have focused on structured data. Approximate Query Processing (AQP) systems were developed to improve efficiency: they answer aggregation queries faster by returning approximate results. However, their use in real applications is limited because of compatibility issues with popular databases and restrictions on the types of queries they can handle. To overcome these limitations, we designed a new AQP system that functions as middleware. This system uses online sampling to accelerate aggregation queries while meeting user-specified error targets.
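The general idea of online sampling against an error target can be sketched as follows. This is a minimal illustration, not the thesis's system: it assumes an AVG aggregate, uniform row-level sampling without replacement, and a normal-approximation (CLT) confidence interval, and the function name `approx_avg` and its parameters are hypothetical. The actual system may use a different sampling unit (for example, blocks) or a different error bound.

```python
import math
import random
import statistics
from statistics import NormalDist


def approx_avg(values, rel_error=0.05, confidence=0.95, batch=1000, seed=0):
    """Hypothetical sketch: estimate mean(values) by online uniform sampling.

    Rows are read in a random order (sampling without replacement), one batch
    at a time. Sampling stops once the CLT-based confidence-interval half-width
    is within rel_error * |estimate|; if that never happens, every row is read
    and the exact mean is returned. Returns (estimate, half_width, rows_read).
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # a random permutation gives sampling without replacement
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    sample = []
    for start in range(0, len(order), batch):
        sample.extend(values[i] for i in order[start:start + batch])
        if len(sample) < 2:
            continue  # need at least two rows for a sample standard deviation
        mean = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
        if half_width <= rel_error * abs(mean):
            return mean, half_width, len(sample)  # error target met early
    # Error target never met: the whole table was scanned, so the answer is exact.
    return statistics.fmean(values), 0.0, len(values)
```

For example, `approx_avg(data, rel_error=0.01)` on a large column typically stops after reading a small fraction of the rows. The sketch omits the finite-population correction, which would tighten the interval when the sample is a large share of the table.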
With advances in machine learning (ML), analysts are increasingly interested in analyzing unstructured data (videos, images, text, and audio) to extract semantic information. Current analytics systems typically integrate ML models through user-defined functions (UDFs), which can be difficult to optimize and require users to write complex, nested table expressions. To address these challenges, we introduce a new data model, AIDM, which lets users query ML model outputs as standard SQL tables through virtual columns and virtual tables. We implement AIDM, together with novel optimizations that accelerate both approximate and exact queries, in a system called AIDB.
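To make the notion of a virtual column concrete, here is a toy in-memory sketch. It is an assumption-laden illustration, not AIDB's API: the class `VirtualColumnTable`, its methods, and the stand-in "model" are all hypothetical. It shows one plausible behavior of a virtual column: model outputs appear as an ordinary column, are computed lazily only for rows a query touches, and are cached, so placing cheap predicates on base columns first reduces model invocations. Virtual tables (for example, one row per detected object) are not covered.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class VirtualColumnTable:
    """Hypothetical toy: a base table plus virtual columns backed by ML models."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows
        self.models: Dict[str, Callable[[Dict[str, Any]], Any]] = {}
        self.cache: Dict[Tuple[int, str], Any] = {}
        self.invocations = 0  # number of (simulated) model calls so far

    def add_virtual_column(self, name: str, model: Callable[[Dict[str, Any]], Any]) -> None:
        self.models[name] = model

    def get(self, row_id: int, column: str) -> Any:
        if column not in self.models:
            return self.rows[row_id][column]  # base column: stored value
        key = (row_id, column)
        if key not in self.cache:  # lazy: run the model only on first access
            self.invocations += 1
            self.cache[key] = self.models[column](self.rows[row_id])
        return self.cache[key]

    def select(self, columns: Sequence[str],
               where: Sequence[Tuple[str, Callable[[Any], bool]]] = ()) -> List[Dict[str, Any]]:
        """Project `columns` for rows passing every (column, predicate) in order.

        all() short-circuits, so a failing base-column predicate listed first
        prevents the model from running on that row.
        """
        out = []
        for rid in range(len(self.rows)):
            if all(pred(self.get(rid, col)) for col, pred in where):
                out.append({c: self.get(rid, c) for c in columns})
        return out
```

Usage, with a trivial function standing in for an object detector:

```python
frames = [{"frame_id": i, "camera": "north" if i % 10 == 0 else "south"} for i in range(1000)]
table = VirtualColumnTable(frames)
table.add_virtual_column("num_cars", lambda row: row["frame_id"] % 3)  # stand-in model
result = table.select(["frame_id", "num_cars"],
                      where=[("camera", lambda c: c == "north"), ("num_cars", lambda n: n > 0)])
# The model runs only on the 100 "north" frames; the projection reuses cached outputs.
```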
Our evaluations show that the AQP system can provide speedups of up to 87x and AIDB can reduce the number of ML model invocations by up to 98%.