Efficient Data Integration: Automation, Collaboration, and Relaxation
McCann, Robert Lee
This item is available for download only to members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/81787
Description
Title
Efficient Data Integration: Automation, Collaboration, and Relaxation
Author(s)
McCann, Robert Lee
Issue Date
2007
Doctoral Committee Chair(s)
Doan, AnHai
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Science
Language
eng
Abstract
While the previous two directions reduce integration costs by improving the performance of automatic tools (either by improving the tool itself or by leveraging users to boost tool accuracy), the last direction explored in this thesis attacks data integration costs at their foundation: rigidity. The current data integration system model imposes a very rigid structure on its components and on the data passed between them. For example, wrappers are responsible for extracting precise structured data, allowing traditional structured query processing techniques to compute the query result. My third direction instead explores our ability to relax these assumptions, thereby allowing us to answer queries without incurring the unnecessary costs that the traditional model imposes (e.g., building full-fledged wrappers). In this thesis I investigate this idea within the context of supporting one-time, on-the-fly queries over distributed Web data. I develop and evaluate SLIC, a system that allows a user to quickly pose SQL queries over multiple sources (after only some minimal preprocessing), obtain initial results, and then iterate with the system to get increasingly better results. The fundamental idea is to learn only as much structure as necessary to answer a given query. Extensive experiments on real-world domains show that, for many practical queries, SLIC is significantly faster than current methods. It thus provides a promising first step toward a principled solution for lazy, on-the-fly integration of Web data, and will hopefully spark interest in our potential to remove some of the fundamental costs inherent in the traditional integration system model.