Large Scale Information Integration on the Web: Finding, Understanding and Querying Web Databases
Zhang, Zhen
Permalink
https://hdl.handle.net/2142/81771
Description
Title
Large Scale Information Integration on the Web: Finding, Understanding and Querying Web Databases
Author(s)
Zhang, Zhen
Issue Date
2007
Doctoral Committee Chair(s)
Chang, Kevin C.
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Science
Language
eng
Abstract
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. Because they guard the data behind them, such query interfaces are the "entrances" or "doors" to the deep Web. To open this door to the deep Web, we have been building the MetaQuerier system, for both exploring (to find) and integrating (to query) databases on the Web through their query interfaces. To find Web databases, we need to provide search functionalities that dynamically discover databases relevant to users' information needs. To query those Web databases, we need to "understand" what a query interface says, i.e., what query capabilities a source supports through its interface, in terms of specifiable conditions. Further, to help users query "alternative" sources, we need to mediate heterogeneous query capabilities across different sources discovered on-the-fly. Finally, to process queries submitted to a database, we need to design efficient query processing techniques. To address those challenges, this thesis presents several key components of the MetaQuerier system. First, a search facility searches for useful databases by their schemas. Second, a form extractor extracts the query capabilities of databases by applying a best-effort parsing approach based on hidden syntax. Third, a form assistant translates queries across pairs of interfaces on-the-fly by deploying a light-weight, domain-based translation framework. Fourth, the OPT* framework processes ranked queries by way of a k constraint optimization problem. We evaluate our techniques on real databases on the Web. The experimental results show the promise of our system.