Large Scale Information Integration on the Web: Finding, Understanding and Querying Web Databases
Zhang, Zhen
Permalink
https://hdl.handle.net/2142/11282
Description
- Title
- Large Scale Information Integration on the Web: Finding, Understanding and Querying Web Databases
- Author(s)
- Zhang, Zhen
- Issue Date
- 2006-12
- Keyword(s)
- computer science
- Abstract
- The Web has been rapidly "deepened" by myriad searchable online databases, whose data are hidden behind query interfaces. These query interfaces guard the data and serve as the "entrances" or "doors" to the deep Web. To open these doors, we have been building the MetaQuerier system, both for exploring (to find) and for integrating (to query) databases on the Web through their query interfaces. To find Web databases, we need search functionality that dynamically discovers databases relevant to users' information needs. To query those Web databases, we need to "understand" what a query interface says, i.e., what query capabilities a source supports through its interface, in terms of the conditions that can be specified. Further, to help users query "alternative" sources, we need to mediate heterogeneous query capabilities across different sources discovered on the fly. Finally, to process queries submitted to a database, we need efficient query processing techniques. To address these challenges, this thesis presents several key components of the MetaQuerier system. First, a search facility finds useful databases by their schemas. Second, the form extractor extracts the query capabilities of databases by applying a best-effort parsing approach based on hidden syntax. Third, the form assistant translates queries across pairs of interfaces on the fly by deploying a lightweight, domain-based translation framework. Fourth, the OPT* framework processes ranked queries by casting them as a k-constraint optimization problem. We evaluate our techniques on real databases on the Web, and the experimental results show the promise of our system.
- Type of Resource
- text
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).