A Holistic Paradigm for Large Scale Schema Matching
He, Bin
Permalink
https://hdl.handle.net/2142/11180
Description
- Title
- A Holistic Paradigm for Large Scale Schema Matching
- Author(s)
- He, Bin
- Issue Date
- 2006-06
- Keyword(s)
- computer science
- Abstract
- Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, matching multiple schemas has relied on finding pairwise attribute correspondences in isolation. In contrast, this thesis proposes a new matching paradigm, holistic schema matching, which matches many schemas at the same time and finds all matchings at once. By handling a set of schemas together, we can exploit context information that reflects the semantic correspondences among attributes. Such information is not available when schemas are matched only in pairs. As realizations of holistic schema matching, we develop two approaches in sequence. To begin with, we develop the MGS framework, which finds simple 1:1 matchings by viewing schema matching as hidden model discovery. Then, to deal with complex matchings, we develop the DCM framework by abstracting schema matching as correlation mining. Further, to automate the entire matching process, we combine the DCM framework with automatically extracted interfaces and find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such "noisy" schemas, we propose to integrate it with an ensemble approach that randomizes the schema data into multiple DCM matchers and aggregates their ranked results by majority voting. Last, as our matching algorithms require large-scale schemas in the same domain (e.g., Books and Airfares) as input, we develop an object-focused crawler for effectively collecting query interfaces and a model-differentiation based clustering approach for organizing schemas into their domain hierarchy.
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/11180
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).