A Holistic Paradigm for Large Scale Schema Matching

He, Bin

This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.

Permalink

https://hdl.handle.net/2142/81733

Description

Title: A Holistic Paradigm for Large Scale Schema Matching
Author(s): He, Bin
Issue Date: 2006
Doctoral Committee Chair(s): Chang, Kevin Chen-Chuan
Department of Study: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree Name: Ph.D.
Degree Level: Dissertation
Keyword(s): Computer Science
Language: eng
Abstract: Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding pairwise attribute correspondences in isolation. In contrast, this thesis proposes a new matching paradigm, holistic schema matching, to match many schemas at the same time and find all matchings at once. By handling a set of schemas together, we can explore their context information that reflects the semantic correspondences among attributes. Such information is not available when schemas are matched only in pairs. As the realizations of holistic schema matching, we develop two approaches in sequence. To begin with, we develop the MGS framework, which finds simple 1:1 matchings by viewing schema matching as hidden model discovery. Then, to deal with complex matchings, we further develop the DCM framework by abstracting schema matching as correlation mining. Further, to automate the entire matching process, we incorporate the DCM framework with automatically extracted interfaces and find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such "noisy" schemas, we propose to integrate it with an ensemble approach by randomizing the schema data into multiple DCM matchers and aggregating their ranked results by taking majority voting. Last, as our matching algorithms require large-scale schemas in the same domain (e.g., Books and Airfares) as input, we develop an object-focused crawler for effectively collecting query interfaces and a model-differentiation based clustering approach to clustering schemas into their domain hierarchy.
Graduation Semester: 2006
Type of Resource: text
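The abstract's ensemble step (randomizing the schema data into multiple matchers and aggregating their ranked results by majority voting) can be illustrated with a small Python sketch. This is not the dissertation's implementation: the matcher is a pluggable placeholder, `naive_synonym_matcher` is a hypothetical stand-in (not DCM) that only proposes attribute pairs never seen in the same schema, and the parameters `trials`, `sample_size`, `top_k`, and `seed`, as well as the rule that a trial votes for a matching when it appears in that trial's top-k, are assumptions made for illustration.

```python
from __future__ import annotations

import random
from collections import Counter
from itertools import combinations
from typing import Callable, Hashable, Sequence


def naive_synonym_matcher(sample: Sequence[frozenset[str]]) -> list[frozenset[str]]:
    """Hypothetical stand-in matcher (NOT DCM): ranks attribute pairs that
    both occur in the sample but never appear in the same schema."""
    freq = Counter(attr for schema in sample for attr in schema)
    candidates = []
    for a, b in combinations(sorted(freq), 2):
        if not any(a in s and b in s for s in sample):
            candidates.append((min(freq[a], freq[b]), a, b))
    # Higher support first; ties broken alphabetically so output is deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [frozenset((a, b)) for _, a, b in candidates]


def ensemble_match(
    schemas: Sequence[frozenset[str]],
    matcher: Callable[[Sequence[frozenset[str]]], list[Hashable]],
    trials: int = 5,
    sample_size: int | None = None,
    top_k: int = 3,
    seed: int = 0,
) -> list[tuple[Hashable, int]]:
    """Run `matcher` on `trials` random subsets of `schemas` and keep the
    matchings that appear in the top-k of a strict majority of trials."""
    rng = random.Random(seed)
    size = sample_size or max(1, len(schemas) // 2)
    votes: Counter = Counter()
    for _ in range(trials):
        sample = rng.sample(list(schemas), size)
        ranked = matcher(sample)
        # Each trial casts at most one vote per matching.
        for matching in dict.fromkeys(ranked[:top_k]):
            votes[matching] += 1
    majority = trials // 2 + 1
    winners = [(m, v) for m, v in votes.items() if v >= majority]
    winners.sort(key=lambda mv: -mv[1])  # stable: ties keep first-seen order
    return winners


if __name__ == "__main__":
    clean = [
        frozenset({"author", "title", "isbn"}),
        frozenset({"author", "title", "price"}),
        frozenset({"writer", "title", "isbn"}),
        frozenset({"writer", "title", "price"}),
        frozenset({"title", "isbn", "price"}),
    ]
    # Hypothetical extraction error: "author" and "writer" lumped into one schema.
    noisy = clean + [frozenset({"author", "writer", "title"})]
    print("single run, clean:", naive_synonym_matcher(clean))
    print("single run, noisy:", naive_synonym_matcher(noisy))
    # Result varies with seed, sample_size, trials, and the matcher used.
    print("ensemble, noisy:", ensemble_match(noisy, naive_synonym_matcher, trials=7, sample_size=4, top_k=1))
```

On this toy data a single run over the noisy set finds nothing, because one erroneous schema makes "author" and "writer" co-occur; sampling subsets gives some trials a chance to exclude that schema. Whether a given matching then wins the vote depends on the matcher, the sample size, and the number of trials.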

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Computer Science
