Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web
Kelley, Joseph M.; Chang, Kevin Chen-Chuan; Cheng, Tao; Chuang, Shui-Lung; Davis, William
Permalink
https://hdl.handle.net/2142/10968
Description
- Title
- Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web
- Author(s)
- Kelley, Joseph M.
- Chang, Kevin Chen-Chuan
- Cheng, Tao
- Chuang, Shui-Lung
- Davis, William
- Issue Date
- 2004-11
- Keyword(s)
- data mining
- Abstract
- With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective "search-based" mechanism to help sift through that information. Our goal is to provide such an online search-based facility supporting query primitives upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function: weaving scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking, the essential challenge of E-R discovery, as pattern-based cooccurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with a real Web corpus. Our case studies have demonstrated high promise, achieving 83%-91% accuracy for real benchmark queries, and thus the real possibility of enabling ad-hoc Web mining tasks with online E-R discovery.
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/10968
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
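The abstract frames tuple ranking as pattern-based cooccurrence analysis. As a rough illustration only, the sketch below ranks candidate entity pairs by how often they co-occur on a page. It is not the WISDM-ER implementation: the function `rank_tuples`, the `max_gap` parameter, the "left entity followed by right entity within a few tokens" pattern, and the toy pages are all assumptions made for this example.

```python
import re
from collections import Counter


def rank_tuples(pages, left_entities, right_entities, max_gap=5):
    """Rank (left, right) entity pairs by cooccurrence count.

    Hypothetical pattern: a left entity followed by a right entity
    within `max_gap` tokens on the same page. Entities are assumed
    to be single lowercase tokens to keep the sketch short.
    """
    counts = Counter()
    for page in pages:
        tokens = re.findall(r"\w+", page.lower())
        left_hits = [(i, t) for i, t in enumerate(tokens) if t in left_entities]
        right_hits = [(j, t) for j, t in enumerate(tokens) if t in right_entities]
        for i, left in left_hits:
            for j, right in right_hits:
                # Count the pair only when it matches the ordered-window pattern.
                if 0 < j - i <= max_gap:
                    counts[(left, right)] += 1
    # Highest counts first; ties keep first-seen order.
    return counts.most_common()


# Toy corpus (invented for illustration, not from the report's testbed).
pages = [
    "Paris is the capital of France.",
    "Visit Paris, France this summer.",
    "Berlin is the capital of Germany.",
    "Paris and Berlin are cities; Germany borders France.",
]
ranked = rank_tuples(pages, {"paris", "berlin"}, {"france", "germany"})
for pair, count in ranked:
    print(pair, count)
```

Pairs that match the pattern on more pages rise to the top, while incidental pairings (such as a city mentioned near the wrong country) score lower. A real system would presumably draw these counts from a search engine index rather than scanning pages directly, in line with the abstract's observation that relation mining shares core functions with page retrieval.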