Information fusion in taxonomic descriptions
Wei, Qin
Permalink
https://hdl.handle.net/2142/26070
Description
- Title
- Information fusion in taxonomic descriptions
- Author(s)
- Wei, Qin
- Issue Date
- 2011-08-25T22:11:45Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Heidorn, P. Bryan
- Doctoral Committee Chair(s)
- Heidorn, P. Bryan
- Committee Member(s)
- Smith, Linda C.
- Blake, Catherine
- Macklin, James
- Department of Study
- Library & Information Science
- Discipline
- Library & Information Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Information fusion
- Information extraction
- Biodiversity
- Abstract
- Providing a single access point to an information system that draws on multiple sources is helpful in many fields. As a case study, this research investigates the potential of applying information fusion techniques in the biodiversity domain, where researchers need information from different sources to support decision making on tasks such as biological identification. Furthermore, collections in this area are massive, and the descriptive materials on the same species (object) are scattered across different places, so it is not easy to manually collect them into a broader, integrated description. Floras, among the most important descriptive materials in this field, are selected as the target of this research. This research tests a hypothesis concerning the organization of text and the constancy of fact-based information in text. It is observed that individual descriptions may not contain sufficient information to differentiate the target species from others, and that different information sources may contain not only overlapping information but also helpful complementary information. We also observe that non-trivial complementary information can come from descriptions at different levels (family, genus, or species) within the same source. Using a sample dataset from Flora of North America (FNA) and Flora of China (FOC), we found that about 50% of the information could be found in only a single source and that another 25% of complementary information could be identified by fusion. Most importantly, conflicting information could be detected only by direct comparison. The question is how to fuse the records in an automatic or semi-automatic manner so that each resulting record provides a broader yet non-redundant description of each species. The proposed system demonstrates the feasibility of doing so with currently available techniques.
The prototype system contains four modules: text segmentation and taxonomic name identification; organ-level and sub-organ-level information extraction; relationship identification; and information fusion. Using sample descriptions from Flora of North America and Flora of China, we demonstrate that the method achieves promising fusion results based on Cross-Description Relationships. From the evaluation results, we identified the key factors that contribute to fusion performance. Some methods that might further improve fusion performance are discussed. This study also demonstrates that, to a certain extent, this fusion approach is generalizable. Generalizing the approach is a challenging problem because fusion methods are typically domain- and task-oriented. We identified the challenges that arise when applying the approach to a different data set.
- Graduation Semester
- 2011-08
- Copyright and License Information
- Copyright 2011 Qin Wei
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Information Sciences
Dissertations and theses from the School of Information Sciences