Automatically identifying facet roles from comparative structures to support biomedical text summarization
Lucic, Ana
Description
- Title
- Automatically identifying facet roles from comparative structures to support biomedical text summarization
- Author(s)
- Lucic, Ana
- Issue Date
- 2017-06-26
- Director of Research (if dissertation) or Advisor (if thesis)
- Blake, Catherine Lesley
- Doctoral Committee Chair(s)
- Blake, Catherine Lesley
- Committee Member(s)
- Girju, Corina Roxana
- Efron, Miles
- Renear, Allen H.
- Downie, J. Stephen
- Department of Study
- Information Sciences
- Discipline
- Library & Information Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Comparison sentences
- Natural language processing
- Text mining
- Text summarization
- Information extraction
- Abstract
- Within the context of biomedical scholarly articles, comparison sentences are a rhetorical structure commonly used to communicate findings. More generally, comparison sentences are rich in information about how the properties of one or more entities relate to one another. So far, work in the biomedical domain has emphasized recognizing comparison sentences in text. This dissertation goes beyond sentence-level recognition and aims to automate the identification of the integral parts of a comparison sentence, called comparative facets: the two compared entities, the basis (or endpoint) of the comparison, and the result, that is, the relationship that binds the entities and the basis. This dissertation considers only sentences that contain all four facets. For the first compared entity, the system achieves an average F1 of 0.65 on a random sample of short sentences (11 to 21 words), 0.70 on medium sentences (22 to 28 words), 0.60 on long sentences (29 to 36 words), and 0.60 on very long sentences (more than 36 words). For the basis of comparison (the endpoint), the average F1 was 0.66 on short, 0.57 on medium, 0.56 on long, and 0.50 on very long sentences. For the second compared entity, the average F1 was 0.91 on short, 0.85 on medium, 0.81 on long, and 0.72 on very long sentences. Semantic relation identification was also sensitive to sentence length: the average F1 was 0.80 on short sentences and 0.71, 0.56, and 0.51 on medium, long, and very long sentences, respectively. Thus, the methods developed in this dissertation work better on shorter sentences (28 words or fewer) and on sentences that do not contain multiple claims or disjunctive conjunctions.
When applied to a previously unseen collection of breast cancer articles, the models identified the compared entities and the endpoint with performance comparable to that achieved on the collection used to build and test them. This result is promising for applying the models to other collections of scholarly articles in the biomedical sciences.
- Graduation Semester
- 2017-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/98087
- Copyright and License Information
- Copyright 2017 Ana Lucic
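The abstract reports F1 separately for four sentence-length bins (11 to 21, 22 to 28, 29 to 36, and more than 36 words). The sketch below shows one way such a length-stratified evaluation could be computed for a single facet. It is a hypothetical illustration, not the dissertation's code: the names `length_bin`, `f1`, and `f1_by_length` are invented here, words are assumed to be whitespace-separated tokens, a prediction is assumed to count as correct only on an exact string match with the gold facet, and F1 is computed from counts pooled within each bin (the abstract does not state how its averages were taken).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def length_bin(sentence: str) -> Optional[str]:
    """Map a sentence to one of the abstract's length bins.

    Assumption: words are counted by splitting on whitespace.
    Sentences shorter than 11 words fall outside the reported bins.
    """
    n = len(sentence.split())
    if n < 11:
        return None
    if n <= 21:
        return "short"
    if n <= 28:
        return "medium"
    if n <= 36:
        return "long"
    return "very long"


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_by_length(
    examples: Iterable[Tuple[str, Optional[str], Optional[str]]]
) -> Dict[str, float]:
    """Per-bin F1 for one facet (hypothetical helper).

    Each example is (sentence, gold_facet, predicted_facet); None means
    no facet was annotated or predicted. Exact string match is assumed.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn]
    for sentence, gold, pred in examples:
        bin_name = length_bin(sentence)
        if bin_name is None:
            continue  # outside the reported length range
        c = counts[bin_name]
        if pred is not None and pred == gold:
            c[0] += 1  # correct prediction
        else:
            if pred is not None:
                c[1] += 1  # spurious or wrong prediction
            if gold is not None:
                c[2] += 1  # gold facet not recovered
    return {b: f1(*c) for b, c in counts.items()}
```

Running `f1_by_length` once per facet (first entity, second entity, endpoint, relation) would give a per-bin breakdown in the same shape as the figures reported in the abstract.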
Owning Collections
Dissertations and Theses - Information Sciences
Dissertations and theses from the School of Information Sciences
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois