Opinion Topic, Holder and Polarity in texts: exploration and automatic identification from cross-lingual data
Kim, Kyoung-Young
Permalink
https://hdl.handle.net/2142/24482
Description
- Title
- Opinion Topic, Holder and Polarity in texts: exploration and automatic identification from cross-lingual data
- Author(s)
- Kim, Kyoung-Young
- Issue Date
- 2011-05-25T14:26:17Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Sproat, Richard W.
- Doctoral Committee Chair(s)
- Girju, Roxana
- Committee Member(s)
- Sproat, Richard W.
- Lasersohn, Peter N.
- Zhai, ChengXiang
- Department of Study
- Linguistics
- Discipline
- Linguistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Opinion mining
- Sentiment analysis
- English and Korean
- Opinion extraction
- Abstract
- People express their opinions in various ways across different domains. With growing interest in what other people think, mining opinions in texts has become a focus of attention for researchers in many fields. Moreover, with the rapid development of technology and the internet, more and more multilingual and multicultural information has become available on the web. The objective of this dissertation is to explore opinions in multilingual corpora and to extract them automatically. In pursuing this objective, a bilingual opinion-annotated corpus of editorial texts was constructed, focusing on detailed opinion factors. The annotated opinion factors include the holder of an opinion (Holder) and the topic of an opinion together with its polarity (Positive Topic, Negative Topic). Using the annotated corpus, the factors used to express opinions, as well as the way opinions are expressed across languages, were investigated. The main contribution of this dissertation is a multilingual sentiment analysis system that identifies opinion factors using a novel method that explores the linguistic structures used to express opinions. Without relying on pre-labeled opinion words, the system identifies opinion factors directly through syntactic analysis, predicate-argument structure, and pragmatic analysis. In place of pre-labeled opinion words for each language, a clustered lexicon was constructed from bilingual dictionaries, and the lexical features crucial for identifying polarity were learned automatically. In addition to these lexical features, syntactic, morphological, and contextual features were used in the learning algorithm. The syntactic structure of each sentence, as well as predicate-argument structures extracted from the PropBank database, were investigated and used to assign appropriate features to the target chunk. The experimental results show that the proposed system performs significantly better than a baseline system.
Experiments focusing on each novel method verify that both the clustered lexical dictionary and the incorporation of more linguistic structure improve the accuracy of opinion factor extraction. The proposed system was also tested on an existing English monolingual corpus of news articles (the MPQA corpus) and yielded results consistent with those on the annotated corpus. Within the multilingual experimental set-up, the way opinions are expressed across languages was investigated and used to improve the results of the analysis. Experiments with cross-lingual features extracted from parallel sentences show further improvement, which suggests cross-lingual reinforcement in identifying opinion factors with the proposed system.
- Graduation Semester
- 2011-05
- Copyright and License Information
- Copyright 2011 Kyoung-Young Kim
Owning Collections
Graduate Dissertations and Theses at Illinois (primary collection)
Graduate Theses and Dissertations at Illinois