Idiomatic sentence generation and paraphrasing
Zhou, Jianing
Permalink
https://hdl.handle.net/2142/110532
Description
- Title
- Idiomatic sentence generation and paraphrasing
- Author(s)
- Zhou, Jianing
- Issue Date
- 2021-04-23
- Director of Research (if dissertation) or Advisor (if thesis)
- Bhat, Suma
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- natural language processing
- idiom processing
- Abstract
- Idiomatic expressions (IEs) play an important role in natural language and have long been a “pain in the neck” for NLP systems. Despite this, text generation tasks related to IEs remain largely under-explored. In this study, we propose two new tasks, idiomatic sentence generation and idiomatic sentence paraphrasing, to fill this research gap. We introduce a curated dataset of 823 IEs, together with a parallel corpus that pairs sentences containing these IEs with the same sentences in which the IEs are replaced by their literal paraphrases, as the primary resource for our tasks. Using this dataset, we benchmark existing deep learning models that achieve state-of-the-art performance on related tasks, with both automated and manual evaluation, to inspire further research on the proposed tasks. By establishing baseline models, we pave the way for more comprehensive and accurate modeling of IEs, both for generation and paraphrasing. Inspired by psycholinguistic theories of idiom use in one’s native language, we also propose a novel approach for these tasks. For idiomatic sentence generation, the approach retrieves the appropriate idiom for a given literal sentence, extracts the span of the sentence to be replaced by the idiom, and generates the idiomatic sentence by using a large pre-trained language model to combine the retrieved idiom with the remainder of the sentence. For idiomatic sentence paraphrasing, the definition of the idiom in the given idiomatic sentence is first retrieved, the idiom in the sentence is then extracted, and finally the literal counterpart is generated by a large pre-trained language model. Experiments on a novel dataset created for these tasks show that our model works effectively. Furthermore, automatic and human evaluations show that the proposed model outperforms a series of competitive baseline models for text generation on these tasks. Because it generates literal counterparts of high quality, our method for idiomatic sentence paraphrasing is also used to construct a larger corpus with the help of the MAGPIE dataset. This enlarged corpus in turn helps to improve the performance of different models on idiomatic sentence generation.
- Graduation Semester
- 2021-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/110532
- Copyright and License Information
- Copyright 2021 Jianing Zhou
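The three-stage generation pipeline described in the abstract (retrieve an idiom, extract the span to replace, generate the idiomatic sentence) can be illustrated with a toy sketch. This is not the thesis's implementation: the small `IDIOM_DEFINITIONS` lexicon, the token-overlap retrieval, and the plain string substitution are all assumptions made for illustration, standing in for the learned retrieval component and the large pre-trained language model.

```python
import re

# Hypothetical toy lexicon (idiom -> literal definition); not the thesis dataset.
IDIOM_DEFINITIONS = {
    "kick the bucket": "die",
    "under the weather": "slightly ill",
    "piece of cake": "very easy task",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def retrieve_idiom(literal_sentence):
    """Stage 1: pick the idiom whose definition shares the most tokens
    with the sentence. Returns None when nothing overlaps."""
    sentence_tokens = set(tokenize(literal_sentence))
    best, best_score = None, 0
    for idiom, definition in IDIOM_DEFINITIONS.items():
        score = len(sentence_tokens & set(tokenize(definition)))
        if score > best_score:
            best, best_score = idiom, score
    return best


def extract_span(words, idiom):
    """Stage 2: return (start, end) word indices, inclusive, covering every
    word that overlaps the idiom's definition, or None if there is none."""
    definition_tokens = set(tokenize(IDIOM_DEFINITIONS[idiom]))
    hits = [i for i, w in enumerate(words) if set(tokenize(w)) & definition_tokens]
    if not hits:
        return None
    return min(hits), max(hits)


def generate_idiomatic(literal_sentence):
    """Stage 3: splice the idiom into the sentence. A real system would use
    a language model here; this stand-in only substitutes strings and does
    not preserve punctuation attached to the replaced words."""
    idiom = retrieve_idiom(literal_sentence)
    if idiom is None:
        return literal_sentence
    words = literal_sentence.split()
    span = extract_span(words, idiom)
    if span is None:
        return literal_sentence
    start, end = span
    return " ".join(words[:start] + [idiom] + words[end + 1:])


result = generate_idiomatic("She has been feeling slightly ill all week.")
print(result)
```

Splitting the work into retrieval, span extraction, and generation mirrors the decomposition the abstract describes; each stage here could be swapped for a learned component without changing the others.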
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science