Improving neural language models on low-resource creole languages
Schieferstein, Sarah
Permalink
https://hdl.handle.net/2142/102512
Description
- Title
- Improving neural language models on low-resource creole languages
- Author(s)
- Schieferstein, Sarah
- Issue Date
- 2018-12-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Hockenmaier, Julia
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Natural Language Processing
- Neural Networks
- Deep Learning
- Linguistics
- Creole Languages
- Creolistics
- Abstract
- When using neural models for NLP tasks such as language modelling, it is difficult to make use of a language with little data, known as a low-resource language. Creole languages are frequently low-resource, so it is difficult to train neural language models for them well. A creole language is widely thought of as having multiple parent languages and thus inheriting a mix of traits from all of them. One parent, known as the lexifier, gives the creole its lexicon; the other parents, known as substrates, are thought to give the creole its morphology and syntax. Creole languages are most lexically similar to their lexifier and most syntactically similar to otherwise unrelated creole languages. High lexical similarity to the lexifier is unsurprising, since by definition the lexifier provides a creole's lexicon, but high syntactic similarity to other unrelated creole languages is not obvious and is explored in detail. We can use this information about creole languages' unique genesis and typology to decrease the perplexity of neural language models on low-resource creole languages. We discovered that syntactically similar languages (especially other creole languages) can successfully transfer features learned during pretraining from a high-resource language to a low-resource creole language through a method called neural stacking. A method that normalizes a creole language's vocabulary to its lexifier also lowered the perplexity of creole-language neural models.
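- The lexifier-normalization idea mentioned in the abstract can be sketched as follows: creole word forms are mapped onto their lexifier spellings so that a language model can share vocabulary entries across the high-resource lexifier and the low-resource creole. The function name and the tiny Haitian Creole/French lookup table below are illustrative assumptions for this sketch, not the thesis's actual mapping or method.

```python
# Minimal sketch of lexifier vocabulary normalization (assumed example pairs,
# not the thesis's real mapping table).
LEXIFIER_MAP = {
    "moun": "monde",    # hypothetical Haitian Creole -> French pairs
    "lekol": "école",
    "travay": "travail",
}

def normalize_to_lexifier(tokens):
    """Replace creole tokens with their lexifier forms when a mapping exists;
    tokens without a known mapping are left unchanged."""
    return [LEXIFIER_MAP.get(tok, tok) for tok in tokens]

print(normalize_to_lexifier(["moun", "yo", "ale", "lekol"]))
```

  Unmapped tokens ("yo", "ale") pass through untouched, so normalization only merges vocabulary where a creole-to-lexifier correspondence is known.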
- Graduation Semester
- 2018-12
- Type of Resource
- text
- Copyright and License Information
- Copyright 2018 Sarah Schieferstein
Owning Collections
- Graduate Dissertations and Theses at Illinois (Primary)
- Dissertations and Theses - Computer Science