Compiling contextualized lists of frequent vocabulary from user-supplied corpora using natural language processing techniques
Abdar, Omid
Permalink
https://hdl.handle.net/2142/92955
Description
- Title
- Compiling contextualized lists of frequent vocabulary from user-supplied corpora using natural language processing techniques
- Author(s)
- Abdar, Omid
- Issue Date
- 2016-07-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Sadler, Randall
- Committee Member(s)
- Schwartz, Lane
- Department of Study
- Linguistics
- Discipline
- Teaching of English Sec Lang
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.A.
- Degree Level
- Thesis
- Keyword(s)
- English for Specific Purposes, Vocabulary, Wordlists, Natural Language Processing
- Abstract
- Since there are thousands of words to learn in a new language, one common challenge for language learners and teachers is knowing which vocabulary items to prioritize and, more generally, how to set vocabulary-learning goals. Within vocabulary teaching research, one approach has been to focus on lists of the most common vocabulary. West (1953) proposed a list of the 2000 most frequent word families in English that, it was argued, were most important for learners to master. Along the same lines, Coxhead (2000) offered a list of the most common words in academic English known as the Academic Word List (AWL). Arguing that the AWL did not adequately reflect learners’ specialized vocabulary needs, however, corpus linguists began to develop wordlists in specialized subject areas from an English for Specific Purposes (ESP) perspective, for students in majors such as Business, Engineering, Medicine, and Law. A central theme in almost all previous endeavors to develop better wordlists has been the notion of 'representativeness': the extent to which a wordlist 'represents' the language needs of learners. In this study, it is proposed that an alternative way to maximize representativeness is to enable users to compile a wordlist from any text or corpus that is of interest to them, and to provide the means of doing so. Using the Natural Language Toolkit (NLTK), this study shows how a few Natural Language Processing (NLP) techniques may be used to compile a list of the most common words in the Europarl corpus and to retrieve example sentences from the corpus for each word. This new approach can have applications both for language learners and for the preparation of instructional materials in an ESP setting.
- Graduation Semester
- 2016-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/92955
- Copyright and License Information
- Copyright 2016 Omid Abdar
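The approach outlined in the abstract (count the most frequent words in a user-supplied text, then retrieve example sentences for each) can be illustrated with a minimal sketch. This is not the thesis's own code: it uses only the Python standard library so that it runs without downloads, whereas the thesis works with NLTK. The function names, the tiny stopword set, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword set (an assumption; a real list would be much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "on", "for", "that", "this", "will"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase alphabetic tokens (an apostrophe is allowed inside a word)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def build_wordlist(text, top_n=10, max_examples=2):
    """Return [(word, count, [example sentences])] for the top_n most frequent words."""
    counts = Counter()
    examples = defaultdict(list)
    for sentence in split_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in STOPWORDS]
        counts.update(tokens)  # count every occurrence
        for word in set(tokens):  # but store each sentence at most once per word
            if len(examples[word]) < max_examples:
                examples[word].append(sentence)
    return [(word, count, examples[word]) for word, count in counts.most_common(top_n)]


# Hypothetical sample text standing in for a user-supplied corpus.
sample = (
    "The committee approved the budget. "
    "The budget supports regional development. "
    "Parliament will debate the budget on Monday."
)

wordlist = build_wordlist(sample, top_n=1)
for word, count, sentences in wordlist:
    print(word, count)
    for s in sentences:
        print("   ", s)
```

With NLTK installed, `nltk.sent_tokenize`, `nltk.word_tokenize`, and `nltk.FreqDist` could stand in for the regex helpers and `Counter` used here; whether the thesis uses those particular functions is not stated in this record.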
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois