Computational measures of linguistic variation: a study of Arabic varieties
Abunasser, Mahmoud
Permalink
https://hdl.handle.net/2142/78346
Description
- Title
- Computational measures of linguistic variation: a study of Arabic varieties
- Author(s)
- Abunasser, Mahmoud
- Issue Date
- 2015-04-08
- Director of Research (if dissertation) or Advisor (if thesis)
- Benmamoun, Elabbas
- Doctoral Committee Chair(s)
- Benmamoun, Elabbas
- Committee Member(s)
- Hasegawa-Johnson, Mark A.
- Shosted, Ryan
- Mustafawi, Eiman
- Department of Study
- Linguistics
- Discipline
- Linguistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Measures of linguistic variation
- Lexical variation
- Pronunciation variation
- Linguistic distance
- Lexical distance
- Pronunciation distance
- Arabic linguistics
- Modern Standard Arabic
- Gulf Arabic
- Levantine Arabic
- Egyptian Arabic
- Moroccan Arabic
- Measures of mutual intelligibility
- Arabic dialectology
- Mathematical representation of sound
- Mathematical representation of phones
- Mathematical representation of phonemes
- Asymmetric measure of linguistic variation
- Asymmetric linguistic distance
- Abstract
- This thesis introduces and discusses a new methodology for measuring the variation between linguistic varieties. I compare five Arabic varieties: Modern Standard Arabic (MSA), Gulf Arabic (GA), Levantine Arabic (LA), Egyptian Arabic (EA), and Moroccan Arabic (MA), considering both lexical and pronunciation variation. I introduce the idea of measuring the amount of linguistic variation asymmetrically: the amount of linguistic variation between a speaker of variety A and a hearer of variety B is not necessarily equal to the amount of linguistic variation between a speaker of variety B and a hearer of variety A. I propose a new mathematically based computational representation of sound that enables the incorporation of phonetic features and articulatory gestures in measuring the amount of pronunciation variation. I also implement an optimization technique to assign weights and parameters to the phonetic features and articulatory gestures for the proposed representation of sound. The developed methodology, tools, and techniques lead to a better understanding of the structure of language and have implications for both theoretical linguistics and applied work in natural language processing (NLP). The work both provides a computational technique to assess the plausibility of defining the components of sound and opens a new avenue toward a representation of sound that is phonetically motivated and computationally applicable to NLP problems. This research could potentially yield insights into the issues of mutual intelligibility between Arabic varieties and dialect identification. The measurement of lexical and pronunciation variation is based on native speaker elicitations of the Swadesh list for the local varieties of Arabic; MSA is represented by data from dictionaries. The data collection procedure allows the participants to provide more than one translation. I also provide a context sentence for all lexical items to rule out cases of ambiguity.
The amount of lexical variation is measured at two levels of representation: the word level and the phonemic level. At the word level, the amount of linguistic variation is based on whether the words share a linguistic origin. The phonemic level, using IPA transcription of words, looks at more details in measuring the lexical variation. The amount of pronunciation variation is measured at three levels. The first and most abstract level is the phonemic level. The second incorporates the mathematical representation of sound, which encodes phonetic features and articulatory gestures. The third allows the vowels to be represented non-categorically based on the values of the first and second formant frequencies; MSA is not included at this level. The results of the measures of linguistic variation developed in this study confirm two observations about the communication between speakers of the Arabic varieties and provide an answer to the frequently asked question about the closeness of the Arabic varieties to each other. The first observation is that MA seems to be more distant from the other local varieties (GA, EA, and LA) than those varieties are from each other, which relates to the geographical distances between those varieties. The second observation is the asymmetric pattern of intelligibility in the communication of EA speakers with the members of the other local varieties: GA, LA, and MA speakers seem to understand EA speakers better than the EA speakers understand them. This asymmetric pattern of intelligibility is reflected by the variation metrics developed in this research. As for the closeness of the local varieties to MSA, GA and, to some extent, LA seem to be the closest, followed by EA, and MA is the farthest. In addition, EA seems to be closer to MA than both LA and GA are. Moreover, EA speakers are closer to LA hearers than to GA hearers. On the other hand, GA speakers are closer to LA hearers than to EA hearers. Finally, the last measure, that of pronunciation variation, situates LA speakers closer to GA hearers than to EA hearers.
- Graduation Semester
- 2015-5
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/78346
- Copyright and License Information
- Copyright 2015 Mahmoud Abu Nasser
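The abstract's idea of an asymmetric, speaker-to-hearer distance at the phonemic level can be illustrated with a minimal Python sketch. This is not the dissertation's actual metric. It assumes, purely for illustration, that a speaker produces the first-listed form for each concept, that a hearer recognizes every form listed for that concept in their own variety, and that forms are compared by a unit-cost edit distance over phoneme symbols. The names `Lexicon`, `directed_distance`, `toy_a`, and `toy_b` are hypothetical, and the toy forms are invented symbols, not Arabic data.

```python
from typing import Dict, List

# Hypothetical layout: concept -> list of forms, each form a list of phoneme
# symbols. The first form is treated as the one a speaker would produce.
Lexicon = Dict[str, List[List[str]]]


def levenshtein(a: List[str], b: List[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def normalized(a: List[str], b: List[str]) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def directed_distance(speaker: Lexicon, hearer: Lexicon) -> float:
    """Mean distance from the speaker's primary form to the hearer's closest form.

    Assumption (illustrative only): the hearer recognizes every form listed
    for a concept, while the speaker utters only the first-listed one.
    """
    shared = [c for c in speaker if c in hearer]
    if not shared:
        raise ValueError("the two lexicons share no concepts")
    total = 0.0
    for concept in shared:
        spoken = speaker[concept][0]
        total += min(normalized(spoken, known) for known in hearer[concept])
    return total / len(shared)


# Invented toy forms: variety B lists a second form identical to A's form.
toy_a: Lexicon = {"c1": [["p", "a", "t"]]}
toy_b: Lexicon = {"c1": [["b", "a", "d"], ["p", "a", "t"]]}

print(directed_distance(toy_a, toy_b))  # speaker A, hearer B
print(directed_distance(toy_b, toy_a))  # speaker B, hearer A
```

Under these assumptions the two directions differ because the hearer of variety B recognizes a form that the speaker of variety B does not produce as the primary form. Replacing the unit substitution cost with a weighted comparison of phonetic features would move the sketch toward the feature-based level the abstract describes.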
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois