Computational measures of linguistic variation: a study of Arabic varieties

Abunasser, Mahmoud

Permalink

https://hdl.handle.net/2142/78346

Description

Title

Computational measures of linguistic variation: a study of Arabic varieties

Author(s)

Abunasser, Mahmoud

Issue Date

2015-04-08

Director of Research (if dissertation) or Advisor (if thesis)

Benmamoun, Elabbas

Doctoral Committee Chair(s)

Benmamoun, Elabbas

Committee Member(s)

Hasegawa-Johnson, Mark A.
Shosted, Ryan
Mustafawi, Eiman

Department of Study

Linguistics

Discipline

Linguistics

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2015-07-22T22:16:27Z

Keyword(s)

Measures of linguistic variation
Lexical variation
Pronunciation variation
Linguistic distance
Lexical distance
Pronunciation distance
Arabic linguistics
Modern Standard Arabic
Gulf Arabic
Levantine Arabic
Egyptian Arabic
Moroccan Arabic
Measures of mutual intelligibility
Arabic dialectology
Mathematical representation of sound
Mathematical representation of phones
Mathematical representation of phonemes
Asymmetric measure of linguistic variation
Asymmetric linguistic distance

Abstract

This thesis introduces and discusses a new methodology for measuring the variation between linguistic varieties. I compare five Arabic varieties, considering both lexical and pronunciation variation: Modern Standard Arabic (MSA), Gulf Arabic (GA), Levantine Arabic (LA), Egyptian Arabic (EA), and Moroccan Arabic (MA). I introduce the idea of measuring linguistic variation asymmetrically: the amount of linguistic variation between a speaker of variety A and a hearer of variety B is not necessarily equal to the amount between a speaker of variety B and a hearer of variety A. I propose a new, mathematically based computational representation of sound that makes it possible to incorporate phonetic features and articulatory gestures when measuring pronunciation variation. I also implement an optimization technique to assign weights and parameters to the phonetic features and articulatory gestures in the proposed representation of sound. The developed methodology, tools, and techniques lead to a better understanding of the structure of language and have implications for both theoretical linguistics and applied work in natural language processing (NLP). They provide a computational technique for assessing the plausibility of a given definition of the components of sound, and they open a new avenue toward a representation of sound that is both phonetically motivated and computationally applicable to NLP problems. This research could potentially yield insights into mutual intelligibility between Arabic varieties and into dialect identification. The measures of lexical and pronunciation variation are based on native-speaker elicitations of the Swadesh list for the local varieties of Arabic; MSA is represented by data from dictionaries. The data collection procedure allows participants to provide more than one translation. I also provide a context sentence for every lexical item to rule out ambiguity.
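The abstract does not give a formula for the asymmetric measure. As a minimal sketch under stated assumptions, here is one way asymmetry can arise when participants may give several translations. Each form a speaker might produce is matched to the closest form the hearer knows, so a hearer who knows more variants is "easier to reach" than the reverse. The function names, the use of normalized Levenshtein distance, and the toy forms are hypothetical illustrations, not the author's method or data.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (0 if equal)
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer form."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def directional_distance(speaker: Dict[str, List[str]],
                         hearer: Dict[str, List[str]]) -> float:
    """Hypothetical distance from a speaker variety to a hearer variety.

    Each form the speaker may produce for a concept is matched to the
    closest form the hearer knows for that concept. A concept's score is
    the mean of those best matches, and the result averages over shared
    concepts. Swapping speaker and hearer can change the value.
    """
    shared = sorted(speaker.keys() & hearer.keys())
    if not shared:
        raise ValueError("no concepts in common")
    scores = []
    for concept in shared:
        best = [min(normalized_distance(s, h) for h in hearer[concept])
                for s in speaker[concept]]
        scores.append(sum(best) / len(best))
    return sum(scores) / len(scores)


# Toy, made-up transcriptions (not data from the dissertation): variety B
# offers two translations for one concept, variety A offers only one.
variety_a = {"concept_1": ["tala"]}
variety_b = {"concept_1": ["tala", "tila"]}

print(directional_distance(variety_a, variety_b))  # A speaker -> B hearer: 0.0
print(directional_distance(variety_b, variety_a))  # B speaker -> A hearer: 0.125
```

Replacing `normalized_distance` with a 0/1 judgment of whether two forms share a linguistic origin would give a word-level analogue of the same directional scheme.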
The amount of lexical variation is measured at two levels of representation: the word level and the phonemic level. At the word level, the amount of variation is based on whether the words share a linguistic origin. The phonemic level, which uses IPA transcriptions of the words, captures finer-grained detail in lexical variation. The amount of pronunciation variation is measured at three levels. The first and most abstract is the phonemic level. The second incorporates the mathematical representation of sound, which encodes phonetic features and articulatory gestures. The third represents vowels non-categorically, based on the values of their first and second formant frequencies; MSA is not included at this level. The results of the measures of linguistic variation developed in this study confirm two observations about communication between speakers of the Arabic varieties and offer an answer to the frequently asked question of how close the Arabic varieties are to each other. The first observation is that MA seems to be more distant from the other local varieties (GA, EA, and LA) than those varieties are from each other, which relates to the geographical distances between them. The second observation is an asymmetric pattern of intelligibility in communication between EA speakers and speakers of the other local varieties: GA, LA, and MA speakers seem to understand EA speakers better than EA speakers understand them. The variation metrics developed in this research reflect this asymmetric pattern. As for closeness to MSA, GA and, to some extent, LA seem to be the closest, followed by EA, with MA the farthest. In addition, EA seems to be closer to MA than LA and GA are. Moreover, EA speakers are closer to LA hearers than to GA hearers. On the other hand, GA speakers are closer to LA hearers than to EA hearers.
Finally, the last measure, that of pronunciation variation, places LA speakers closer to GA hearers than to EA hearers.
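The abstract names three levels for pronunciation variation but does not spell out the computation. One common way to turn such representations into a distance is an edit distance with graded substitution costs: consonants are compared by weighted phonetic features and vowels by their distance in the F1/F2 plane. The sketch below assumes that approach. The feature encoding, the weights in `FEATURE_WEIGHTS`, the formant values in `VOWEL_FORMANTS`, and `FORMANT_SCALE_HZ` are all hypothetical placeholders, not the dissertation's representation or fitted parameters.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical, illustrative inventory; NOT the dissertation's feature set or values.
# Consonant features: (voiced, place code, manner code).
CONSONANT_FEATURES: Dict[str, Tuple[int, int, int]] = {
    "t": (0, 1, 0),
    "d": (1, 1, 0),
    "s": (0, 1, 1),
    "k": (0, 2, 0),
}
# Placeholder weights; the dissertation fits such weights by optimization.
FEATURE_WEIGHTS: Tuple[float, float, float] = (1.0, 2.0, 1.5)

# Vowels as (F1, F2) in Hz: rough placeholder values, not measurements.
VOWEL_FORMANTS: Dict[str, Tuple[float, float]] = {
    "a": (750.0, 1300.0),
    "i": (300.0, 2300.0),
    "u": (320.0, 800.0),
}
FORMANT_SCALE_HZ = 2000.0  # assumed F1/F2 distance at which vowels count as maximally different


def substitution_cost(p: str, q: str) -> float:
    """Cost in [0, 1] of substituting phone q for phone p."""
    if p == q:
        return 0.0
    if p in VOWEL_FORMANTS and q in VOWEL_FORMANTS:
        # Non-categorical vowels: Euclidean distance in the F1/F2 plane, capped at 1.
        (f1_p, f2_p), (f1_q, f2_q) = VOWEL_FORMANTS[p], VOWEL_FORMANTS[q]
        return min(1.0, math.hypot(f1_p - f1_q, f2_p - f2_q) / FORMANT_SCALE_HZ)
    if p in CONSONANT_FEATURES and q in CONSONANT_FEATURES:
        # Feature-based consonants: share of total weight on mismatched features.
        mismatch = sum(w for w, x, y in zip(FEATURE_WEIGHTS,
                                            CONSONANT_FEATURES[p],
                                            CONSONANT_FEATURES[q]) if x != y)
        return mismatch / sum(FEATURE_WEIGHTS)
    return 1.0  # consonant vs. vowel, or a phone missing from the tables


def weighted_edit_distance(a: List[str], b: List[str]) -> float:
    """Edit distance over phone sequences with unit insertion/deletion
    costs and graded substitution costs."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, p in enumerate(a, start=1):
        curr = [float(i)]
        for j, q in enumerate(b, start=1):
            curr.append(min(prev[j] + 1.0,
                            curr[j - 1] + 1.0,
                            prev[j - 1] + substitution_cost(p, q)))
        prev = curr
    return prev[-1]


# Phone sequences are lists so multi-character IPA symbols can be single units.
print(weighted_edit_distance(["t", "a"], ["d", "a"]))  # differs in voicing only
print(weighted_edit_distance(["t", "a"], ["k", "a"]))  # differs in place only
print(weighted_edit_distance(["t", "a"], ["t", "i"]))  # vowel cost from F1/F2
```

With these placeholder weights, a voicing difference (t/d) costs less than a place difference (t/k). Fitting the weights through optimization, as the abstract describes, would change these values.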

Graduation Semester

2015-5

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Linguistics