Estimation problems in speech and natural language
Bhat, Suma P.
Permalink
https://hdl.handle.net/2142/16031
Description
- Title
- Estimation problems in speech and natural language
- Author(s)
- Bhat, Suma P.
- Issue Date
- 2010-05-19T18:32:47Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Sproat, Richard W.
- Doctoral Committee Chair(s)
- Sproat, Richard W.
- Committee Member(s)
- Church, Kenneth W.
- Hasegawa-Johnson, Mark A.
- Roth, Dan
- Levinson, Stephen E.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Vocabulary Size Estimation
- Estimation of the number of unseen events
- Automatic Fluency Assessment
- Variable Selection
- Predictors of Oral Fluency
- Click-Through Rate Prediction
- Abstract
- This dissertation studies two estimation problems in natural language and speech. In the first, we revisit the classical problem of estimating the number of unseen elements, which we study in a regime characterized by a large number of rare events, natural language being one example. We propose an estimator of the vocabulary size of the underlying population that generates an observation and show that it has theoretical guarantees of optimal performance. Using natural language corpora from different languages, we show that the performance of our estimator compares favorably with that of state-of-the-art estimators. In the second problem, we explore the effect of vocabulary size and temporal aspects of speech production on perceptions of second language fluency, with the aim of designing objective methods of fluency assessment from spontaneous speech. We show that articulation rate, phonation-time ratio, mean length of silent pauses, and the number of silent pauses per second are aspects of speech production that correlate well with human-assigned fluency scores. The measures of lexical use that we found to correlate well with fluency scores were the total number of words spoken (word tokens), the number of different words uttered (word types), and the number of words spoken once (hapax legomena). With the goal of objective fluency assessment without the use of automatic speech recognition, we show the utility of measures of temporal aspects of speech production obtained from direct signal-level measurements. Their use in a logistic regression framework for predicting fluency scores showed high agreement with scores assigned by human raters. We also explore the difference between automatic assessment based on random snippets of the spoken utterance and assessment based on the complete utterance. Although the differences are not statistically significant at the 1% level, this comparison opens avenues for further experimentation.
- Graduation Semester
- 2010-05
- Copyright and License Information
- Copyright 2010 Suma P. Bhat
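The lexical measures named in the abstract (word tokens, word types, and hapax legomena) can be illustrated with a minimal Python sketch. The function name `lexical_measures` and the regular-expression tokenizer are hypothetical choices made for illustration and are not taken from the dissertation. The `missing_mass` value is the classical Good-Turing estimate of the probability of unseen words (hapax count divided by token count); it is shown only as a familiar baseline and is not the estimator proposed in the dissertation.

```python
import re
from collections import Counter


def lexical_measures(transcript: str) -> dict:
    """Count word tokens, word types, and hapax legomena in a transcript.

    Assumption: a simple tokenizer that lowercases the text and keeps
    runs of letters and apostrophes. Real transcripts may need more care.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(tokens)

    n_tokens = len(tokens)                                # total words spoken
    n_types = len(counts)                                 # distinct words
    n_hapax = sum(1 for c in counts.values() if c == 1)   # words seen once

    # Classical Good-Turing estimate of the unseen probability mass.
    # Shown as a baseline only, not the dissertation's estimator.
    missing_mass = n_hapax / n_tokens if n_tokens else 0.0

    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapax": n_hapax,
        "missing_mass": missing_mass,
    }


if __name__ == "__main__":
    print(lexical_measures("the cat saw the dog and the bird"))
```

For the sample sentence, "the" occurs three times and every other word occurs once, so the sketch reports 8 tokens, 6 types, and 5 hapax legomena.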
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering