Time-varying networks estimation and Chinese words segmentation
Shu, Xinxin
Permalink
https://hdl.handle.net/2142/50539
Description
- Title
- Time-varying networks estimation and Chinese words segmentation
- Author(s)
- Shu, Xinxin
- Issue Date
- 2014-09-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Qu, Annie
- Doctoral Committee Chair(s)
- Qu, Annie
- Committee Member(s)
- Simpson, Douglas G.
- Douglas, Jeffrey A.
- Chen, Xiaohui
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- dynamic networks
- proximal gradient method
- varying-coefficient model
- Language Processing
- words segmentation
- Abstract
- This thesis covers two research areas: time-varying network estimation and Chinese word segmentation. Chapter 1 introduces the background of time-varying networks and the structure of the Chinese language, followed by the motivations and goals of the research. In many biomedical and social science studies, it is important to identify and predict dynamic changes in the associations among network data over time. However, few studies address the estimation of time-varying networks, mainly because the extremely large volume of time-varying network data makes computation difficult. In Chapter 2, we propose a varying-coefficient model to incorporate time-varying network data, and impose a piecewise penalty function to capture local features of the network associations. The proposed approach is nonparametric, and therefore flexible in modeling dynamic changes of association in network data, and it can identify the time regions in which those changes occur. To achieve local sparsity in network estimation, we implement a group penalization strategy in which parameters overlap across groups. We also develop a fast algorithm, based on the smoothing proximal gradient method, that is computationally efficient and accurate. We illustrate the proposed method through simulation studies and children's attention deficit hyperactivity disorder fMRI data, and show that the method and algorithm efficiently recover dynamic network changes over time.
Digital information has become an essential part of modern life, from scientific research, entertainment, and product marketing to national security. Developing fast, automatic information extraction is therefore in great demand. Chinese is the second most popular language among internet users but remains severely under-studied, mainly because of its ambiguity. In Chapter 3, we propose a new method for word segmentation in Chinese language processing. Segmentation is crucial because it is the first step toward fast automatic information extraction. One major challenge is that the Chinese language is highly context-dependent and very different from English. We propose a machine-learning model with computationally feasible loss functions that use linguistically embedded features. The proposed method is evaluated on Chinese documents from the Peking University corpus. Our numerical study shows that it performs better than the top-performing existing methods.
- Graduation Semester
- 2014-08
- Copyright and License Information
- Copyright 2014 Xinxin Shu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois