Time series modeling of text data
Dey, Priyanka
Permalink
https://hdl.handle.net/2142/120088
Description
- Title
- Time series modeling of text data
- Author(s)
- Dey, Priyanka
- Issue Date
- 2023-04-24
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- time series modeling
- text data
- word representations
- interpretability
- Abstract
- The success of machine learning algorithms in recent years has accelerated the race to develop language models that can accurately and precisely represent text for a wide range of downstream tasks. Unfortunately, the development of these large and powerful models has also increased the number and complexity of model architectures, sometimes making model results extremely difficult to understand and interpret. In this research, we present a new methodology for representing text using time series models, namely ARIMA, to represent word embeddings as a set of regression equations. Through our experiments and analysis, we show that these representations are often successful in learning language across various domains, e.g., sports, politics, and science. We further show that our representations can be used to support downstream applications such as next-word prediction, salient dimension lattice generation, and article title generation. With the surge of interest in natural language processing, building models that can accurately represent and generate language has been a major focus of research. Models such as BERT, GPT-3, and ChatGPT are all examples of large language models that can be used for a wide range of applications with remarkable precision and accuracy. However, many critics argue that such models are essentially black boxes. Our work is motivated by the difficulty of interpreting these models, and it aims to develop a simple yet effective means of representing text through time series models.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Priyanka Dey
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois