User-guided dynamic topic discovery in large texts
Venkat Ramanan, Karthik
This item's files can only be accessed by the Administrator group.
Permalink
https://hdl.handle.net/2142/120590
Description
- Title
- User-guided dynamic topic discovery in large texts
- Author(s)
- Venkat Ramanan, Karthik
- Issue Date
- 2023-05-04
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Information Retrieval
- Question Answering
- Natural Language Processing
- Data Mining
- Pretrained Language Models
- Topic Modeling
- Abstract
- Dynamic topic models (DTMs) play a crucial role in generating insights from large timestamped corpora of text by capturing the evolution of topics over time. Despite their popularity, existing DTMs are fully unsupervised, resulting in generated topic evolutions that often do not cater to a user’s needs. Additionally, the topic evolutions produced by DTMs tend to contain generic terms that do not accurately represent their designated time steps. This is particularly problematic as DTMs are frequently employed for analyzing the evolution of specific topics within a corpus. To address these challenges, we propose ReGenT, a framework for Dynamic, Discriminative Topic Discovery. This task aims to discover topic evolutions from temporal corpora that align with a set of user-provided category names while uniquely capturing topics at each time step. We accomplish this by (1) utilizing a retrieval-QA framework to retrieve relevant words for seeds with high granularity, (2) automatically generating and ranking strong questions to probe future words to expand our initial word set, (3) ensuring that the mined words are distinctly popular at a given time, and (4) iteratively refining our word list through ensemble ranking. We conduct experiments on two diverse datasets and demonstrate that ReGenT achieves state-of-the-art performance through extensive evaluations.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Karthik Venkat Ramanan
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois