Towards automated generation of open domain Wikipedia articles
Bao, Yunqian
Permalink
https://hdl.handle.net/2142/121558
Description
- Title
- Towards automated generation of open domain Wikipedia articles
- Author(s)
- Bao, Yunqian
- Issue Date
- 2023-07-20
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Wikipedia articles
- automatic content generation
- open domain
- information systems
- ChatGPT
- Abstract
- Wikipedia has become an increasingly important resource for education and knowledge acquisition, and automatic generation of Wikipedia articles could greatly improve its usefulness. In this thesis, we aim to extend existing work to ad hoc open-domain Wikipedia article generation and to address two of its limitations: first, the generated articles lack either readability or citations for verifiability; second, the generated articles include only a flat list of sections rather than a multi-level hierarchy. We propose to extend Wikipedia generation to the open domain by employing fine-grained entity typing for template generation (generating headings for the article). For template generation, we follow existing work and leverage the headings of similar articles to generate initial headings. In our work, however, we identify the importance of entity type and define similar articles as articles about entities of the same type as the target entity. We employ ChatGPT for zero-shot Natural Language Processing (NLP) tasks, which further extends our approach to the open-domain setting. We also use ChatGPT for document summarization. By designing appropriate prompts, we are able to generate content that is both readable and verifiable, which addresses the first limitation. To address the second limitation, inspired by prior work, we use state-of-the-art topic modeling solutions to give the article content a hierarchical structure. The experimental results show that our approach can generate plausible Wikipedia articles while ensuring both the readability and the trustworthiness of the content. Our approach also shows some progress toward the hierarchical organization of content. Still, the generated articles may include artifacts and misplaced information, and the results of subsection generation are preliminary, suggesting that ad hoc open-domain Wikipedia article generation remains a significant challenge.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Yunqian Bao
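As a rough illustration of the template-generation idea summarized in the abstract (reusing the headings of articles about entities of the same type as the target), the sketch below counts how often each heading appears among same-type articles and keeps the frequent ones. It is not taken from the thesis: the function name `generate_template`, the `entity_type` and `headings` fields, the `min_support` threshold, and the sample data are all hypothetical, and the thesis may select and order headings quite differently.

```python
from collections import Counter

# Hypothetical sample data: each article carries an entity type and its headings.
SAMPLE_ARTICLES = [
    {"title": "A", "entity_type": "university",
     "headings": ["History", "Campus", "Academics"]},
    {"title": "B", "entity_type": "university",
     "headings": ["History", "Academics", "Athletics"]},
    {"title": "C", "entity_type": "university",
     "headings": ["History", "Campus", "Notable alumni"]},
    {"title": "D", "entity_type": "river",
     "headings": ["Course", "History"]},
]


def generate_template(target_type, typed_articles, top_k=5, min_support=0.5):
    """Return headings shared by many articles of the target entity type.

    Assumption (not from the thesis): a heading is kept when it appears in at
    least `min_support` of the same-type articles; ties are broken by name.
    """
    similar = [a["headings"] for a in typed_articles
               if a["entity_type"] == target_type]
    if not similar:
        return []

    counts = Counter()
    for headings in similar:
        # Normalize case and count each heading at most once per article.
        counts.update({h.strip().lower() for h in headings})

    threshold = min_support * len(similar)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [heading for heading, n in ranked if n >= threshold][:top_k]
```

Calling `generate_template("university", SAMPLE_ARTICLES)` on the sample data keeps the headings that at least half of the university articles share; an entity type with no matching articles yields an empty template.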
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois