Event-centric multimodal knowledge acquisition
Li, Manling
Permalink
https://hdl.handle.net/2142/121435
Description
- Title
- Event-centric multimodal knowledge acquisition
- Author(s)
- Li, Manling
- Issue Date
- 2023-07-10
- Director of Research (if dissertation) or Advisor (if thesis)
- Ji, Heng
- Doctoral Committee Chair(s)
- Ji, Heng
- Committee Member(s)
- Han, Jiawei
- Zhai, Chengxiang
- Chang, Shih-Fu
- Cho, Kyunghyun
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- multimodal
- knowledge
- event-centric
- Abstract
- What happened? Who? When? Where? Why? What will happen next? These are the fundamental questions we ask to comprehend the overwhelming amount of information around us. Answers to these questions constitute the core knowledge communicated through multiple forms of information, whether presented as text, images, videos, audio, or other modalities. To obtain such knowledge from multimodal data, this dissertation focuses on Multimodal Information Extraction (IE) and proposes Event-Centric Multimodal Knowledge Acquisition, evolving traditional entity-centric, single-modality knowledge into event-centric, multi-modality knowledge. Traditional entity-centric approaches to consuming multimodal information focus on concrete concepts (such as objects, object types, and physical relations, e.g., a person in a car), whereas this dissertation enables machines to understand complex abstract semantic structures that are difficult to ground in specific image regions but are essential knowledge (such as events and the semantic roles of objects, e.g., driver, passenger, passerby, salesperson). This approach consolidates complex semantic structures across multiple modalities, providing a major benefit over recent research advances in single-modality (text-only or vision-only) knowledge. Such a transformation poses significant challenges in understanding multimodal semantic structures (such as semantic roles) and temporal dynamics (such as future participants and their roles):
- Understanding Multimodal Semantic Structures to answer What happened?, Who?, Where?, and When? (Knowledge Extraction): Due to their structural nature and lack of anchoring in a specific image region, abstract semantic structures are difficult to synthesize between text and vision modalities through general large-scale pretraining. We introduce complex event semantic structures into vision-language pretraining (CLIP-Event) and propose a zero-shot cross-modal transfer of semantic understanding abilities from language to vision, which resolves the poor portability of IE and supports Zero-shot Multimodal Event Extraction (M2E2) for the first time. We also release an open-source multimodal IE system, GAIA, to serve as an off-the-shelf tool for the research community.
- Understanding Temporal Dynamics to answer What will happen next?, Who will participate?, and Why? (Knowledge Reasoning): The significance of capturing temporal dynamics has led to recent advances in script knowledge learning; however, such knowledge has been overly simplified as local and sequential. We propose Event Graph Schema, which opens the door to a global event graph context that enables alternative predictions, along with structural justifications including location-, attribute-, and participant-specific details.
- Generating Truthfully with Event-Centric Knowledge Facts (Knowledge-Driven Applications): Our work has shown positive results on long-standing open problems such as timeline summarization, meeting summarization, multimedia news question answering, and report generation. This work on Multimedia Event Knowledge Graphs aims to open the door to the next generation of information access, equipping machines with factual knowledge discovery and reasoning over diverse sources of information, so that we can lay a foundation for promoting factuality and truthfulness in information access through a structured knowledge view that is easily explainable, highly compositional, and capable of long-horizon reasoning.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Manling Li
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)