Fact-based visual question answering using knowledge graph embeddings
Ramnath, Kiran
Permalink
https://hdl.handle.net/2142/110572
Description
- Title
- Fact-based visual question answering using knowledge graph embeddings
- Author(s)
- Ramnath, Kiran
- Issue Date
- 2021-04-27
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Visual Question Answering
- Knowledge Graphs
- Abstract
- Humans have a remarkable capability to learn new concepts, process them in relation to their existing mental models of the world, and seamlessly leverage their knowledge and experience while reasoning about the outside world perceived through vision and language. Fact-based Visual Question Answering (FVQA), a challenging variant of VQA, requires a QA system to mimic this human ability: it must include facts from a diverse knowledge graph (KG) in its reasoning process to produce an answer. Large KGs, especially common-sense KGs, are known to be incomplete, i.e., a fact missing from the KG is not necessarily false. Being able to reason over incomplete KGs for QA is therefore a critical requirement in real-world applications that has not been addressed extensively in the literature. We develop a novel QA architecture that allows us to reason over incomplete KGs, something current FVQA state-of-the-art (SOTA) approaches lack due to their critical reliance on fact retrieval. We use KG embeddings, a technique widely used for KG completion, for the downstream task of FVQA. We also present a new image-representation technique, which we call image-as-knowledge, that posits that an image is a collection of knowledge concepts describing each entity present in it. We further show that KG embeddings hold information complementary to word embeddings: combining the two permits performance comparable to SOTA methods on the standard answer-retrieval task, and significantly better performance (26% absolute) on the proposed missing-edge reasoning task. The second research problem pursued is extending the accessibility of such systems through a speech interface and support for multiple languages, neither of which has been addressed in prior studies. We present a new task and a synthetically generated dataset for Fact-based Visual Spoken-Question Answering (FVSQA). FVSQA is based on the FVQA dataset, with the difference that the question is spoken rather than typed.
Three sub-tasks are proposed: (1) speech-to-text based, (2) end-to-end, without speech-to-text as an intermediate component, and (3) cross-lingual, in which the question is spoken in a language different from that in which the KG is recorded. The end-to-end and cross-lingual tasks are the first to require world knowledge from a multi-relational KG as a differentiable layer in an end-to-end spoken language understanding task; hence the proposed reference implementation is called Worldly-Wise (WoW). WoW is shown to perform end-to-end cross-lingual FVSQA at the same level of accuracy across three languages: English, Hindi, and Turkish.
- Graduation Semester
- 2021-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2021 Kiran Ramnath
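The KG-embedding idea central to the abstract — scoring the plausibility of a fact even when its edge is absent from the graph — can be sketched minimally with a TransE-style scorer. This is an illustrative sketch only; the thesis does not specify this exact model, and the entities, relation, and embeddings below are hypothetical.

```python
import numpy as np

# TransE represents a fact (head, relation, tail) by requiring h + r ≈ t
# in embedding space. A missing edge can therefore still be scored, which
# is what makes embedding-based reasoning over incomplete KGs possible.
rng = np.random.default_rng(0)
dim = 8

# Hypothetical entity and relation embeddings (randomly initialized here;
# in practice they would be trained on the KG's existing edges).
entities = {"cat": rng.normal(size=dim), "animal": rng.normal(size=dim)}
relations = {"is_a": rng.normal(size=dim)}

def transe_score(head: str, relation: str, tail: str) -> float:
    """Higher (closer to 0) means a more plausible fact."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return float(-np.linalg.norm(h + r - t))

score = transe_score("cat", "is_a", "animal")
```

Because the score is defined for any (head, relation, tail) triple, a candidate answer can be ranked even when the supporting edge was never recorded in the KG — the property the abstract's missing-edge reasoning task exercises.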
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Dissertations and Theses - Electrical and Computer Engineering