Fact-based visual question answering using knowledge graph embeddings
Ramnath, Kiran
Permalink
https://hdl.handle.net/2142/110572
Description
- Title
- Fact-based visual question answering using knowledge graph embeddings
- Author(s)
- Ramnath, Kiran
- Issue Date
- 2021-04-27
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Visual Question Answering
- Knowledge Graphs
- Abstract
- Humans have a remarkable capability to learn new concepts, process them in relation to their existing mental models of the world, and seamlessly leverage their knowledge and experience while reasoning about the outside world perceived through vision and language. Fact-based Visual Question Answering (FVQA), a challenging variant of VQA, requires a QA system to mimic this human ability: it must include facts from a diverse knowledge graph (KG) in its reasoning process to produce an answer. Large KGs, especially common-sense KGs, are known to be incomplete, i.e., a fact missing from the KG is not necessarily false. Being able to reason over incomplete KGs for QA is therefore a critical requirement in real-world applications that has not been addressed extensively in the literature. We develop a novel QA architecture that allows us to reason over incomplete KGs, something current FVQA state-of-the-art (SOTA) approaches lack due to their critical reliance on fact retrieval. We use KG embeddings, a technique widely used for KG completion, for the downstream task of FVQA. We also present a new image-representation technique, which we call image-as-knowledge, that posits that an image is a collection of knowledge concepts describing each entity present in it. We further show that KG embeddings hold information complementary to word embeddings: combining the two permits performance comparable to SOTA methods on the standard answer-retrieval task, and significantly better performance (26% absolute) on the proposed missing-edge reasoning task. The second research problem pursued is extending the accessibility of such systems through a speech interface and support for multiple languages, neither of which has been addressed in prior studies. We present a new task and a synthetically generated dataset for Fact-based Visual Spoken-Question Answering (FVSQA). FVSQA is based on the FVQA dataset, with the difference that the question is spoken rather than typed.
Three sub-tasks are proposed: (1) speech-to-text based, (2) end-to-end, without speech-to-text as an intermediate component, and (3) cross-lingual, in which the question is spoken in a language different from that in which the KG is recorded. The end-to-end and cross-lingual tasks are the first to require world knowledge from a multi-relational KG as a differentiable layer in an end-to-end spoken language understanding task; hence the proposed reference implementation is called Worldly-Wise (WoW). WoW is shown to perform end-to-end cross-lingual FVSQA at the same level of accuracy across three languages: English, Hindi, and Turkish.
- Graduation Semester
- 2021-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2021 Kiran Ramnath
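The KG-embedding idea central to the abstract — scoring the plausibility of a fact even when its edge is absent from the graph — can be sketched minimally with a TransE-style scorer. This is an illustrative sketch only; the thesis does not specify this exact model, and the entities, relation, and embeddings below are hypothetical.

```python
import numpy as np

# TransE represents a fact (head, relation, tail) by requiring h + r ≈ t
# in embedding space. A missing edge can therefore still be scored, which
# is what makes embedding-based reasoning over incomplete KGs possible.
rng = np.random.default_rng(0)
dim = 8

# Hypothetical entity and relation embeddings (randomly initialized here;
# in practice they would be trained on the KG's existing edges).
entities = {"cat": rng.normal(size=dim), "animal": rng.normal(size=dim)}
relations = {"is_a": rng.normal(size=dim)}

def transe_score(head: str, relation: str, tail: str) -> float:
    """Higher (closer to 0) means a more plausible fact."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return float(-np.linalg.norm(h + r - t))

score = transe_score("cat", "is_a", "animal")
```

Because the score is defined for any (head, relation, tail) triple, a candidate answer can be ranked even when the supporting edge was never recorded in the KG — the property the abstract's missing-edge reasoning task exercises.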
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Dissertations and Theses - Electrical and Computer Engineering