Visual question answering using external knowledge
Gulganjalli Narasimhan, Medhini
Permalink
https://hdl.handle.net/2142/104918
Description
- Title: Visual question answering using external knowledge
- Author(s): Gulganjalli Narasimhan, Medhini
- Issue Date: 2019-04-25
- Director of Research (if dissertation) or Advisor (if thesis): Schwing, Alexander G.; Lazebnik, Svetlana
- Department of Study: Computer Science
- Discipline: Computer Science
- Degree Granting Institution: University of Illinois at Urbana-Champaign
- Degree Name: M.S.
- Degree Level: Thesis
- Keyword(s): Visual question answering, knowledge bases, graph convolution networks
- Abstract: Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel 'fact-based' visual question answering (FVQA) task has recently been introduced, along with a large set of curated facts, each of which links two entities (i.e., two possible answers) via a relation. Given a question-image pair, keyword-matching techniques have been employed to successively reduce the large set of facts; these were shown to yield compelling results despite being vulnerable to errors caused by synonyms and homographs. To overcome these shortcomings, we introduce two new approaches in this work. First, we develop a learning-based approach that goes straight to the facts via a learned embedding space. We demonstrate state-of-the-art results on the challenging, recently introduced FVQA dataset, outperforming competing methods by more than 5%. Upon further analysis, we observe that a successive process that considers one fact at a time to form a local decision is sub-optimal. To counter this, in our second approach we develop an entity graph and use a graph convolutional network to 'reason' about the correct answer by jointly considering all entities. We show on the FVQA dataset that this leads to an improvement in accuracy of around 7% over the state of the art.
- Graduation Semester: 2019-05
- Type of Resource: text
- Copyright and License Information: Copyright 2019 Medhini Gulganjalli Narasimhan
Owning Collections
Graduate Dissertations and Theses at Illinois (primary collection)
Graduate Theses and Dissertations at Illinois / Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science