Grounding natural language phrases in images and video
Plummer, Bryan A.
Permalink
https://hdl.handle.net/2142/100977
Description
Title
Grounding natural language phrases in images and video
Author(s)
Plummer, Bryan A.
Issue Date
2018-04-16
Director of Research (if dissertation) or Advisor (if thesis)
Lazebnik, Svetlana
Doctoral Committee Chair(s)
Lazebnik, Svetlana
Committee Member(s)
Hockenmaier, Julia
Hoiem, Derek
Brown, Matthew
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Vision, Natural Language Processing, Phrase Grounding
Abstract
Grounding language in images has been shown to improve performance on many image-language tasks. To spur research on this topic, this dissertation introduces a new dataset that provides ground-truth annotations of the locations of noun phrase chunks in image captions. I begin by introducing a constituent task termed phrase localization, where the goal is to localize an entity known to exist in an image when provided with a natural language query. To address this task, I introduce an approach that learns a set of models, each of which captures a different concept useful for the task. These concepts can be predefined, such as attributes gleaned from adjectives, or learned automatically in a single end-to-end neural network. I also address the more challenging detection-style task, where the goal is to localize a phrase and determine whether it is associated with an image. Multiple applications of the models presented in this work demonstrate their value beyond the phrase localization task.