Adopting the two-branch network to video-text tasks
Chang, Hsiao-Ching
Permalink
https://hdl.handle.net/2142/101210
Description
Title
Adopting the two-branch network to video-text tasks
Author(s)
Chang, Hsiao-Ching
Issue Date
2018-04-23
Advisor
Lazebnik, Svetlana
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Computer vision
Video captioning
Abstract
Modeling visual context and its corresponding text description with a joint embedding network has been an effective way to enable cross-modal retrieval. However, while abundant work has been done on image-text tasks, much less exists for the video domain. We adapt a nonlinear embedding model, the two-branch network, to video-text tasks in order to demonstrate its robustness. Two tasks are explored: bidirectional video-sentence retrieval and video description generation. For the retrieval task, we use nearest neighbor search in the learned embedding space to find the corresponding video or text for a given query. For video captioning, we incorporate the two-branch network into a traditional LSTM model with an additional embedding loss term, demonstrating its ability to preserve the semantic structure between video and text.
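As a concrete illustration of the approach summarized above, the following is a minimal sketch of a two-branch embedding network and the nearest-neighbor retrieval step, assuming PyTorch; the layer sizes, feature dimensions, and names (TwoBranchNet, retrieve) are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of a two-branch video-text embedding network (assumed PyTorch).
# Dimensions and architecture are illustrative, not taken from the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchNet(nn.Module):
    """Maps video and text features into a shared embedding space."""
    def __init__(self, video_dim=2048, text_dim=300, embed_dim=512):
        super().__init__()
        # Each branch is a small nonlinear MLP; outputs are L2-normalized
        # so that cosine similarity reduces to a dot product.
        self.video_branch = nn.Sequential(
            nn.Linear(video_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))
        self.text_branch = nn.Sequential(
            nn.Linear(text_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))

    def forward(self, video_feats, text_feats):
        v = F.normalize(self.video_branch(video_feats), dim=-1)
        t = F.normalize(self.text_branch(text_feats), dim=-1)
        return v, t

def retrieve(query_emb, gallery_embs, k=5):
    """Nearest-neighbor search: top-k gallery items by cosine similarity."""
    sims = gallery_embs @ query_emb      # (N,) similarities to the query
    return sims.topk(k).indices          # indices of the k best matches

# Usage sketch: retrieve videos for a text query.
net = TwoBranchNet()
videos, _ = net(torch.randn(100, 2048), torch.randn(1, 300))   # video gallery
_, query = net(torch.randn(1, 2048), torch.randn(1, 300))      # text query
top_matches = retrieve(query.squeeze(0), videos)
```

For the captioning task, a plausible form of the combined objective described in the abstract is loss = caption_cross_entropy + lambda * embedding_loss, with lambda a weighting hyperparameter; the exact formulation in the thesis may differ.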