Learning and adapting visual models for multiple specialized tasks
Mallya, Arun Mohanray
Permalink
https://hdl.handle.net/2142/101314
Description
- Title
- Learning and adapting visual models for multiple specialized tasks
- Author(s)
- Mallya, Arun Mohanray
- Issue Date
- 2018-04-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Lazebnik, Svetlana
- Doctoral Committee Chair(s)
- Lazebnik, Svetlana
- Committee Member(s)
- Forsyth, David
- Hoiem, Derek
- Shakhnarovich, Gregory
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Action Recognition, Visual Relationship Detection, Image Situations, Multi-Task Training
- Abstract
- A key requirement for any agent that wishes to interact with the visual world is the ability to understand the behavior of objects in the scene, primarily through visual means. We humans, through our cognitive system, are able to localize other people and objects in scenes, understand their relationship to the surrounding environment, and reason not only about their actions and attributes, but also about concepts which require knowledge beyond what is afforded by the pixels in visual input, such as possible future states, motion, a person’s motivations, and so on. In this thesis, we outline work that takes small steps towards solving this daunting task of replicating the human visual cognitive system. This dissertation presents methods for predicting actions, interactions with objects, and increasingly structured scenarios from single images. We devise simple methods that make use of a variety of cues by taking into account the structure inherent in the tasks we aim to solve. We show that by solving these tasks as an intermediate step and using their outputs as features, we can develop methods that operate on visual and language inputs to improve performance on tasks that require high-level image information, such as answering questions about images and producing captions for images. One issue that accompanies the learning of multiple tasks with separate deep networks, such as the work described above, is the need to store separate models, which increases storage requirements and affects scalability. We formulate and present two novel methods, drawing inspiration from network pruning and weight quantization, that can reuse parts of an existing network for learning new tasks with minimal additional overhead, without hurting performance on tasks that were learned earlier.
- Graduation Semester
- 2018-05
- Type of Resource
- text
- Copyright and License Information
- Copyright 2018 Arun Mallya
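
The abstract's closing idea, reusing a frozen network for new tasks via pruning- and quantization-inspired masks, can be illustrated with a minimal sketch. This is a hypothetical toy example, not the dissertation's actual implementation: a shared weight matrix `W` stands in for a trained backbone, and each new task stores only a binary mask over those weights, so the per-task overhead is roughly one bit per parameter and earlier tasks are untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone" weights, assumed already trained on the original task.
W = rng.standard_normal((4, 4))

# Per-task binary masks (illustrative): each new task stores only 1 bit
# per weight, selecting which backbone weights it reuses.
masks = {
    "task_a": rng.random((4, 4)) > 0.5,
    "task_b": rng.random((4, 4)) > 0.5,
}

def effective_weights(task):
    # The task-specific network is the frozen backbone gated by its mask.
    # W itself is never modified, so performance on earlier tasks is preserved.
    return W * masks[task]
```

In the dissertation's setting the masks themselves are learned per task; here they are random purely to show the storage and isolation argument: adding a task adds a mask, not a full copy of the model.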
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
  Graduate Theses and Dissertations at Illinois
- Dissertations and Theses - Computer Science
  Dissertations and Theses from the Dept. of Computer Science