Designing representational architectures in recognition

Farhadi, Ali

Permalink

https://hdl.handle.net/2142/29815

Description

Title

Designing representational architectures in recognition

Author(s)

Farhadi, Ali

Issue Date

2012-02-06T20:19:05Z

Director of Research (if dissertation) or Advisor (if thesis)

Forsyth, David A.

Committee Member(s)

Malik, Jitendra
Freeman, William
Roth, Dan
Hoiem, Derek W.
Yagnik, Jay

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Computer vision
object recognition
visual attributes
visual phrases

Abstract

Recognition is a deep and fundamental question in computer vision. If approached correctly, object recognition provides insight into several interesting problems with crucial applications. In a typical setting, recognition is defined as the problem of learning about a fixed set of categories from training examples provided for those categories. At test time, the problem is to decide which of those learned categories a test image belongs to. This thesis questions the typical settings of recognition and shows remarkable achievements as a result of shifting our point of view to the fundamentals of recognition.

In current settings, the final goal of recognition systems is to predict a list of category name tags for images. But there is more to recognition than a list of category names. Images exhibit a great deal of information that cannot be conveyed with a list of name tags. The main focus of this thesis is to produce richer descriptions for images. Inspired by how humans describe images, our goal is to describe images with sentences. This thesis introduces a non-parametric approach for describing images with sentences that produces promising results.

Exploring the idea of describing images with sentences raises deep and interesting concerns in recognition: how to deal with unfamiliar objects, how to describe objects, and how to recognize complex composites of objects. This thesis introduces visual attributes and shows how attribute-based recognition can reason about unfamiliar objects. Attribute-based recognition also allows the description of objects, the reporting of unusual properties of familiar objects, and learning about novel categories with few or even no visual training examples (from purely textual descriptions of categories). Analogous to phrases in machine translation, this thesis also introduces visual phrases: elements of recognition that correspond to a chunk of meaning bigger than objects and smaller than scenes. Visual phrases exhibit such a characteristic appearance that detecting them as one entity is much simpler and significantly more accurate than detecting the participating objects. This thesis shows that including visual phrases in the vocabulary of recognition results in significant improvements in recognition.

Current common practices in recognition are formed around problem settings that have been copied from the initial settings of recognition problems, and they ignore tremendous progress in machinery and methods. With the astonishing developments in recognition, I believe we should rethink recognition. Recognition should be redefined to match the capacity of current methods, with the applications of recognition tasks in mind. In this thesis I question the usual settings of recognition from several different angles and show remarkable achievements as a result of shifting our point of view on the recognition problem.

There are two main categories of issues that this thesis is concerned with: knowledge transfer and knowledge formation. Knowledge transfer is the capability of transferring knowledge gained in learning one task to relevant but new tasks. For example, learning how the appearance of some objects changes across viewpoints may help the recognition system reason about the change in the appearance of other objects. Knowledge formation is the ability to reshape the knowledge representation into the form most suitable for a specific recognition task, for example, how to describe an image in the format most useful for a desired task. The work presented in this thesis tries to provide insight into deep and yet basic questions in recognition: What should we recognize? At what level should we recognize entities? What does learning about some objects reveal about other objects? What should we say when an unfamiliar object is presented? How can we learn to predict deviations from typicalities in categories? What should be the output of a recognition system? And what is the quantum of recognition?

The central theme of the methods presented in this thesis is learning representational architectures around recognition problems. My approaches to all these problems are centered around one fundamental observation: finding the right representation is a key component in recognition.

Graduation Semester

2011-12

Owning Collections

Graduate Dissertations and Theses at Illinois (primary collection)

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
