An Exploration of Multimodal Document Classification Strategies

Chen, Scott D.

Content Files

Chen_Scott.pdf

Permalink

https://hdl.handle.net/2142/24006

Description

Title

An Exploration of Multimodal Document Classification Strategies

Author(s)

Chen, Scott D.

Issue Date

2011-05-25T14:51:49Z

Director of Research (if dissertation) or Advisor (if thesis)

Moulin, Pierre

Department of Study

Electrical & Computer Engineering

Discipline

Electrical & Computer Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2011-05-25T14:51:49Z

Keyword(s)

meta-classifier
classification
multimodal
document
support vector machines

Abstract

This thesis explores multimodal document classification algorithms in a unified framework. The classification algorithms are designed to exploit both the text and the image information that proliferate in modern documents. We design meta-classification schemes that combine state-of-the-art text and image feature extractors with state-of-the-art classifiers. Because meta-classifiers fuse information across modalities that differ in nature, they have more information on hand when making decisions. This thesis also discusses strategies that exploit correlations not only within a single modality but also among modalities. Techniques that exploit correlations within a modality include image meta-feature vector combination and latent Dirichlet allocation-based image meta-feature extraction. Another technique, which exploits correlations between text and images, uses text information to clean the image data. Experiments on real-world databases from Wikipedia demonstrate the benefits of meta-classification for multimodal documents.
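
As a rough illustration of the fusion idea described in the abstract, the sketch below combines per-class scores from a text classifier and an image classifier by weighted averaging, then picks the highest fused score. This is only one simple form of late fusion and is not taken from the thesis, whose meta-classification schemes may differ. The function names, the `text_weight` parameter, the labels, and the score values are all hypothetical.

```python
from typing import List, Sequence


def fuse_scores(
    text_scores: Sequence[float],
    image_scores: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Weighted average of per-class scores from two modalities.

    Assumption: both score vectors are ordered by the same class labels.
    """
    if len(text_scores) != len(image_scores):
        raise ValueError("score vectors must cover the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    image_weight = 1.0 - text_weight
    return [
        text_weight * t + image_weight * i
        for t, i in zip(text_scores, image_scores)
    ]


def meta_classify(
    text_scores: Sequence[float],
    image_scores: Sequence[float],
    labels: Sequence[str],
    text_weight: float = 0.5,
) -> str:
    """Return the label whose fused score is highest."""
    if len(labels) != len(text_scores):
        raise ValueError("need exactly one label per score")
    fused = fuse_scores(text_scores, image_scores, text_weight)
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return labels[best_index]


# Hypothetical scores for one document (illustrative values only).
LABELS = ["sports", "politics", "science"]
TEXT_SCORES = [0.40, 0.45, 0.15]   # the text classifier slightly prefers "politics"
IMAGE_SCORES = [0.70, 0.10, 0.20]  # the image classifier strongly prefers "sports"

if __name__ == "__main__":
    print(meta_classify(TEXT_SCORES, IMAGE_SCORES, LABELS))
```

With equal weights the image evidence outweighs the narrow text preference, so the fused decision differs from the text-only decision. A learned meta-classifier, such as a support vector machine trained on the concatenated score vectors, would replace the fixed weight with parameters fitted to data.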

Graduation Semester

2011-05

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering
