Withdraw
Loading…
Learning compact hashing codes for large-scale similarity search
Yu, Honghai
Loading…
Permalink
https://hdl.handle.net/2142/78338
Description
- Title
- Learning compact hashing codes for large-scale similarity search
- Author(s)
- Yu, Honghai
- Issue Date
- 2015-03-23
- Director of Research (if dissertation) or Advisor (if thesis)
- Moulin, Pierre
- Doctoral Committee Chair(s)
- Moulin, Pierre
- Committee Member(s)
- Do, Minh N.
- Smaragdis, Paris
- Veeravalli, Venugopal V.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- hashing
- learning
- similarity search
- fingerprinting
- content identification
- robust hashing
- large-scale
- binary codes
- multimedia retrieval
- Abstract
- Retrieval of similar objects is a key component in many applications. As databases grow larger, learning compact representations for efficient storage and fast search becomes increasingly important. Moreover, these representations should preserve similarity, i.e., similar objects should have similar representations. Hashing algorithms, which encode objects into compact binary codes to preserve similarity, have demonstrated promising results in addressing these challenges. This dissertation studies the problem of learning compact hashing codes for large-scale similarity search. Specifically, we investigate two classes of approach: regularized Adaboost and signal-to-noise ratio (SNR) maximization. The regularized Adaboost builds on the classical boosting framework for hashing, while SNR maximization is a novel hashing framework with theoretical guarantee and great flexibility in designing hashing algorithms for various scenarios. The regularized Adaboost algorithm is to learn and extract binary hash codes (fingerprints) of time-varying content by filtering and quantizing perceptually significant features. The proposed algorithm extends the recent symmetric pairwise boosting (SPB) algorithm by taking feature sequence correlation into account. An information-theoretic analysis of the SPB algorithm is given, showing that each iteration of SPB maximizes a lower bound on the mutual information between matching fingerprint pairs. Based on the analysis, two practical regularizers are proposed to penalize those filters generating highly correlated filter responses. A learning-theoretic analysis of the regularized Adaboost algorithm is given. The proposed algorithm demonstrates significant performance gains over SPB for both audio and video content identification (ID) systems. SNR maximization hashing (SRN-MH) uses the SNR metric to select a set of uncorrelated projection directions, and one hash bit is extracted from each projection direction. We first motivate this approach under a Gaussian model for the underlying signals, in which case maximizing SNR is equivalent to minimizing the hashing error probability. This theoretical guarantee differentiates SNR-MH from other hashing algorithms where learning has to be carried out with a continuous relaxation of quantization functions. A globally optimal solution can be obtained by solving a generalized eigenvalue problem. Experiments on both synthetic and real datasets demonstrate the power of SNR-MH to learn compact codes. We extend SNR-MH to two different scenarios in large-scale similarity search. The first extension aims at applications with a larger bit budget. To learn longer hash codes, we propose a multi-bit per projection algorithm, called SNR multi-bit hashing (SNR-MBH), to learn longer hash codes when the number of high-SNR projections is limited. Extensive experiments demonstrate the superior performance of SNR-MBH. The second extension aims at a multi-feature setting, where more than one feature vector is available for each object. We propose two multi-feature hashing methods, SNR joint hashing (SNR-JH) and SNR selection hashing (SNR-SH). SNR-JH jointly considers all feature correlations and learns uncorrelated hash functions that maximize SNR, while SNR-SH separately learns hash functions on each individual feature and selects the final hash functions based on the SNR associated with each hash function. The proposed methods perform favorably compared to other state-of-the-art multi-feature hashing algorithms on several benchmark datasets.
- Graduation Semester
- 2015-5
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/78338
- Copyright and License Information
- Copyright 2015 Honghai Yu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer EngineeringManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…