Adaptive model inference under computational resource constraints
Yu, Haichao
Permalink
https://hdl.handle.net/2142/115689
Description
- Title
- Adaptive model inference under computational resource constraints
- Author(s)
- Yu, Haichao
- Issue Date
- 2022-04-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Shi, Humphrey
- Doctoral Committee Chair(s)
- Shi, Humphrey
- Committee Member(s)
- Liang, Zhi-Pei
- Hasegawa-Johnson, Mark
- Hua, Gang
- Huang, Gao
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Adaptive Inference
- Dynamic Neural Networks
- Efficient Inference
- Neural Network Quantization
- Gradient Boosting
- Abstract
- Deep learning has achieved great success in various computer vision tasks. When deep learning models are deployed on different devices, or on the same device at different times, the amount of available computational resources can vary. It is therefore an important research topic to design neural networks that can adapt to scenarios with different computational budgets and trade off accuracy against efficiency. To this end, one branch of existing methods is adaptive inference with sample-wise dynamic architectures. These methods can be grouped into three categories: dynamic routing, dynamic depth, and dynamic width. This dissertation focuses on dynamic post-training quantization (PQ) in the dynamic routing category and early exiting in the dynamic depth category. Existing PQ methods have two problems: (a) they require a training dataset for model calibration, which may be inaccessible, and (b) they are hard to generalize to low quantization bit-widths. For early-exiting methods, we find that prior works suffer from a train-test mismatch that creates a domain gap between training and inference. This dissertation explores how to solve these three problems to improve the generalizability and performance of dynamic models.

Given constrained computational resources, PQ uses a calibration dataset to convert a trained floating-point neural network into an efficient quantized one. The calibration dataset is usually sampled from the training data. However, training data may be unavailable at quantization time because of its large storage requirements or confidentiality concerns. In the first part of this dissertation, we propose a cross-domain calibration method that uses out-of-domain data for calibration when in-domain training data is unavailable, so that PQ can be applied more flexibly.

PQ works well at relatively high quantization bit-widths. To make a model work across a wider range of bit-widths, we propose the Any-Precision Deep Neural Network (APDNN), which is trained with a new method that makes the learned network flexible in numerical precision during inference. At runtime, the same model can be set to different bit-widths to support a dynamic trade-off between speed and accuracy. At each bit-width, the model achieves accuracy comparable to dedicated models trained at that precision. We demonstrate that this result is agnostic to model architecture and applicable to multiple vision tasks.

Going further in the direction of dynamic inference, we propose an early-exiting dynamic neural network (EDNN) that adjusts its inference complexity based not only on computational resources but also on input difficulty. When resources are limited or an input is easy to process, the model exits at a shallow layer; otherwise, it executes more layers to obtain a more confident prediction. Different from existing EDNN methods, we formulate the EDNN as an additive model inspired by gradient boosting and propose the Boosted Dynamic Neural Network (BoostNet), which mitigates the train-test mismatch problem present in existing EDNN methods.

In summary, the proposed cross-domain calibration method enables more flexible PQ for dynamic quantization without in-domain training data, APDNN generalizes dynamic quantization to extremely low bit-widths, and BoostNet adds flexibility by making the model architecture depend on the input while mitigating the train-test mismatch problem. In addition, BoostNet can be combined with the APDNN model to allow even more inference flexibility. Illustrative sketches of the any-precision and early-exiting inference ideas are given after the record details below.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Haichao Yu
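
The abstract above describes two mechanisms at a high level: setting a single trained network to different weight bit-widths at runtime (APDNN), and stopping inference early once a prediction is confident enough, with exit outputs combined additively in the spirit of gradient boosting (BoostNet). The minimal PyTorch sketch below illustrates the inference-time control flow of both ideas together. It is not code from the dissertation: the layer sizes, the symmetric per-tensor uniform quantizer, the confidence-threshold exit rule, and all names (quantize_weight, TinyEarlyExitNet) are assumptions chosen only to make the mechanisms concrete for a single input.

    # Illustrative sketch (assumptions, not the dissertation's implementation):
    # (1) symmetric per-tensor uniform quantization of weights at a
    #     runtime-selectable bit-width, and
    # (2) early-exit inference where each exit head's logits are summed into a
    #     running prediction (an additive, boosting-style ensemble) and the
    #     network stops once that prediction is confident enough.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def quantize_weight(w: torch.Tensor, bits: int) -> torch.Tensor:
        """Symmetric per-tensor uniform quantization to `bits`, dequantized back to float."""
        qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8-bit
        scale = w.abs().max().clamp(min=1e-8) / qmax  # map the largest |w| to qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    class TinyEarlyExitNet(nn.Module):
        """Three stages, each followed by an exit head (sizes are arbitrary)."""

        def __init__(self, in_dim=32, hidden=64, num_classes=10):
            super().__init__()
            self.stages = nn.ModuleList([
                nn.Linear(in_dim, hidden),
                nn.Linear(hidden, hidden),
                nn.Linear(hidden, hidden),
            ])
            self.exits = nn.ModuleList([nn.Linear(hidden, num_classes) for _ in self.stages])

        @torch.no_grad()
        def forward(self, x, bits=8, conf_threshold=0.9):
            logits_sum = 0.0
            for stage, exit_head in zip(self.stages, self.exits):
                # Quantize this stage's weights to the requested bit-width on the fly.
                w_q = quantize_weight(stage.weight, bits)
                x = F.relu(F.linear(x, w_q, stage.bias))
                # Accumulate exit logits additively, in the spirit of gradient boosting.
                logits_sum = logits_sum + exit_head(x)
                # Stop early if the running prediction is already confident.
                if F.softmax(logits_sum, dim=-1).max().item() >= conf_threshold:
                    break
            return logits_sum

    model = TinyEarlyExitNet()
    pred = model(torch.randn(1, 32), bits=4, conf_threshold=0.9)  # 4-bit weights, early exit allowed

In a real any-precision network the weights would be trained so that they stay accurate under several bit-widths, and a real early-exit network would train its exit heads jointly; the sketch only shows how a single model could trade accuracy for efficiency at inference time by lowering precision or exiting early.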
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)