Cross-layer methods for energy-efficient inference using in-memory architectures
Gonugondla, Sujan Kumar
Description
- Title
- Cross-layer methods for energy-efficient inference using in-memory architectures
- Author(s)
- Gonugondla, Sujan Kumar
- Issue Date
- 2020-04-28
- Director of Research (if dissertation) or Advisor (if thesis)
- Shanbhag, Naresh R
- Doctoral Committee Chair(s)
- Shanbhag, Naresh R
- Committee Member(s)
- Hanumolu, Pavan Kumar
- Schwing, Alexander
- Gopalakrishnan, Kailash
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- deep neural networks, edge, inference, machine learning, on-chip learning, in-memory architectures, in-memory computing, application specific integrated circuits, SRAM, quantization, compression, accelerator, energy-efficiency, cross-layer
- Abstract
- In the near future, we will be surrounded by intelligent devices that transform the way we interact with the world. These devices need to acquire and process data to derive actions and interpretations, automating and monitoring many tasks without human intervention. Such tasks require the implementation of complex machine learning (ML) algorithms on these devices. Deep neural networks (DNNs) have evolved into the state-of-the-art approach for ML tasks. However, realizing computationally intensive algorithms such as DNNs under stringent constraints on energy, latency, and form factor is a formidable challenge. In conventional von Neumann architectures, the energy and latency cost of realizing ML algorithms is dominated by memory accesses. To address this issue, the deep in-memory architecture (DIMA) was proposed, which embeds mixed-signal computation as an integral part of the memory read cycle. Deep in-memory architectures have shown up to 100x gains in energy-delay product (EDP) over conventional digital von Neumann architectures. However, the use of mixed-signal computation makes in-memory architectures susceptible to process variations and other circuit non-idealities. Therefore, in-memory architectures exhibit a fundamental trade-off between system-level energy, latency, and accuracy when implementing ML tasks. Our research focuses on developing cross-layer methods to optimize this system-level energy-latency-accuracy trade-off of in-memory architectures for ML applications.
First, an automated quantization framework is presented to minimize the precision requirements of DNNs. This framework allocates precision at kernel-level granularity via an iterative greedy process and demonstrates 1.2x-to-1.3x lower precision requirements than state-of-the-art methods on compact networks such as MobileNet-V1.
Next, a compositional framework is proposed that relates the energy consumption and signal-to-noise ratio (SNR) of in-memory architectures to their circuit, architectural, and algorithmic parameters. Analysis using this framework allows in-memory architectures to be designed to meet application-level precision requirements.
The energy efficiency of DIMA can be further enhanced by compensation techniques that enable low-SNR operation without any loss in system-level accuracy. The use of stochastic gradient descent (SGD)-based on-chip learning to compensate for the impact of chip-specific process variations is studied. The benefits of on-chip learning are demonstrated on a 65 nm prototype integrated circuit (IC) that shows a 2.4x reduction in energy over DIMA operating with off-chip trained weights, and up to 100x improvement in EDP over conventional digital architectures.
In-memory architectures using beyond-CMOS technologies such as STT-MRAM and ReRAM crossbars have become popular due to their density and scalability advantages. However, such resistive crossbars suffer from inaccurate writes due to device variability and cycle-to-cycle variations. We present the Single-Write In-memory Program-vErify (SWIPE) method, which achieves high-accuracy writes for crossbar-based in-memory architectures at 5x-to-10x lower cost than standard program-verify methods. SWIPE leverages the bit-sliced attribute of crossbar-based in-memory architectures and the statistics of conductance variations to compensate for device non-idealities.
Extending in-memory computing to storage-class technologies such as NAND flash is challenging due to stringent density constraints, large capacitances, and low-mobility transistors. A DIMA for NAND flash memories is introduced, achieving an 8x-to-23x reduction in energy and a 9x-to-15x improvement in throughput over conventional NAND flash systems. Together, these results demonstrate that cross-layer methods are effective in enhancing the system-level energy, latency, and accuracy of ML systems realized via in-memory architectures. Illustrative code sketches of these ideas follow the record below.
- Graduation Semester
- 2020-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/107929
- Copyright and License Information
- Copyright 2020 Sujan Gonugondla
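Illustrative sketches
The abstract's first contribution allocates DNN precision at kernel-level granularity via an iterative greedy process. The sketch below shows one plausible form of such a loop; it is a minimal illustration, not the dissertation's algorithm, and `eval_fn`, `kernels`, and the stopping tolerance are hypothetical stand-ins.
```python
def greedy_precision_allocation(eval_fn, kernels, max_bits=8, min_bits=2,
                                tolerance=0.5):
    """Greedily lower per-kernel bit-widths while accuracy stays in budget.

    eval_fn(bits) -> accuracy of the network quantized per the dict `bits`
    (a hypothetical evaluator, not part of the dissertation's code).
    """
    bits = {k: max_bits for k in kernels}      # start at full precision
    baseline = eval_fn(bits)                   # reference accuracy
    while True:
        best_kernel, best_acc = None, float("-inf")
        for k in kernels:                      # try one fewer bit per kernel
            if bits[k] <= min_bits:
                continue
            trial = dict(bits, **{k: bits[k] - 1})
            acc = eval_fn(trial)
            if acc > best_acc:
                best_kernel, best_acc = k, acc
        # accept the cheapest single-kernel reduction; stop once even the
        # best reduction pushes accuracy outside the tolerance
        if best_kernel is None or baseline - best_acc > tolerance:
            return bits
        bits[best_kernel] -= 1
```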
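The compositional framework relates energy consumption and SNR to circuit and architectural parameters; the exact relations are developed in the dissertation. As a generic, heavily simplified illustration of why such a trade-off exists, the toy model below assumes a charge-domain bitline read whose signal swing costs energy while the variation-induced noise floor stays fixed; all constants are assumptions.
```python
import math

C_BL = 100e-15       # assumed bitline capacitance [F]
V_DD = 1.0           # assumed supply voltage [V]
SIGMA_NOISE = 10e-3  # assumed variation noise referred to the bitline [V]

def read_energy(delta_v):
    """Energy to develop a bitline swing of delta_v volts (Q = C * dV)."""
    return C_BL * V_DD * delta_v

def snr_db(delta_v):
    """SNR of the analog read: swing over a fixed noise floor, in dB."""
    return 20 * math.log10(delta_v / SIGMA_NOISE)

# Larger swing -> more energy per read but higher SNR (hence accuracy).
for dv in (0.05, 0.1, 0.2, 0.4):
    print(f"swing={dv:.2f} V  energy={read_energy(dv) * 1e15:5.1f} fJ  "
          f"SNR={snr_db(dv):4.1f} dB")
```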
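The on-chip learning result uses SGD to adapt weights to the specific chip's analog non-idealities rather than deploying off-chip trained weights directly. The simulation below is a schematic of the idea under an assumed model where the chip applies a fixed, unknown per-weight gain error; the update uses only the chip's own output, so the learned weights absorb the variation.
```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
gain = 1.0 + 0.05 * rng.standard_normal(n)   # frozen chip-specific variation

def chip_dot(w, x):
    """Dot product as computed by this (simulated) mixed-signal chip."""
    return np.dot(gain * w, x)

w_ideal = rng.standard_normal(n)             # off-chip trained weights
w = w_ideal.copy()                           # copy adapted on chip
lr = 0.01
for _ in range(2000):                        # SGD against the chip's output
    x = rng.standard_normal(n)
    err = chip_dot(w, x) - np.dot(w_ideal, x)
    w -= lr * err * x                        # gain itself is unobserved

def mse(v):
    return float(np.mean(v ** 2))

print("output error, off-chip weights:", mse(gain * w_ideal - w_ideal))
print("output error, adapted weights :", mse(gain * w - w_ideal))
```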
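SWIPE avoids iterative program-verify by exploiting the bit-sliced organization of crossbars: the error observed after writing a high-significance slice can be absorbed into the target of the lower-significance slice. The toy below illustrates only that compensation idea; it is not the published algorithm, and it ignores conductance range limits.
```python
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 0.6  # assumed write noise per cell, in LSB-slice units

def write_cell(target):
    """Single-shot analog write with conductance variation."""
    return target + SIGMA * rng.standard_normal()

def naive_write(value):
    # write both 4-bit slices independently; MSB-slice noise is weighted 16x
    return 16 * write_cell(value // 16) + write_cell(value % 16)

def swipe_like_write(value):
    msb = write_cell(value // 16)            # write, then sense, the MSB slice
    residual = value - 16 * msb              # error as seen at read-out
    return 16 * msb + write_cell(residual)   # fold error into the LSB slice

values = rng.integers(0, 256, size=10000).astype(float)
naive = np.array([naive_write(v) for v in values])
comp = np.array([swipe_like_write(v) for v in values])
print("naive RMS error      :", np.sqrt(np.mean((naive - values) ** 2)))
print("compensated RMS error:", np.sqrt(np.mean((comp - values) ** 2)))
```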
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering