Fully-distributed transfer learning with large DNNs on micro-controllers
Zhang, Yichi
Permalink
https://hdl.handle.net/2142/120443
Description
Title
Fully-distributed transfer learning with large DNNs on micro-controllers
Author(s)
Zhang, Yichi
Issue Date
2023-05-02
Director of Research (if dissertation) or Advisor (if thesis)
Kumar, Rakesh
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
transfer learning
distributed computing
deep learning
neural networks
Internet of Things
edge computing
Abstract
Large convolutional neural networks have shown promising performance and reliability in recent decades. Deploying them in everyday settings through IoT networks is an attractive idea and has been practiced for many years. However, most IoT devices do not have the memory to host large models. As a solution to this problem, fully distributed edge learning has received a great deal of attention in recent years due to its scalability and privacy-preserving properties. Most recent work uses peer-to-peer (P2P) federated learning or ensemble learning techniques to train models. However, most of these methods require either an extensive exchange of gradients during training or careful preparation of input data to ensure accuracy.
We propose a design that uses model partitioning and transfer learning to enable the deployment of arbitrarily large feedforward convolutional neural networks onto IoT networks with many nodes. The deployed networks can self-adjust through on-device learning, reducing the manual effort required for model updates. We take a state-of-the-art model partitioning method as the starting point for our design and add a transfer learning classifier to each node. We use MobileNetV1 as our base neural network and the Arduino Nano 33 BLE as our hardware platform. We achieved a speedup of over 10.94x in end-to-end inference latency compared to a theoretical node with infinite memory, and a 1.3x speedup compared to the state-of-the-art design. For model performance, we achieved over 90% top-5 accuracy, exceeding the original MobileNetV1 model by around 6%. We also showed that distributing the task across multiple smaller transfer learning classifiers significantly reduces training time, and that the individual classifiers are resistant to concept drift.
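As an illustration of the design sketched in this abstract, the following is a minimal, self-contained NumPy sketch of the two main ingredients: a frozen backbone partitioned across nodes, and a small per-node softmax classifier trained locally on that node's intermediate activations. All class names, layer sizes, and data below are hypothetical stand-ins rather than the thesis implementation; the random linear stages merely stand in for slices of MobileNetV1's depthwise-separable blocks.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical backbone partition: each node holds one frozen stage of the
# feature extractor, modeled here as a fixed random linear map plus ReLU.
class NodeStage:
    def __init__(self, in_dim, out_dim):
        self.W = rng.standard_normal((in_dim, out_dim)) * 0.05  # frozen weights

    def forward(self, x):
        return np.maximum(x @ self.W, 0.0)  # activations forwarded to the next node

# Per-node transfer-learning classifier: a small softmax head trained on the
# node's own frozen intermediate features with plain gradient descent.
class NodeClassifier:
    def __init__(self, feat_dim, n_classes, lr=0.1):
        self.W = np.zeros((feat_dim, n_classes))
        self.b = np.zeros(n_classes)
        self.lr = lr

    def _softmax(self, z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def train_step(self, feats, labels):
        probs = self._softmax(feats @ self.W + self.b)
        onehot = np.eye(self.W.shape[1])[labels]
        grad = probs - onehot                          # dL/dlogits for cross-entropy
        self.W -= self.lr * feats.T @ grad / len(labels)
        self.b -= self.lr * grad.mean(axis=0)

    def predict(self, feats):
        return (feats @ self.W + self.b).argmax(axis=1)

# Toy pipeline: four nodes, each with a frozen backbone slice and its own head.
dims = [64, 48, 32, 24, 16]                # activation sizes between nodes (made up)
n_classes = 5
stages = [NodeStage(dims[i], dims[i + 1]) for i in range(4)]
heads = [NodeClassifier(dims[i + 1], n_classes) for i in range(4)]

# Synthetic data; in the thesis setting each node would instead receive the
# previous node's activations over the IoT network.
X = rng.standard_normal((256, dims[0]))
y = rng.integers(0, n_classes, size=256)

for epoch in range(20):
    acts = X
    for stage, head in zip(stages, heads):
        acts = stage.forward(acts)          # frozen partition: forward pass only
        head.train_step(acts, y)            # on-device transfer-learning update

acts = X
for i, (stage, head) in enumerate(zip(stages, heads)):
    acts = stage.forward(acts)
    acc = (head.predict(acts) == y).mean()
    print(f"node {i}: local classifier accuracy = {acc:.2f}")

Consistent with the abstract, only the small per-node classifier is ever updated; each node keeps its backbone partition frozen and forwards activations onward, which is what keeps per-node memory and training cost small.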