Implementing neural machine translator with bi-directional GRU and attention mechanism on FPGAs using HLS
Li, Qin
Permalink
https://hdl.handle.net/2142/99998
Description
Title
Implementing neural machine translator with bi-directional GRU and attention mechanism on FPGAs using HLS
Author(s)
Li, Qin
Contributor(s)
Chen, Deming
Issue Date
2018-05
Keyword(s)
Machine Learning
FPGAs
Internet of Things
Abstract
Neural machine translation (NMT) is a popular topic in natural language processing. To improve the accuracy of basic encoder-decoder architectures, an NMT model augments them with bidirectional recurrent neural networks (RNNs), an attention mechanism, and a beam search algorithm. Such models are typically run on GPUs, which handle floating-point arithmetic and parallel workloads well. However, a dedicated hardware implementation can offer both higher performance and lower power consumption. Because the NMT model's structure is so large, implementing it purely in RTL code is impractical.
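For reference, the bidirectional GRU encoder named in the title is built from the standard GRU recurrence (update gate, reset gate, candidate state). The sketch below writes one GRU time step in synthesizable C as a minimal illustration only, not the thesis's code; the sizes H and D, all identifiers, and the pragma placement are assumptions.

#include <math.h>

#define H 128  /* hidden size (placeholder) */
#define D 128  /* input size  (placeholder) */

static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* One GRU step: h_out = (1 - z) * h_prev + z * tanh(Wh*x + Uh*(r . h_prev) + bh) */
void gru_step(const float x[D], const float h_prev[H], float h_out[H],
              const float Wz[H][D], const float Uz[H][H], const float bz[H],
              const float Wr[H][D], const float Ur[H][H], const float br[H],
              const float Wh[H][D], const float Uh[H][H], const float bh[H])
{
    float z[H], r[H];

    /* Update gate z and reset gate r. */
    for (int i = 0; i < H; i++) {
        float az = bz[i], ar = br[i];
        for (int j = 0; j < D; j++) {
#pragma HLS PIPELINE
            az += Wz[i][j] * x[j];
            ar += Wr[i][j] * x[j];
        }
        for (int j = 0; j < H; j++) {
#pragma HLS PIPELINE
            az += Uz[i][j] * h_prev[j];
            ar += Ur[i][j] * h_prev[j];
        }
        z[i] = sigmoidf(az);
        r[i] = sigmoidf(ar);
    }

    /* Candidate state, then interpolate with the previous hidden state. */
    for (int i = 0; i < H; i++) {
        float ah = bh[i];
        for (int j = 0; j < D; j++) {
#pragma HLS PIPELINE
            ah += Wh[i][j] * x[j];
        }
        for (int j = 0; j < H; j++) {
#pragma HLS PIPELINE
            ah += Uh[i][j] * (r[j] * h_prev[j]);
        }
        h_out[i] = (1.0f - z[i]) * h_prev[i] + z[i] * tanhf(ah);
    }
}

A bidirectional encoder runs this step forward and backward over the input sequence and concatenates the two hidden states per token.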
Therefore, we use the Vivado High-Level Synthesis (HLS) tool to map the existing NMT model onto a field-programmable gate array (FPGA) with synthesizable C code and pragmas. A large Virtex UltraScale+ FPGA board is used in this work to accommodate the full network. We build modules for the layers and algorithms of the NMT model, accelerate the computation inside each layer, and exploit the hardware parallelism present in the model. Both accelerating the computation and instantiating duplicate modules for parallelism increase resource usage, which is worthwhile only if the resources are allocated efficiently. To ensure the FPGA has enough resources for the entire flow, we develop a special structure for matrix multiplication that achieves both higher speed and lower resource usage, so that the parallelized modules can be instantiated within the limited resources available.
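The abstract does not spell out this special matrix-multiplication structure, so the sketch below only illustrates the generic HLS trade-off it targets: partial array partitioning plus a bounded unroll factor buys throughput with far fewer multipliers and memory ports than fully parallel hardware. N, M, UF, and the pragma choices are placeholders, not the thesis's design.

#define N 256   /* rows    (placeholder) */
#define M 256   /* columns (placeholder) */
#define UF 8    /* parallel MACs per cycle (placeholder) */

/* Matrix-vector product y = A * x with UF-way parallel accumulation. */
void matvec(const float A[N][M], const float x[M], float y[N])
{
/* Cyclic partitioning lets UF operands be read per cycle without
 * paying for a full (complete) partition of the arrays. */
#pragma HLS ARRAY_PARTITION variable=A cyclic factor=8 dim=2
#pragma HLS ARRAY_PARTITION variable=x cyclic factor=8 dim=1

    for (int i = 0; i < N; i++) {
        float partial[UF];
#pragma HLS ARRAY_PARTITION variable=partial complete dim=1
        for (int u = 0; u < UF; u++)
            partial[u] = 0.0f;

        /* UF multiply-accumulates per pipelined iteration; the
         * loop-carried dependence on each partial sum is the classic
         * bottleneck a dedicated matmul structure must work around. */
        for (int j = 0; j < M; j += UF) {
#pragma HLS PIPELINE
            for (int u = 0; u < UF; u++) {
#pragma HLS UNROLL
                partial[u] += A[i][j + u] * x[j + u];
            }
        }

        /* Reduce the UF partial sums into one output element. */
        float acc = 0.0f;
        for (int u = 0; u < UF; u++)
            acc += partial[u];
        y[i] = acc;
    }
}

Raising UF shortens the inner loop at the cost of DSPs and memory ports; tuning such a factor per layer is presumably how the duplicated, parallel modules still fit on the Virtex UltraScale+ part.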