Implementing neural machine translator with bi-directional GRU and attention mechanism on FPGAs using HLS
Li, Qin
Permalink
https://hdl.handle.net/2142/99998
Description
Title
Implementing neural machine translator with bi-directional GRU and attention mechanism on FPGAs using HLS
Author(s)
Li, Qin
Contributor(s)
Chen, Deming
Issue Date
2018-05
Keyword(s)
Machine Learning
FPGAs
Internet of Things
Abstract
Neural machine translation (NMT) is a popular topic in natural language processing. To improve the accuracy of basic encoder-decoder architectures, an NMT model augments them with bidirectional recurrent neural networks (RNNs), an attention mechanism, and a beam search algorithm. Such models are typically run on GPUs, which handle floating-point arithmetic and parallel workloads well. However, a dedicated hardware implementation can offer both higher performance and lower power consumption. Because the NMT model's structure is so large, implementing it purely in RTL code is impractical.
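For reference, the bidirectional GRU encoder named in the title is built from the standard GRU recurrence (update gate, reset gate, candidate state). The sketch below writes one GRU time step in synthesizable C as a minimal illustration only, not the thesis's code; the sizes H and D, all identifiers, and the pragma placement are assumptions.

#include <math.h>

#define H 128  /* hidden size (placeholder) */
#define D 128  /* input size  (placeholder) */

static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* One GRU step: h_out = (1 - z) * h_prev + z * tanh(Wh*x + Uh*(r . h_prev) + bh) */
void gru_step(const float x[D], const float h_prev[H], float h_out[H],
              const float Wz[H][D], const float Uz[H][H], const float bz[H],
              const float Wr[H][D], const float Ur[H][H], const float br[H],
              const float Wh[H][D], const float Uh[H][H], const float bh[H])
{
    float z[H], r[H];

    /* Update gate z and reset gate r. */
    for (int i = 0; i < H; i++) {
        float az = bz[i], ar = br[i];
        for (int j = 0; j < D; j++) {
#pragma HLS PIPELINE
            az += Wz[i][j] * x[j];
            ar += Wr[i][j] * x[j];
        }
        for (int j = 0; j < H; j++) {
#pragma HLS PIPELINE
            az += Uz[i][j] * h_prev[j];
            ar += Ur[i][j] * h_prev[j];
        }
        z[i] = sigmoidf(az);
        r[i] = sigmoidf(ar);
    }

    /* Candidate state, then interpolate with the previous hidden state. */
    for (int i = 0; i < H; i++) {
        float ah = bh[i];
        for (int j = 0; j < D; j++) {
#pragma HLS PIPELINE
            ah += Wh[i][j] * x[j];
        }
        for (int j = 0; j < H; j++) {
#pragma HLS PIPELINE
            ah += Uh[i][j] * (r[j] * h_prev[j]);
        }
        h_out[i] = (1.0f - z[i]) * h_prev[i] + z[i] * tanhf(ah);
    }
}

A bidirectional encoder runs this step forward and backward over the input sequence and concatenates the two hidden states per token.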
Therefore, we use the Vivado High-Level Synthesis (HLS) tool to map the existing NMT model onto a field-programmable gate array (FPGA) with synthesizable C code and pragmas. A large Virtex UltraScale+ FPGA board is used in this work to accommodate the full network. We build modules for the layers and algorithms of the NMT model, accelerate the computation inside each layer, and exploit the hardware parallelism present in the model. Both accelerating the computation and instantiating duplicate modules for parallelism increase resource usage, which is worthwhile only if the resources are allocated efficiently. To ensure the FPGA has enough resources for the entire flow, we develop a special structure for matrix multiplication that achieves both higher speed and lower resource usage, so that the parallelized modules can be instantiated within the limited resources available.
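The abstract does not spell out this special matrix-multiplication structure, so the sketch below only illustrates the generic HLS trade-off it targets: partial array partitioning plus a bounded unroll factor buys throughput with far fewer multipliers and memory ports than fully parallel hardware. N, M, UF, and the pragma choices are placeholders, not the thesis's design.

#define N 256   /* rows    (placeholder) */
#define M 256   /* columns (placeholder) */
#define UF 8    /* parallel MACs per cycle (placeholder) */

/* Matrix-vector product y = A * x with UF-way parallel accumulation. */
void matvec(const float A[N][M], const float x[M], float y[N])
{
/* Cyclic partitioning lets UF operands be read per cycle without
 * paying for a full (complete) partition of the arrays. */
#pragma HLS ARRAY_PARTITION variable=A cyclic factor=8 dim=2
#pragma HLS ARRAY_PARTITION variable=x cyclic factor=8 dim=1

    for (int i = 0; i < N; i++) {
        float partial[UF];
#pragma HLS ARRAY_PARTITION variable=partial complete dim=1
        for (int u = 0; u < UF; u++)
            partial[u] = 0.0f;

        /* UF multiply-accumulates per pipelined iteration; the
         * loop-carried dependence on each partial sum is the classic
         * bottleneck a dedicated matmul structure must work around. */
        for (int j = 0; j < M; j += UF) {
#pragma HLS PIPELINE
            for (int u = 0; u < UF; u++) {
#pragma HLS UNROLL
                partial[u] += A[i][j + u] * x[j + u];
            }
        }

        /* Reduce the UF partial sums into one output element. */
        float acc = 0.0f;
        for (int u = 0; u < UF; u++)
            acc += partial[u];
        y[i] = acc;
    }
}

Raising UF shortens the inner loop at the cost of DSPs and memory ports; tuning such a factor per layer is presumably how the duplicated, parallel modules still fit on the Virtex UltraScale+ part.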