A deeper look into multi-task learning ability of unified text-to-text transformer
Cheng, Xiang
Permalink
https://hdl.handle.net/2142/110569
Description
- Title
- A deeper look into multi-task learning ability of unified text-to-text transformer
- Author(s)
- Cheng, Xiang
- Issue Date
- 2021-04-27
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, Chengxiang
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- natural language processing
- structure prediction
- multi-task learning
- Abstract
- Structure prediction (SP) tasks are important in natural language understanding because they provide complex, structured knowledge about a text. Recently, unified text-to-text transformer models such as T5 and TANL have produced competitive results on SP tasks. These models cast SP tasks as a seq2seq problem, in which a transformer generates sequences with special tokens representing the extracted spans, labels, and relationships (a minimal sketch of this linearization follows the record metadata below). Compared to many popular natural language understanding models that are designed specifically for one task, the output of a text-to-text transformer is more flexible: with a suitable format, the model can be trained on multiple tasks together and take advantage of the knowledge shared between tasks. To better understand how these models improve performance through multi-task learning, we designed several experiments to measure the knowledge-transfer ability of a recently proposed model, TANL. In our experiments, we found that the multi-head attention in the decoder can capture relationships between tasks, which leads to performance improvements. Another finding is that TANL may produce many ill-formed outputs when trained from scratch, and starting from a T5 pre-trained model helps mitigate this problem. Based on these observations and some new intuitions, we proposed an improved version of TANL called SDCT5 (step-decomposed and constrained text-to-text transformer). Preliminary experimental results show that our model achieves better performance on SP tasks than TANL and benefits more from multi-task learning.
- Graduation Semester
- 2021-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/110569
- Copyright and License Information
- Copyright 2021 Xiang Cheng
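
The seq2seq formulation described in the abstract can be made concrete with a small example. Below is a minimal sketch of how an SP task such as named entity recognition might be linearized into a text-to-text target string; the bracketed "[ span | label ]" markup and the `linearize_ner` helper are illustrative assumptions in the spirit of TANL's augmented-natural-language format, not the exact scheme used in the thesis.

```python
# Minimal sketch: linearize an NER example into a seq2seq target string.
# The "[ span | label ]" markup is an assumed, TANL-like format for
# illustration only.

def linearize_ner(tokens, spans):
    """Render labeled spans as '[ text | label ]' inline tags.

    tokens: list of input tokens
    spans:  list of (start, end_exclusive, label) tuples
    """
    span_by_start = {s: (e, lab) for s, e, lab in spans}
    out, i = [], 0
    while i < len(tokens):
        if i in span_by_start:
            end, label = span_by_start[i]
            # Wrap the annotated span and its label in bracket tokens.
            out.append("[ " + " ".join(tokens[i:end]) + " | " + label + " ]")
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = "Tolkien wrote The Hobbit".split()
spans = [(0, 1, "person"), (2, 4, "book")]
print(linearize_ner(tokens, spans))
# -> [ Tolkien | person ] wrote [ The Hobbit | book ]
```

A decoder trained on such targets must emit well-formed bracket markup; the abstract's note about invalid output formats concerns malformed markup of this kind.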
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
- Dissertations and Theses - Computer Science