Geometries of word embeddings
Mu, Jiaqi
Description
- Title
- Geometries of word embeddings
- Author(s)
- Mu, Jiaqi
- Issue Date
- 2019-09-04
- Director of Research (if dissertation) or Advisor (if thesis)
- Viswanath, Pramod
- Doctoral Committee Chair(s)
- Viswanath, Pramod
- Committee Member(s)
- Srikant, Rayadurgam
- Bhat, Suma
- Oh, Sewoong
- Sun, Ruoyu
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- word embedding
- natural language processing
- representation learning
- Abstract
- Real-valued word embeddings have transformed natural language processing (NLP) applications and are recognized for their ability to capture linguistic regularities. Popular examples include word2vec, GloVe, GPT, and BERT. word2vec and GloVe are static embeddings, whose word representations are independent of context, while GPT and BERT are contextualized embeddings, whose representations change with the semantics conveyed by the surrounding words. In this dissertation, we study four problems associated with the geometry of word embeddings. First, we demonstrate a very simple, yet counter-intuitive, postprocessing technique -- eliminating the common mean vector and a few top dominating directions from the word vectors -- that yields better performance on a variety of standard benchmarks than the original vectors (a minimal illustrative sketch of this step is given after the record below). Sentences, as sequences of words, are also important semantic units of natural language. We extend word embeddings toward representing sentences by the low-rank subspace spanned by their word vectors. Such an unsupervised representation is empirically validated via semantic textual similarity tasks on 19 different datasets, where it outperforms sophisticated neural network models by 15% on average. Having a good sentence embedding, in turn, helps improve word representations, because a single vector does not suffice to model the polysemous nature of many (frequent) words, i.e., words with multiple meanings. We leverage the sentence representations for unsupervised polysemy modeling in an approach we call K-Grassmeans, which is quantitatively tested on standard sense induction and disambiguation datasets and achieves new state-of-the-art results. Finally, we study contextualized word embeddings. Given the rapid growth of computational power, pretrained language models have been proposed to capture the common-sense knowledge hidden in large training corpora and have achieved great success in natural language understanding (NLU) tasks. We study these pretrained language models using influence functions, which characterize the influence of individual training samples on the prediction for each test sample. The empirical findings suggest an interesting direction for future research: designing novel regularizations that penalize the observed correlations during fine-tuning.
- Graduation Semester
- 2019-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/106421
- Copyright and License Information
- Copyright 2019 Jiaqi Mu
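The postprocessing technique summarized in the abstract removes the common mean vector and the projections onto a few top dominating directions from every word vector. Below is a minimal illustrative sketch of that idea, not the dissertation's code; the use of NumPy and scikit-learn's PCA, the function name postprocess, and the choice of removing three directions are assumptions made for illustration.

import numpy as np
from sklearn.decomposition import PCA

def postprocess(embeddings, n_components=3):
    # embeddings: (vocab_size, dim) matrix of word vectors.
    # 1. Eliminate the common mean vector.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # 2. Estimate the top dominating directions of the centered vectors.
    pca = PCA(n_components=n_components)
    pca.fit(centered)
    directions = pca.components_               # shape: (n_components, dim)
    # 3. Remove each vector's projection onto those directions.
    projections = centered @ directions.T      # shape: (vocab_size, n_components)
    return centered - projections @ directions

# Example: postprocess 10,000 random 300-dimensional "word vectors".
vectors = np.random.randn(10000, 300)
processed = postprocess(vectors, n_components=3)

In the dissertation's setting, the processed vectors would then replace the original ones in downstream similarity and analogy benchmarks.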
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering