Granular text classification for biomedical natural language processing
Abdar, Omid
This item's files can only be accessed by the System Administrators group.
Permalink
https://hdl.handle.net/2142/122213
Description
- Title
- Granular text classification for biomedical natural language processing
- Author(s)
- Abdar, Omid
- Issue Date
- 2023-11-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Schwartz, Lane
- Stevens, Jon
- Doctoral Committee Chair(s)
- Schwartz, Lane
- Committee Member(s)
- Markee, Numa
- Sadler, Randall
- Ionin, Tania
- Department of Study
- Linguistics
- Discipline
- Linguistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- natural language processing
- NLP
- biomedical natural language processing
- text classification
- transformers
- granularity
- Abstract
- Text classification is a classic NLP problem with numerous applications and use cases in sentiment analysis, spam detection, document organization, and information retrieval systems. While text classification techniques can be applied to documents of different lengths, supervised machine learning approaches for text classification require labeled documents similar in length to those that will be used at inference. This dissertation examines how labeled documents at higher levels of linguistic granularity (i.e., longer documents) may be synthesized to develop text classifiers at a lower level of linguistic granularity (i.e., shorter text). More specifically, we focus on Biomedical Natural Language Processing as our target domain and address (1) how well document-level classifiers perform for sentence-level classification; (2) how document-level labeled data may be synthesized to perform sentence-level text classification; and (3) how the performance of synthesized approaches compares against benchmark performances using labeled data. We present feature contribution analysis experiments in Naïve Bayes classifiers as well as self-attention experiments in BERT, and report sentence classification results that beat baseline performance for both model types. Finally, we present an extensive qualitative error analysis to identify major error trends in our results and discuss the significance of each error category with respect to our primary research questions.
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Omid Abdar
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois