Accelerating scientific research in the digital era: intelligent assessment and retrieval of research content
Kuzi, Saar
Permalink
https://hdl.handle.net/2142/113882
Description
- Title
- Accelerating scientific research in the digital era: intelligent assessment and retrieval of research content
- Author(s)
- Kuzi, Saar
- Issue Date
- 2021-12-02
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Doctoral Committee Chair(s)
- Zhai, ChengXiang
- Committee Member(s)
- Chang, Kevin Chenchuan
- Ji, Heng
- Bendersky, Michael
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Scientific Literature Systems
- Research Article Figure Retrieval
- Research Article Figure Embedding
- Interactive Literature Search
- Automated Assessment of Research Articles
- Abstract
- The efficient, effective, and timely access to the scientific literature by researchers is crucial for accelerating scientific research and discovery. Nowadays, research articles are almost exclusively published in a digital form and stored in digital libraries, accessible over the Web. Using digital libraries for storing scientific literature is advantageous as it enables access to articles at any time and place. Furthermore, digital libraries can leverage information management systems and artificial intelligence techniques to manage, retrieve, and analyze research content. Due to the large size of those libraries and their fast growth pace, the development of intelligent systems that can effectively retrieve and analyze research content is crucial for improving the productivity of researchers. In this thesis, we focus on improving literature search engines by addressing some of their limitations.
One of the limitations of the current literature search engines is that they mainly treat articles as the retrieval units and do not support the direct search for any of the article's elements such as figures, tables, and formulas. In this thesis, we study how to enable researchers to access research collections using figures of articles. Figures are entities in research articles that play an essential role in scientific communications. For this reason, research figures can be utilized directly by literature systems to facilitate and accelerate research. As the first step in this direction, we propose and study the novel task of figure retrieval from collections of research articles where the goal is to retrieve research article figures using keyword queries. We focus on the textual bag-of-words representation of search queries and figures and study the effectiveness of different retrieval models for the task and various ways to represent figures using text data. The empirical study shows the benefit of using multiple textual inputs for representing a figure and combining different retrieval models. The results also shed light on the different challenges in addressing this novel task.
Next, we address the limitations of the text-based bag-of-words representation of research figures by proposing and studying a new view of representation, namely deep neural network-based distributed representations. Specifically, we focus on using image data and text for learning figure representations with different model architectures and loss functions to understand how sensitive the embeddings are to the learning approach and the features used. We also develop a novel weak supervision technique for training neural networks for this task that leverages the citation network of articles to generate large quantities of training examples. The experimental results show that figure representations, learned using our weak supervision approach, are effective and outperform representations of the bag-of-words technique and pre-trained neural networks.
The current systems also have minimal support for addressing queries for which a search engine performs poorly due to ineffective formulation by the user. When conducting research, poor-performing search queries may occur when a researcher faces a new or fast-evolving research topic, resulting in a significant vocabulary gap between the user's query and the relevant articles. In this thesis, we address this problem by developing a novel strategy for collaborative query construction. According to this strategy, the search engine would actively engage users in an iterative process to continuously revise a query. We propose a specific implementation of this strategy in which the search engine and the user work together to expand a search query. Specifically, the system generates expansion terms, utilizing the history of interactions of the user with it, that the user can add to the search query in every iteration to reach an "ideal query". The experimental results attest to the effectiveness of using this approach in improving poor-performing search queries with minimal effort from the user.
The last limitation that we address in this thesis is that the current systems usually do not leverage any content analysis for the quality assessment of articles and instead rely on citation counts. In this thesis, we study the task of automatic quality assessment of research articles where the goal is to assess the quality of an article in different aspects such as clarity, originality, and soundness. Automating the quality assessment of articles could improve the current literature systems that can leverage the generated quality scores to support the search and analysis of research articles. Previous works have applied supervised machine learning to automate the assessment by learning from examples of reviewed articles by humans. In this thesis, we study the effectiveness of using topics for the task and propose a novel strategy for constructing multi-view topical features. Experimental results show that such features are effective for this task compared to deep neural network-based features and bag-of-words features.
Finally, to facilitate further evaluation of the different approaches suggested in this thesis using real users and realistic user tasks, we developed AcademicExplorer, a novel general system that supports the retrieval and exploration of research articles using several new functions enabled by the proposed algorithms in this thesis, such as exploring research collections using figure embeddings, sorting research articles based on automatically generated review scores, and interactive query formulation. As an open-source system, AcademicExplorer can help advance the research, evaluation, and development of applications in this area.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113882
- Copyright and License Information
- Copyright 2021 Saar Kuzi
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois