Invisible Data Communities: Detecting Scientific Communities Based on Dataset Affinity Networks
Author(s)
Bratt, Sarah
Buchanan, Sarah
Honick, Brendan
Gala, Bhakti
Issue Date
2024-03-20
Keyword(s)
Research Datasets
Community Detection
GenBank
Scientific Collaboration
Abstract
Scientific communities are usually defined based on collaboration ties or features of publications such as topic, keywords, or textual characteristics. An emergent type of latent community, the affinity community based on open research datasets, constitutes the backbone of scholarly communication in the data-intensive era. In this paper, we analyze a sample of 1.2 million GenBank datasets’ bibliographic metadata and track community structure and evolution from 1992 to 2021. We use an affinity network approach to identify a tripartite network of links between (1) scientists co-authoring datasets, (2) taxonomic classifications, and (3) journals. By identifying clustering tendencies over time, affinity networks of dataset authors provide a novel source of information for recommendation systems and collections development. We argue it is critical to develop approaches to community detection that include dataset attributes to expand the bibliographic universe by including dataset contributors, concepts, and scholarly entities.
Publisher
iSchools
Series/Report Name or Number
iConference 2024 Proceedings
Type of Resource
Other
Language
eng
Handle URL
https://hdl.handle.net/2142/122823
Copyright and License Information
Copyright 2024 is held by Sarah Bratt, Sarah Buchanan, Brendan Honick, and Bhakti Gala. Copyright permissions, when appropriate, must be obtained directly from the authors.