Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database
Torvik, Vetle I.; Agarwal, Sneha
Permalink
https://hdl.handle.net/2142/88927
Description
Issue Date
2016-03
Keyword(s)
bibliometrics
ethnicity classification
machine learning
Abstract
We present a nearest-neighbor approach to ethnicity classification. Given an author name, all of its instances (or the most similar ones) in PubMed are identified and coupled with their respective countries of affiliation, and then probabilistically mapped to a set of 26 predefined ethnicities. The dominant ethnicity (or pair of ethnicities) is assigned as the class. The predictions are also used to upgrade Genni (Smith, Singh, and Torvik, 2013) to provide ethnicity-specific gender predictions for cases like Italian vs. English Andrea, Turkish vs. Korean Bora, Israeli vs. Nordic Eli, and Slavic vs. Japanese Renko. Ethnea and Genni 2.0 are available at http://abel.lis.illinois.edu
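The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the instance list, the country-to-ethnicity table, the label strings, the function name `classify`, and the `pair_ratio` rule for deciding when to report a pair of ethnicities are all assumptions made for illustration. The sketch also handles exact name matches only and omits the fallback to the most similar names.

```python
from collections import Counter, defaultdict

# Toy stand-in for geo-coded name instances (name, country of affiliation).
# These rows are invented for illustration; they are not PubMed data.
INSTANCES = [
    ("andrea", "Italy"), ("andrea", "Italy"), ("andrea", "Italy"),
    ("andrea", "UK"),
    ("bora", "Turkey"), ("bora", "Korea"),
]

# Hypothetical P(ethnicity | country) table; values are illustrative only.
COUNTRY_TO_ETHNICITY = {
    "Italy": {"ITALIAN": 1.0},
    "UK": {"ENGLISH": 1.0},
    "Turkey": {"TURKISH": 1.0},
    "Korea": {"KOREAN": 1.0},
}


def classify(name, instances=INSTANCES, table=COUNTRY_TO_ETHNICITY,
             pair_ratio=0.5):
    """Return (label, probabilities) for a name, or None if it is unseen.

    pair_ratio is an assumed threshold: a second ethnicity is reported
    alongside the first when its score is at least pair_ratio times the top.
    """
    # 1. Collect the countries attached to every instance of the name.
    counts = Counter(c for n, c in instances if n == name.lower())
    if not counts:
        return None

    # 2. Spread each country's count over ethnicities probabilistically.
    scores = defaultdict(float)
    for country, k in counts.items():
        for ethnicity, p in table.get(country, {}).items():
            scores[ethnicity] += k * p
    total = sum(scores.values())
    if total == 0:
        return None

    # 3. Rank ethnicities (ties broken alphabetically) and pick the
    #    dominant one, or a pair when the runner-up is close enough.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    label = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] >= pair_ratio * ranked[0][1]:
        label = f"{ranked[0][0]}-{ranked[1][0]}"
    return label, {e: s / total for e, s in ranked}
```

With the toy data, `classify("Andrea")` yields a single dominant label, while `classify("Bora")` yields a pair because the two candidate ethnicities score equally. A real system would need far more instances per name and a calibrated mapping table.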