Patci — a tool for identifying scientific articles cited by patents
Agarwal, Sneha; Lincoln, Miles; Cai, Haoyan; Torvik, Vetle I.
Permalink
https://hdl.handle.net/2142/54885
Description
- Title
- Patci — a tool for identifying scientific articles cited by patents
- Author(s)
- Agarwal, Sneha
- Lincoln, Miles
- Cai, Haoyan
- Torvik, Vetle I.
- Issue Date
- 2014-03-14
- Keyword(s)
- citation matcher
- USPTO Patents
- PubMed
- DBLP
- probabilistic matching
- bibliographic databases
- patent-to-paper citations
- Date of Ingest
- 2014-09-24T05:49:47Z
- Abstract
- Scientific research increasingly drives innovation and the development of new technologies, and patent-to-paper citations can be used to trace this diffusion of knowledge and measure science-to-technology spillover effects. However, the so-called “non-patent citations” in USPTO records do not contain authoritative identifiers, nor do they adhere to a standard format. They are free-form strings, often much too free, which makes it hard to systematically identify the articles or other works cited. Here, we introduce Patci, a tool that takes a citation string and probabilistically identifies matching records from a set of bibliographic databases. It currently permits matching to biomedical literature (21.5M PubMed records) and computing/information sciences literature (3.2M DBLP records). It uses a probabilistic model trained on USPTO records but works well for citations originating from outside the patenting sphere. The algorithm extracts and weighs several hundred predictive features and does not rely on punctuation to delimit fields. A match probability is attached to each source link ID (e.g., PMID), which permits setting an application-appropriate level of match stringency and supports sensitivity analysis. All 16M citations listed in granted USPTO patents (1975-present) have been processed and are available as a separate dataset.
- Publisher
- GSLIS Research Showcase
- Type of Resource
- other
- Genre of Resource
- Conference Poster
- Language
- en
- Permalink
- http://hdl.handle.net/2142/54885
- Sponsor(s)/Grant Number(s)
- National Institute on Aging of the NIH (Award Number P01AG039347)
- Science of Science and Innovation Policy program of the NSF (Award Number 0965341)
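The abstract notes that each source link ID carries a match probability, so a stringency level can be chosen per application and varied for sensitivity analysis. The following is a minimal sketch of that idea, not Patci's actual interface or output format: the `CandidateMatch` record, the `filter_matches` and `sensitivity_sweep` helpers, and all identifiers and probabilities are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CandidateMatch:
    """One hypothetical citation-to-record link (layout is an assumption)."""
    citation_id: str    # hypothetical key for a patent citation string
    source: str         # bibliographic database, e.g. "PubMed" or "DBLP"
    record_id: str      # placeholder identifier, not a real PMID or DBLP key
    probability: float  # match probability attached to the link


def filter_matches(matches: Iterable[CandidateMatch],
                   threshold: float) -> List[CandidateMatch]:
    """Keep links whose match probability is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [m for m in matches if m.probability >= threshold]


def sensitivity_sweep(matches: Iterable[CandidateMatch],
                      thresholds: Iterable[float]) -> Dict[float, int]:
    """Count how many links survive at each stringency level."""
    materialized = list(matches)  # allow several passes over the input
    return {t: len(filter_matches(materialized, t)) for t in thresholds}


# Made-up example links; the probabilities are illustrative only.
EXAMPLE_MATCHES = [
    CandidateMatch("cit-1", "PubMed", "PMID-PLACEHOLDER-1", 0.99),
    CandidateMatch("cit-2", "PubMed", "PMID-PLACEHOLDER-2", 0.72),
    CandidateMatch("cit-3", "DBLP", "DBLP-PLACEHOLDER-1", 0.41),
    CandidateMatch("cit-4", "DBLP", "DBLP-PLACEHOLDER-2", 0.08),
]

if __name__ == "__main__":
    sweep = sensitivity_sweep(EXAMPLE_MATCHES, [0.1, 0.5, 0.9])
    for threshold, kept in sweep.items():
        print(f"threshold={threshold:.1f} links kept={kept}")
```

A study that needs high precision might keep only links above a strict threshold, while an exploratory analysis could lower it and then check whether its conclusions change across the sweep.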
Owning Collections
Student Publications and Research - Information Sciences PRIMARY
Publications, conference papers, and other research and scholarship of iSchool students.