Virtual Citation Proximity (VCP): Calculating Co-Citation-Proximity-Based Document Relatedness for Uncited Documents with Machine Learning
Citation:
Beel, J., Virtual Citation Proximity (VCP): Calculating Co-Citation-Proximity-Based Document Relatedness for Uncited Documents with Machine Learning, 2017
Abstract:
The relatedness of research articles, patents, legal documents, web pages, and other documents is often calculated with citation- or hyperlink-based approaches such as citation proximity analysis (CPA). In contrast to text-based document similarity, citation-based relatedness covers a broader range of relatedness. However, citation-based approaches suffer from the many documents that receive few or no citations, and for which document relatedness hence cannot be calculated. I propose to calculate a machine-learned ‘virtual citation proximity’ (or ‘virtual hyperlink proximity’) that could be calculated for all documents for which textual information (title, abstract ...) and metadata (authors, journal name ...) are available. The input to the machine-learning algorithm would be a large corpus of documents for which textual information, metadata, and citation proximity are available. The citation proximity would serve as ground truth, and the machine-learning algorithm would infer which textual features correspond to a high proximity of co-citations. After the training phase, the machine-learning algorithm could calculate a virtual citation proximity even for uncited documents. This virtual citation proximity would express in what proximity two documents would likely be cited if they were cited. The virtual citation proximity could then be used in the same way as “real” citation proximity to calculate document relatedness, and would potentially cover a wider range of relatedness than text-based document relatedness.
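The training idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the working paper: the toy records, the proximity weights per co-citation level, the three pair features (title-token Jaccard overlap, shared author, same journal), the choice to label cited pairs without an observed co-citation as 0, and the use of a plain least-squares linear model fitted by gradient descent as the learner.

```python
from itertools import combinations

# Illustrative proximity weights (an assumption, not values from the paper):
# the closer two documents are co-cited, the higher the target value.
PROXIMITY_WEIGHT = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25}

# Toy corpus; all records are invented for illustration.
DOCS = {
    "a": {"title": "citation analysis for research paper recommender systems",
          "authors": {"lee"}, "journal": "J1"},
    "b": {"title": "research paper recommender systems survey",
          "authors": {"lee", "kim"}, "journal": "J1"},
    "c": {"title": "protein folding with molecular dynamics",
          "authors": {"roy"}, "journal": "J2"},
    "d": {"title": "citation proximity analysis in digital libraries",
          "authors": {"kim"}, "journal": "J1"},
    # Uncited documents: no co-citation data exists for these two.
    "x": {"title": "recommender systems for digital libraries",
          "authors": {"kim"}, "journal": "J1"},
    "y": {"title": "molecular dynamics of membrane proteins",
          "authors": {"zhu"}, "journal": "J2"},
}
CITED = ["a", "b", "c", "d"]

# Closest level at which each cited pair was observed to be co-cited.
CO_CITATIONS = {
    frozenset(("a", "b")): "sentence",
    frozenset(("a", "d")): "paragraph",
    frozenset(("b", "d")): "paragraph",
    frozenset(("a", "c")): "section",
}

def features(p, q):
    """Pair features from text and metadata only (plus a bias term)."""
    tp, tq = set(p["title"].split()), set(q["title"].split())
    jaccard = len(tp & tq) / len(tp | tq)
    shared_author = 1.0 if p["authors"] & q["authors"] else 0.0
    same_journal = 1.0 if p["journal"] == q["journal"] else 0.0
    return [1.0, jaccard, shared_author, same_journal]

def target(i, j):
    """Ground truth: proximity weight, or 0.0 if the pair was never co-cited."""
    level = CO_CITATIONS.get(frozenset((i, j)))
    return PROXIMITY_WEIGHT[level] if level else 0.0

def train(samples, epochs=5000, lr=0.5):
    """Fit a linear model by batch gradient descent on squared error."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w

# Train on pairs of cited documents only.
samples = [(features(DOCS[i], DOCS[j]), target(i, j))
           for i, j in combinations(CITED, 2)]
weights = train(samples)

def virtual_proximity(i, j):
    """Predicted proximity for any pair, including uncited documents."""
    return sum(w * x for w, x in zip(weights, features(DOCS[i], DOCS[j])))

related = virtual_proximity("x", "b")    # similar topic, shared author
unrelated = virtual_proximity("x", "y")  # different topic, author, journal
print(f"x-b: {related:.2f}  x-y: {unrelated:.2f}")
```

On this toy data the uncited document "x" receives a higher predicted proximity to "b" than to "y", which is the behaviour the abstract describes. A realistic setup would need a large corpus, richer textual features, and a more capable learner.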
Author's Homepage:
http://people.tcd.ie/beelj
Description:
PUBLISHED
Author: Beel, Joeran
Type of material:
Working Paper
Availability:
Full text available
Keywords:
Document relatedness, Citation analysis, Citation proximity analysis, Digital libraries, Recommender systems, Search engines
Subject (TCD):
INFORMATION-RETRIEVAL, MACHINE LEARNING