
Predicting Citation Counts with Machine Learning: a Citation Function Approach

EasyChair Preprint 15080

7 pages. Date: September 26, 2024

Abstract

This paper develops a machine learning model to predict the citation counts of research papers. The model uses citation functions, which represent an author's intention when citing previous work, to estimate the number of citations. These intentions include introducing a research topic, making comparisons, and criticizing previous work, among others. Three predictors are built from citation functions: citing sentence, regular sentence, and reference. The prediction is treated both as a regression problem and as a classification problem; for classification, citation counts are pre-grouped into three categories: high-count, medium-count, and low-count. The dataset was obtained from the International Conference on Learning Representations (ICLR) 2017-2020 and contains 5,156 accepted and rejected papers. Only the accepted papers are used, since the main task is to predict the number of citations of accepted/published papers. The number of citations one year after publication is obtained through the API provided by Semantic Scholar. In the experiments, the best classification result reaches 98.33% accuracy, and the best regression result reaches 0.3 on both RMSE and MAE. The feature called ‘citing paper dominant,’ which represents the superiority of the citing paper over the cited paper, proves effective in achieving the best prediction results even though it occurs infrequently in the dataset. In conclusion, citation function-based predictors are effective in estimating the future impact of a paper.
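
As a rough illustration of the evaluation setup described in the abstract, the sketch below groups citation counts into the three classes and computes accuracy, RMSE, and MAE. It is not the authors' implementation: the thresholds `LOW_MAX` and `MEDIUM_MAX` and all function names are assumptions made for this example, since the abstract does not state the cut-offs or the scale on which the regression errors are measured.

```python
import math

# Hypothetical cut-offs: the abstract does not say where the class boundaries lie.
LOW_MAX = 10      # counts up to this value are treated as low-count
MEDIUM_MAX = 50   # counts above LOW_MAX and up to this value are medium-count


def citation_class(count, low_max=LOW_MAX, medium_max=MEDIUM_MAX):
    """Map a raw citation count to one of the three classes."""
    if count < 0:
        raise ValueError("citation count cannot be negative")
    if count <= low_max:
        return "low-count"
    if count <= medium_max:
        return "medium-count"
    return "high-count"


def accuracy(true_labels, predicted_labels):
    """Fraction of papers whose predicted class matches the true class."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)
```

For classification, `citation_class` would be applied to the one-year citation counts before training, and `accuracy` compares predicted and true classes. For regression, `rmse` and `mae` compare predicted and true values directly.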

Keyphrases: Semantic Scholar, citation count, citation function, machine learning, number of citations

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15080,
  author    = {Setio Basuki and Zamah Sari and Rizky Indrabayu and Reza Fauzan and Aulia Arif Wardhana and Masatoshi Tsuchiya},
  title     = {Predicting Citation Counts with Machine Learning: a Citation Function Approach},
  howpublished = {EasyChair Preprint 15080},
  year      = {EasyChair, 2024}}