Predicting Citation Counts with Machine Learning: a Citation Function Approach

EasyChair Preprint 15080
7 pages • Date: September 26, 2024

Abstract

This paper develops a machine learning model to predict the citation counts obtained by research papers. The model uses citation functions, which represent the intentions of a paper's author when citing previous works, to estimate the number of citations. These intentions can include introducing a research topic, making comparisons, criticizing previous works, etc. Three predictors have been developed based on citation functions: citing sentence, regular sentence, and reference. Prediction is treated as both a regression problem and a classification problem; for classification, the number of citations is pre-grouped into three categories: high-count, medium-count, and low-count. The dataset was obtained from the International Conference on Learning Representations (ICLR) 2017-2020 and contains 5,156 accepted and rejected papers. Only the accepted papers are used, since the main task is to predict the number of citations of accepted/published papers. The number of citations one year after publication is obtained through the API provided by Semantic Scholar. In the experiments, the best classification result reaches 98.33% accuracy, and the best regression result reaches 0.3 on both RMSE and MAE. The feature called ‘citing paper dominant,’ representing the superiority of the citing paper over the cited paper, achieves the best prediction results despite its low frequency in the dataset. In conclusion, citation function-based predictors are effective in estimating the future impact of a paper.

Keyphrases: Semantic Scholar, citation count, citation function, machine learning, number of citations
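
The three-way grouping of citation counts and the two regression metrics named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state where the boundaries between low-count, medium-count, and high-count lie, so the cut-offs LOW_MAX and HIGH_MIN below are hypothetical, as are the function names.

```python
import math

# Hypothetical cut-offs: the abstract does not say where the
# low/medium/high boundaries are, so these values are assumptions.
LOW_MAX = 5     # counts <= LOW_MAX are labelled "low-count"
HIGH_MIN = 50   # counts >= HIGH_MIN are labelled "high-count"


def citation_class(count, low_max=LOW_MAX, high_min=HIGH_MIN):
    """Map a raw citation count to one of three coarse classes."""
    if count < 0:
        raise ValueError("citation count must be non-negative")
    if count <= low_max:
        return "low-count"
    if count >= high_min:
        return "high-count"
    return "medium-count"


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true, labels_pred):
    """Fraction of predicted class labels that match the true labels."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)
```

With cut-offs like these, the same list of citation counts can serve both tasks: the raw counts (or a transformation of them) are the regression targets scored with rmse and mae, and citation_class turns them into the labels scored with accuracy.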