Document Type

Conference Paper

Publication Date

2020

Publication Title

Proceedings of the 8th International Workshop on Mining Scientific Publications

Pages

21-26

Conference Name

8th International Workshop on Mining Scientific Publications, August 5, 2020, Wuhan, Hubei, China

Abstract

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers in the ACL Anthology. The model with 19 features achieves F1=85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports single-document processing and parallel batch processing. Processing a document takes about 12–45 seconds on average, depending on the format, on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ∼33.3k PMC papers in the CORD19 dataset, released on June 13, 2020. The source code is released at https://gitee.com/irlab/SmartCiteCon.

Rights

© 2020 ACL.

"Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International (CC BY 4.0) License."

Original Publication Citation

Guo, C., Cui, H., Zhang, L., Wang, J., Lu, W., & Wu, J. (2020). SmartCiteCon: Implicit citation context extraction from academic literature using supervised learning. In Proceedings of the 8th International Workshop on Mining Scientific Publications (pp. 21-26). Wuhan, China. Association for Computational Linguistics. https://aclanthology.org/2020.wosp-1.3

ORCID

0000-0003-0173-4463 (Wu)
