Document Type
Conference Paper
Publication Date
2022
DOI
10.1145/3558100.3563855
Publication Title
DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering
Pages
7 (1-4)
Conference Name
DocEng '22, September 20-23, 2022, Virtual, California.
Abstract
Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with the extraction of metadata such as title, author, and keywords, theory extraction from scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground truth corpus consisting of more than 4500 automatically annotated sentences containing theory/model mentions. We use this corpus to train models for theory extraction in SBS papers. We compared four deep learning architectures and found that RoBERTa-BiLSTM-CRF performs best, with a precision as high as 89.72%. The model shows promise for extension to domains other than SBS. The code and data are publicly available at https://github.com/lamps-lab/theory.
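The distant-supervision idea in the abstract can be illustrated with a minimal sketch: a list of known theory names (in the paper, drawn from Wikipedia entity mentions) is matched against raw sentences to produce token-level labels without manual annotation. This is not the authors' implementation; the function name `bio_tag`, the `THEORY` label, the tokenizer, and the two seed names below are assumptions made only for illustration.

```python
import re

def tokenize(text):
    # Simple word/punctuation tokenizer (an assumption; real pipelines vary).
    return re.findall(r"\w+|[^\w\s]", text)

def bio_tag(sentence, theory_names):
    """Label tokens B-THEORY / I-THEORY / O by case-insensitive phrase match."""
    tokens = tokenize(sentence)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Match longer names first so a long mention is not split by a shorter one.
    for name in sorted(theory_names, key=lambda s: -len(tokenize(s))):
        words = [w.lower() for w in tokenize(name)]
        n = len(words)
        if n == 0:
            continue  # skip empty seed names
        for i in range(len(tokens) - n + 1):
            span_free = all(t == "O" for t in tags[i:i + n])
            if lowered[i:i + n] == words and span_free:
                tags[i] = "B-THEORY"
                tags[i + 1:i + n] = ["I-THEORY"] * (n - 1)
    return list(zip(tokens, tags))

# Hypothetical seed list standing in for names harvested from Wikipedia.
seeds = ["prospect theory", "theory of planned behavior"]
labeled = bio_tag("We build on prospect theory and the theory of planned behavior.", seeds)
```

Sentences labeled this way could then serve as training data for a sequence tagger such as the RoBERTa-BiLSTM-CRF model named in the abstract. Exact string matching is the simplest possible labeling rule and will miss paraphrased or abbreviated mentions.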
Rights
© 2022 The Owner/Authors
This work is licensed under a Creative Commons Attribution International 4.0 License (CC BY 4.0).
Data Availability
Article states: The code and data are publicly available at: https://github.com/lamps-lab/theory
Original Publication Citation
Wei, X., Salsabil, L., & Wu, J. (2022). Theory entity extraction for social and behavioral sciences papers using distant supervision. In DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering (7). Association for Computing Machinery. https://doi.org/10.1145/3558100.3563855
ORCID
0000-0002-6162-2896 (Salsabil), 0000-0003-0173-4463 (Wu)