Document Type
Conference Paper
Publication Date
2023
Publication Title
CEUR Workshop Proceedings: EEKE-AII2023: Proceedings of the Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and the 3rd AI + Informetrics (AII2023), co-located with JCDL 2023
Volume
3451
Pages
65-77
Conference Name
Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 3rd AI + Informetrics (EEKE-AII2023), June 26, 2023, Santa Fe, New Mexico
Abstract
The rapid growth of scientific papers over the past decades calls for effective claim extraction tools that automatically and accurately locate key claims in unstructured text. Such claims enable content-level, aggregated exploration of scientific knowledge beyond the metadata level. One challenge in building such a model is making effective use of limited labeled training data. In this paper, we compare transfer learning and contrastive learning frameworks in terms of performance, training time, and training data size. We find that contrastive learning yields better performance at a lower data cost across all models. Our contrastive-learning-based model, ClaimDistiller, achieves the highest performance, boosting the F1 score of the base models by 3–4% and reaching F1=87.45%, an improvement of more than 7% over the state of the art on the benchmark data previously used for this task. The same pattern holds on another benchmark dataset, where ClaimDistiller consistently performs best. A qualitative assessment on a small sample of out-of-domain data indicates that the model generalizes well. Our source code and datasets can be found here: https://github.com/lamps-lab/sci-claim-distiller.
Rights
© 2023 the authors.
Use permitted under a Creative Commons License Attribution 4.0 International (CC BY 4.0) License.
Data Availability
Article states: Our source codes and datasets can be found here: https://github.com/lamps-lab/sci-claim-distiller.
Original Publication Citation
Wei, X., Hoque, M. R. U., Wu, J., & Li, J. (2023). ClaimDistiller: Scientific claim extraction with supervised contrastive learning. CEUR Workshop Proceedings: EEKE-AII2023: Proceedings of the Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and the 3rd AI + Informetrics (AII2023), co-located with JCDL 2023, 3451, 65-77. https://ceur-ws.org/Vol-3451/paper11.pdf
Repository Citation
Wei, X., Hoque, M. R. U., Wu, J., & Li, J. (2023). ClaimDistiller: Scientific claim extraction with supervised contrastive learning. CEUR Workshop Proceedings: EEKE-AII2023: Proceedings of the Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and the 3rd AI + Informetrics (AII2023), co-located with JCDL 2023, 3451, 65-77. https://ceur-ws.org/Vol-3451/paper11.pdf
ORCID
0000-0003-4055-2582 (Hoque), 0000-0003-0173-4463 (Wu), 0000-0003-0091-6986 (Li)
Included in
Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons, Scholarly Communication Commons
Comments
Link to proceedings landing page: https://ceur-ws.org/Vol-3451/