Document Type

Conference Paper

Publication Date

2023

Publication Title

CEUR Workshop Proceedings: EEKE-AII2023: Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and AI + Informetrics (AII2023): Proceedings of Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and the 3rd AI + Informetrics (AII2023) co-located with the JCDL 2023

Volume

3451

Pages

65-77

Conference Name

Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 3rd AI + Informetrics (EEKE-AII2023), June 26, 2023, Santa Fe, New Mexico

Abstract

The growth of scientific papers in the past decades calls for effective claim extraction tools to automatically and accurately locate key claims in unstructured text. Such claims will benefit content-wise aggregated exploration of scientific knowledge beyond the metadata level. One challenge in building such a model is how to effectively use limited labeled training data. In this paper, we compared transfer learning and contrastive learning frameworks in terms of performance, time, and training data size. We found that contrastive learning achieves better performance with less training data across all models. Our contrastive-learning-based model ClaimDistiller has the highest performance, boosting the F1 score of the base models by 3–4% and achieving an F1=87.45%, improving the state of the art by more than 7% on the same benchmark data previously used for this task. The same phenomenon is observed on another benchmark dataset, on which ClaimDistiller consistently has the best performance. Qualitative assessment of a small sample of out-of-domain data indicates that the model generalizes well. Our source code and datasets can be found here: https://github.com/lamps-lab/sci-claim-distiller.

Comments

Link to proceedings landing page: https://ceur-ws.org/Vol-3451/

Rights

© 2023 the authors.

Use permitted under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Data Availability

Article states: Our source codes and datasets can be found here: https://github.com/lamps-lab/sci-claim-distiller.

Original Publication Citation

Wei, X., Hoque, M. R. U., Wu, J., & Li, J. (2023). ClaimDistiller: Scientific claim extraction with supervised contrastive learning. CEUR Workshop Proceedings: EEKE-AII2023: Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and AI + Informetrics (AII2023): Proceedings of Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and the 3rd AI + Informetrics (AII2023) co-located with the JCDL 2023, 3451, 65-77. https://ceur-ws.org/Vol-3451/paper11.pdf

ORCID

0000-0003-4055-2582 (Hoque), 0000-0003-0173-4463 (Wu), 0000-0003-0091-6986 (Li)
