The Robustness of Sequence Tagging and Transformer Models in Leaderboard Entity Extraction from Scholarly Papers

Description/Abstract/Artist Statement

Automatic extraction of leaderboard entities, such as datasets, methods, and evaluation metrics, is useful for tracking the evolution of algorithms and models proposed for a given task. Sequence tagging and transformer models have both achieved outstanding performance on named entity extraction. In this project, we investigate the performance of these two model families on leaderboard entity extraction, focusing on datasets and methods mentioned in research papers in Computer and Information Sciences and Engineering. We compare sequence tagging and transformer models trained on in-house and published datasets. The results show large discrepancies between the performances of models trained on different datasets. In general, transformer-based models outperform sequence tagging models. The discrepancies between transformer models trained on different datasets were smaller than those between sequence tagging models, suggesting that transformer models are less susceptible to changes in the entity annotation schema. Further experiments on more datasets are needed to verify this.
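
To make the task concrete, below is a minimal sketch of leaderboard entity extraction framed as token classification, assuming a BIO annotation schema with DATASET and METHOD entity types and the HuggingFace transformers library. The label set, the bert-base-cased backbone, and the example sentence are illustrative assumptions, not the project's actual schema or models.

```python
# Sketch: leaderboard entity extraction as token classification.
# Assumed BIO schema with DATASET and METHOD types (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-DATASET", "I-DATASET", "B-METHOD", "I-METHOD"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# Note: the classification head is randomly initialized here; in practice
# the model would be fine-tuned on an annotated corpus before inference.
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)

sentence = "We evaluate BERT-CRF on the SciERC corpus."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, num_labels)

# Assign each subword token its highest-scoring BIO label.
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(f"{token:15s} {labels[pred]}")
```

After fine-tuning, a sentence like the one above would ideally yield B-METHOD/I-METHOD tags on "BERT-CRF" and B-DATASET/I-DATASET tags on "SciERC corpus"; a sequence tagging baseline (e.g., BiLSTM-CRF) would be trained on the same BIO-labeled data for comparison.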

Presenting Author Name/s

Leeshi Lin

Faculty Advisor/Mentor

Jian Wu

College Affiliation

College of Sciences

Presentation Type

Poster

Disciplines

Artificial Intelligence and Robotics | Databases and Information Systems | Data Science | Other Computer Sciences

Session Title

Poster Session

Location

Learning Commons @ Perry Library

Start Date

3-19-2022 9:00 AM

End Date

3-19-2022 11:00 AM
