The Robustness of Sequence Tagging and Transformer Models in Leaderboard Entity Extraction from Scholarly Papers
Description/Abstract/Artist Statement
Automatic extraction of leaderboard entities, such as datasets, methods, and evaluation metrics, is useful for tracking the evolution of algorithms and models proposed for a given task. Sequence tagging and transformer models have been proposed for named entity extraction and have exhibited outstanding performance. In this project, we investigate the performance of these two types of models on leaderboard entity extraction, focusing on datasets and methods mentioned in research papers in Computer and Information Sciences and Engineering. We compare sequence tagging and transformer models trained on in-house and published datasets. The results show large discrepancies between the performances of models trained on different datasets. In general, transformer-based models perform better than sequence tagging models. The discrepancies between transformer models trained on different datasets were smaller than those between sequence tagging models, suggesting that transformer models are less susceptible to changes in entity annotation schemas. Further experiments on more datasets are needed to verify this finding.
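For illustration only, the sketch below shows how a transformer model can be applied to leaderboard entity extraction framed as token classification. The backbone name (bert-base-cased), the BIO label set for DATASET and METHOD entities, and the example sentence are assumptions made for this sketch, not the configuration or data used in the project.

# Minimal sketch (not the project's implementation): transformer-based token
# classification for leaderboard entities, assuming BIO labels for DATASET
# and METHOD mentions.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

labels = ["O", "B-DATASET", "I-DATASET", "B-METHOD", "I-METHOD"]  # assumed schema
model_name = "bert-base-cased"  # assumed backbone; any encoder could be used

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels)
)

sentence = "We evaluate a BiLSTM-CRF baseline on the CoNLL-2003 benchmark."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, num_labels)

# Pick the highest-scoring label per subword token.
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, labels[pred])  # tags are meaningless until the head is fine-tuned

A sequence tagging baseline (e.g., a BiLSTM-CRF) would consume the same BIO-labeled tokens but learn its own word representations, which is one reason the two model families can react differently to changes in the annotation schema.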
Faculty Advisor/Mentor
Jian Wu
College Affiliation
College of Sciences
Presentation Type
Poster
Disciplines
Artificial Intelligence and Robotics | Databases and Information Systems | Data Science | Other Computer Sciences
Session Title
Poster Session
Location
Learning Commons @ Perry Library
Start Date
3-19-2022 9:00 AM
End Date
3-19-2022 11:00 AM