Extracting Datasets, Methods, and Projects for ACL Anthology Papers (ODU PURS)

Description/Abstract/Artist Statement

A benefit of the increasingly interconnected world is the amount of information available to draw on; however, this also increases the volume of noise when searching for resources related to a particular topic of interest. Resources have been developed over the years to facilitate the discovery of previously published research papers containing named entities such as people, organizations, and locations, but to find the datasets and methods mentioned in the free text, a human must still manually read through the entirety of each document. This project develops a framework to automatically extract datasets and methods from scientific papers in the domain of Computer and Information Sciences and Engineering (CISE). We compared a heuristic method with a deep learning-based method, the latter built by fine-tuning a pre-trained language model. The ground truth was built by manually annotating a corpus of 500 abstracts of papers selected from the ACL Anthology; this corpus was used both for fine-tuning the deep learning model and for evaluation. The deep learning model combined with a classifier outperforms the heuristic method on both simple and complex sentences.
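The abstract does not describe the rules behind the heuristic method. Purely as an illustration of what a heuristic baseline and its evaluation against manual annotations might look like, the sketch below assumes a simple cue-word rule: a run of capitalized tokens immediately before a word such as "dataset" or "model" is tagged as a dataset or method mention, and predictions are scored with exact-match precision, recall, and F1. The cue lists, the stopword list, and the function names `extract` and `prf1` are hypothetical and are not the project's implementation; the deep learning approach is not sketched here.

```python
import re
from typing import Set, Tuple

# Hypothetical cue words; the actual heuristic rules are not given in the abstract.
DATASET_CUES = ("dataset", "corpus", "benchmark")
METHOD_CUES = ("model", "algorithm", "method", "classifier")
# Capitalized sentence-initial words that should not be part of an entity name.
LEADING_STOPWORDS = {"A", "An", "The", "We", "Our", "This", "These"}

Mention = Tuple[str, str]  # (label, surface text)


def extract(sentence: str) -> Set[Mention]:
    """Tag runs of capitalized tokens that directly precede a cue word."""
    found: Set[Mention] = set()
    for label, cues in (("DATASET", DATASET_CUES), ("METHOD", METHOD_CUES)):
        # One or more capitalized tokens (letters, digits, hyphens), then a cue
        # word, optionally plural, e.g. "CoNLL-2003 dataset" or "BiLSTM-CRF model".
        pattern = re.compile(
            r"\b((?:[A-Z][\w-]*\s+)+)(?:" + "|".join(cues) + r")s?\b"
        )
        for match in pattern.finditer(sentence):
            tokens = match.group(1).split()
            # Drop determiners/pronouns captured because of sentence-initial case.
            while tokens and tokens[0] in LEADING_STOPWORDS:
                tokens.pop(0)
            if tokens:
                found.add((label, " ".join(tokens)))
    return found


def prf1(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 against manually annotated mentions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    preds = extract("We train a BiLSTM-CRF model on the CoNLL-2003 dataset.")
    print(sorted(preds))
    # Hypothetical gold annotation containing one mention the rule misses.
    gold = preds | {("DATASET", "OntoNotes")}
    print(prf1(preds, gold))
```

A rule like this only fires when a name sits directly before a cue word, so it misses mentions in coordinated phrases ("the GLUE and SuperGLUE benchmarks" yields only SuperGLUE) and in more complex sentence structures, which is one plausible reason a learned model would do better on complex sentences.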

Presenting Author Name/s

Kyle Canady

Faculty Advisor/Mentor

Jian Wu

College Affiliation

College of Sciences

Presentation Type

Oral Presentation

Disciplines

Data Science

Session Title

Interdisciplinary Research #1

Location

Zoom Room A

Start Date

3-20-2021 9:00 AM

End Date

3-20-2021 9:55 AM

