Interdisciplinary Research

Quality Assessment of Scholarly Big Data

Ryan Hiltabrand, Old Dominion University

Description/Abstract/Artist Statement

Scholarly big data refers to the rapidly growing volume of scholarly data held in digital networks and libraries. Examples relevant to this research include the Semantic Scholar Open Research Corpus (S2ORC), Microsoft Academic Graph, and the US National Library of Medicine. All of these rely on automated information extraction tools to collect metadata from scholarly articles. This automation introduces many sources of error because the models in the extraction libraries are imperfect. These digital libraries support many areas of analytical research, such as citation analysis, citation prediction, information extraction, and link analysis. This research uses metadata provided by S2ORC, which is compared against a ground truth dataset to assess data quality, including document conflation (near-duplicate identification), paper linkage, author name disambiguation, coverage, and freshness. We found that the quality of data linking in Semantic Scholar (S2) is high but not perfect: accuracies range from 0.91 to 0.99 depending on subject domain and data curation method. Given that there are 200 million paper records in S2, data users should take this into account when performing data coverage and network analysis between S2 and other databases.
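As a rough illustration of what comparing linkage metadata against a ground truth dataset can look like, the sketch below computes a simple accuracy score. It is not the method used in this research: the function name `linkage_accuracy`, the record IDs, and the toy data are all hypothetical, and it assumes each record maps to at most one linked external ID.

```python
from typing import Dict, Optional


def linkage_accuracy(
    predicted: Dict[str, Optional[str]],
    ground_truth: Dict[str, Optional[str]],
) -> float:
    """Fraction of ground-truth records whose predicted link matches the verified link.

    A record absent from `predicted` is treated as having no link (None).
    """
    if not ground_truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1
        for record_id, true_link in ground_truth.items()
        if predicted.get(record_id) == true_link
    )
    return correct / len(ground_truth)


# Hypothetical toy data: record ID -> linked external ID (None means "no link").
ground_truth = {"p1": "m1", "p2": "m2", "p3": None, "p4": "m4"}
predicted = {"p1": "m1", "p2": "m9", "p3": None, "p4": "m4"}

accuracy = linkage_accuracy(predicted, ground_truth)
print(f"linkage accuracy: {accuracy:.2f}")
```

In this toy case three of the four links agree with the ground truth, so the score is 0.75. Scoring each subject domain separately in the same way would yield one accuracy value per domain.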

Presenting Author Name/s

Ryan Hiltabrand

Faculty Advisor/Mentor

Jian Wu

College Affiliation

College of Sciences

Presentation Type

Poster

Disciplines

Databases and Information Systems | Data Science

Session Title

Interdisciplinary Research #8

Location

Zoom Room HH

Start Date

3-20-2021 3:00 PM

End Date

3-20-2021 3:55 PM

This document is currently not available here.
