Date of Award

Summer 8-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Program/Concentration

Computer Science

Committee Director

Jian Wu

Committee Member

Michele C. Weigle

Committee Member

Sarah M. Rajtmajer

Committee Member

Lusi Li

Abstract

Scholarly papers are major vehicles for disseminating scientific discoveries, reporting experimental results, and communicating research ideas. Most of the information in scholarly papers is unstructured data (i.e., text), which is challenging to organize, process, and analyze with computers. Extracting semantic information from the textual content of scholarly papers and transforming it into structured or semi-structured data are crucial pre-processing steps for many downstream tasks. Traditionally, information extraction has focused on metadata and references. With advances in natural language processing and machine learning, deep learning models have been used to extract semantic information from textual content and use it as a representation of scholarly papers. This dissertation develops machine learning and deep learning models that extract semantic information from the abstracts and full text of scholarly papers, and applies the extracted features to a downstream task in the Science of Science. Specifically, we use machine learning models to extract theory and model entities, scientific claims, and acknowledgment entities, which represent key semantic information but are hard to extract with heuristic methods. We highlight the technical contributions that overcome these challenges, including using distant supervision and supervised contrastive learning to mitigate data scarcity, and using named entity recognition (NER) together with a filter based on linguistic features to extract acknowledgment entities. We further report our efforts to apply the extracted features to automatically assess the replicability of social and behavioral sciences papers and to probe influential features in replicability assessment using explainable AI methods.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/3prm-ya07

ISBN

9798293843534

ORCID

0000-0001-8423-3044

Available for download on Thursday, September 30, 2027
