Date of Award

Fall 12-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Program/Concentration

Computer Science

Committee Director

Jian Wu

Committee Member

Michael L. Nelson

Committee Member

Michele C. Weigle

Committee Member

Sampath Jayarathna

Committee Member

Yi He

Abstract

Scientific tables report critical research insights, data, and findings that drive scientific progress. Because the Portable Document Format (PDF) is the de facto standard for publishing scientific papers, there is a growing need for automatic methods to extract data from PDF files. A significant fraction of scientific tables have complex structure and content, which makes it challenging for machine learning tools to extract that content accurately and directly from PDF files. Despite advances in Table Structure Recognition (TSR), automated extraction of data from complex scientific tables remains difficult because table structures and contents vary widely. In this dissertation, we developed SCITEUQ, a software framework that addresses these challenges by enabling automated, accurate, and uncertainty-aware extraction of data from complex scientific tables across multiple disciplines. By integrating TSR with Optical Character Recognition (OCR) and Uncertainty Quantification (UQ), SCITEUQ aims to significantly improve the quality of data extraction while substantially reducing the human workload needed to verify the extracted data. We also developed SciTableQA, a benchmark for evaluating the question-answering and reasoning capabilities of Large Language Models (LLMs) on complex scientific tables. This research advances information extraction for complex scientific tables and can potentially benefit scientific data compilation across a wide range of scientific domains.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/wc7x-s023

ORCID

0000-0002-5124-0739
