Date of Award

Spring 2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Engineering Management & Systems Engineering

Program/Concentration

Engineering Management and Systems Engineering

Committee Director

James D. Moreland, Jr.

Committee Member

Saikou Y. Diallo

Committee Member

Andrew J. Collins

Abstract

The process of extracting structured data from unstructured and semi-structured text is manual, time-consuming, and error-prone. Current natural language processing approaches for automating this process are difficult to verify for non-trivial and context-sensitive corpora. Large Language Models (LLMs) such as ChatGPT have attracted considerable interest and open a promising avenue of exploration. However, there is limited evidence on how well LLMs perform at information extraction.

In this dissertation, an approach is proposed to evaluate the accuracy of Stanford OpenIE and OpenAI's ChatGPT for this purpose. The approach compares the Resource Description Framework (RDF) triples extracted by each of these semi-automated methods with hand-extracted triples. For identified discrepancies and/or noteworthy extractions, qualitative indicators were collected, analyzed, and discussed. The accuracy of each method was measured with the F2 score, which combines precision and recall into a single metric while weighting recall more heavily than precision.
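As an illustration only, the scoring step can be sketched in Python. This sketch assumes each triple is a `(subject, predicate, object)` tuple of strings and that an extracted triple counts as correct only when it exactly matches a hand-extracted one; the dissertation's actual matching criteria are not stated here, and the function names and sample triples are hypothetical. Precision is the share of extracted triples that are correct, recall is the share of hand-extracted triples that were found, and the general F-beta formula with beta = 2 gives the F2 score.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (subject, predicate, object).
Triple = Tuple[str, str, str]


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_extraction(extracted: Iterable[Triple], gold: Iterable[Triple],
                     beta: float = 2.0) -> Tuple[float, float, float]:
    """Compare extracted triples to hand-extracted (gold) triples.

    Assumption: exact tuple matching decides whether a triple is correct.
    """
    extracted_set, gold_set = set(extracted), set(gold)
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall, f_beta(precision, recall, beta)


# Hypothetical example data, not taken from the dissertation.
gold = {
    ("ChatGPT", "developedBy", "OpenAI"),
    ("ChatGPT", "isA", "LLM"),
    ("OpenIE", "developedBy", "Stanford"),
}
extracted = {
    ("ChatGPT", "developedBy", "OpenAI"),
    ("ChatGPT", "isA", "LLM"),
}

p, r, f2 = score_extraction(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} F2={f2:.3f}")
```

In this toy case every extracted triple is correct (precision 1.0) but one hand-extracted triple is missed (recall about 0.667), so F2 is about 0.714, below the F1 value of 0.8. That gap is the intended effect of the recall weighting: missed triples are penalized more than spurious ones.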

Results show that ChatGPT correctly identified the manually extracted RDF triples with no statistically significant difference, achieving an F2 score of 95.9%. OpenIE achieved an F2 score of 20.7%. While LLMs still require human verification, this research demonstrates that LLMs improve on the state of the art.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/pyyz-j671

ISBN

9798382770529
