The Potential of Large Language Models in Evaluating Scientific Claims

Description/Abstract/Artist Statement

The rise of disinformation in scientific news across many domains has created an urgent need for a robust and generalizable approach to automated scientific claim verification (SCV). Existing SCV methods are limited in either domain adaptability or scalability. To facilitate building and evaluating more robust SCV models, we propose MSVEC, a multidomain dataset containing 200 verified scientific news claims paired with evidence research papers. To understand the capability of large language models on the SCV task, we evaluated GPT-3.5 on MSVEC. While fact-checking methods exist for specific domains (e.g., politics and health), large language models exhibit better generalizability across multiple domains and are potentially comparable with state-of-the-art models based on word embeddings. The data and software used and developed for this project are available at https://github.com/lamps-lab/msvec.
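
The abstract does not spell out how GPT-3.5 was queried; as an illustrative sketch only, a claim/evidence pair from MSVEC might be posed to GPT-3.5 as a SUPPORT/REFUTE classification through the OpenAI Python client. The prompt wording, label set, and the `verify_claim` helper below are assumptions for illustration, not the project's actual code (see the GitHub repository for that).

```python
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

def verify_claim(claim: str, evidence: str) -> str:
    """Ask GPT-3.5 whether the evidence supports or refutes the claim.

    Illustrative helper: the prompt and SUPPORT/REFUTE label set are
    assumptions, not the MSVEC project's actual evaluation setup.
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a scientific fact-checker. "
                        "Answer with exactly one word: SUPPORT or REFUTE."},
            {"role": "user",
             "content": f"Claim: {claim}\n\nEvidence: {evidence}"},
        ],
        temperature=0,  # deterministic output for repeatable evaluation
    )
    return response.choices[0].message.content.strip()

# Example usage with a hypothetical claim/evidence pair:
# label = verify_claim("Coffee consumption cures insomnia.",
#                      "The study found no effect of caffeine on sleep onset.")
```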

Presenting Author Name/s

Michael Evans

Faculty Advisor/Mentor

Jian Wu

Faculty Advisor/Mentor Department

Computer Science

College Affiliation

College of Sciences

Presentation Type

Oral Presentation

Disciplines

Artificial Intelligence and Robotics | Computer Sciences

Session Title

College of Sciences 2

Location

Learning Commons @Perry Library, Room 1311

Start Date

3-30-2024 9:30 AM

End Date

3-30-2024 10:30 AM
