The Potential of Large Language Models in Evaluating Scientific Claims
Description/Abstract/Artist Statement
The rise of disinformation in scientific news across a variety of domains has created an urgent need for a robust and generalizable approach to automated scientific claim verification (SCV). Existing SCV methods are limited in either domain adaptability or scalability. To facilitate building and evaluating more robust SCV models, we propose MSVEC, a multidomain dataset containing 200 verified scientific news claims, each paired with an evidence research paper. To understand the capability of large language models on the SCV task, we evaluated GPT-3.5 on MSVEC. While fact-checking methods exist for specific domains (e.g., politics and health), large language models exhibit better generalizability across multiple domains and are potentially comparable to state-of-the-art models based on word embeddings. The data and software used and developed for this project are available at https://github.com/lamps-lab/msvec.
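To make the evaluation setup concrete, below is a minimal sketch of how a claim/evidence pair might be scored with GPT-3.5 through the OpenAI chat completions API. The prompt wording, the SUPPORT/REFUTE label set, and the example inputs are illustrative assumptions, not the exact protocol used for MSVEC.

```python
# Minimal sketch: asking GPT-3.5 whether evidence supports or refutes a claim.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def verify_claim(claim: str, evidence: str) -> str:
    """Return the model's one-word verdict (SUPPORT or REFUTE) for a claim."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # reduce output variance for evaluation runs
        messages=[
            {"role": "system",
             "content": ("You are a scientific fact-checker. Answer with "
                         "exactly one word: SUPPORT or REFUTE.")},
            {"role": "user",
             "content": (f"Claim: {claim}\n\nEvidence: {evidence}\n\n"
                         "Does the evidence support or refute the claim?")},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    # Hypothetical claim/evidence pair for illustration only.
    print(verify_claim(
        "Regular exercise reduces the risk of cardiovascular disease.",
        "A meta-analysis of cohort studies found that physically active "
        "participants had significantly lower rates of coronary heart disease.",
    ))
```

In practice, the returned label would be compared against the dataset's gold label for each of the 200 claim/paper pairs to compute accuracy or F1.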
Faculty Advisor/Mentor
Jian Wu
Faculty Advisor/Mentor Department
Computer Science
College Affiliation
College of Sciences
Presentation Type
Oral Presentation
Disciplines
Artificial Intelligence and Robotics | Computer Sciences
Session Title
College of Sciences 2
Location
Learning Commons @Perry Library, Room 1311
Start Date
3-30-2024 9:30 AM
End Date
3-30-2024 10:30 AM