Abstract/Description/Artist Statement

Science news has become an important vehicle for disseminating scientific breakthroughs, discoveries, and technological innovations. With the advancement of large language models (LLMs) and related AI models, it is now possible to automatically generate science news from scientific papers, extending the readership from domain scientists to a broader audience. However, how to evaluate the quality of the generated news warrants further research. Traditional token-based metrics have been shown to fail to capture the semantics and nuances of science news. Inspired by the fact that a major goal of science news is to educate readers with new knowledge, we propose knowledge gain (KGain), a novel metric based on the knowledge readers gain after reading science articles. To validate this metric, one key requirement is alignment with human judgement. To that end, we conducted a human study, calculating the KGain of college students after reading three types of science articles: news articles, scientific abstracts, and social media posts (tweets). We obtained two main outcomes: science news articles require less time and result in the same or greater KGain than scientific abstracts, and science news articles require more time but result in a much greater KGain than social media posts. These results lay the foundation for building an LLM-as-a-judge model to simulate human evaluators and automatically evaluate the quality of science news.

Presenting Author Name/s

Gabriel Vega Osborne

Faculty Advisor/Mentor

Dr. Jian Wu

Faculty Advisor/Mentor Email

jwu@cs.odu.edu

Faculty Advisor/Mentor Department

Computer Science

College/School Affiliation

College of Sciences

Student Level Group

Undergraduate

Presentation Type

Poster

KGain Recruitment Process.pdf (41 kB)
A brief outline of the recruitment and supervision process for our experimental participants, to which I was a primary contributor as the undergraduate assistant mentioned in that outline.

KGain Project Demographic Exploration.pdf (664 kB)
A brief exploration of the demographic data of our experimental candidates and participants.


Human Subject Studies for the Alignment of LLM-as-a-Judge Evaluation Metric for Science News
