Abstract/Description/Artist Statement
Science news has become an important vehicle for disseminating scientific breakthroughs, discoveries, and technological innovations. With the advancement of large language models (LLMs) and related AI models, it is now possible to generate science news automatically from scientific papers, extending the readership from domain scientists to a broader audience. However, how to evaluate the quality of the generated news remains an open research question. Traditional token-based metrics have been shown to fail to capture the semantics and nuances of science news. Inspired by the fact that a major goal of science news is to educate readers with new knowledge, we propose knowledge gain (KGain), a novel metric based on the knowledge readers gain after reading science articles. To validate this metric, one key requirement is alignment with human judgement. To that end, we conducted a human study, calculating the KGain of college students after reading three types of science articles: news articles, scientific abstracts, and social media posts (tweets). We obtained two main outcomes: science news articles require less time and result in the same or greater KGain than scientific abstracts, and science news articles require more time but result in a much greater KGain than social media posts. The results lay the foundation for building an LLM-as-a-judge model to simulate human evaluators and automatically evaluate the quality of science news.
Faculty Advisor/Mentor
Dr. Jian Wu
Faculty Advisor/Mentor Email
jwu@cs.odu.edu
Faculty Advisor/Mentor Department
Computer Science
College/School Affiliation
College of Sciences
Student Level Group
Undergraduate
Presentation Type
Poster
A brief outline of the recruitment and supervision process for our experimental participants, to which I was a primary contributor as the undergraduate assistant mentioned in that outline.
KGain Project Demographic Exploration.pdf (664 kB)
A brief exploration of the demographic data of our experimental candidates and participants.
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Data Science Commons
Human Subject Studies for the Alignment of LLM-as-a-Judge Evaluation Metric for Science News