Searching for Evidence of Scientific News in Scholarly Big Data

Document Type


Publication Date




Publication Title

Proceedings of the 10th International Conference on Knowledge Capture



Conference Name

K-CAP '19: The 10th International Conference on Knowledge Capture, November 19-21, 2019, Marina del Rey, California


Public digital media often mix factual information with fake scientific news, which is typically difficult to pinpoint, especially for non-professionals. Such news articles create illusions and misconceptions and thus ultimately influence public opinion, with serious consequences at a broader social scale. Yet existing solutions for automatically verifying the credibility of news articles remain unsatisfactory. We propose to verify scientific news by retrieving and analyzing its most relevant source papers from an academic digital library (DL), e.g., arXiv. Instead of querying keywords or regular named entities extracted from news articles, we query domain knowledge entities (DKEs) extracted from the text. By querying each DKE, we retrieve a list of candidate scholarly papers. We then design a function to rank them and select the most relevant scholarly paper. We explore various representations, and our experiments indicate that the term frequency-inverse document frequency (TF-IDF) representation with cosine similarity outperforms baseline models based on word embeddings. This result demonstrates the efficacy of using DKEs to retrieve scientific papers that are relevant to a specific news article. It also indicates that word embeddings may not be the best document representation for domain-specific document retrieval tasks. Our method is fully automated and can be applied to facilitate the detection of fake and misinformed news across many scientific domains.
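The ranking step described in the abstract (TF-IDF representation with cosine similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the ranking function, the DKE extraction, or the digital library query, so the sketch assumes the candidate papers have already been retrieved and simply scores each candidate abstract against the full news text. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`), the smoothed IDF formula, and the sample texts are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, is used here;
    the paper's exact weighting is not given in the abstract.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(news_text, candidates):
    """Rank (paper_id, abstract) pairs by similarity to the news text.

    Returns (paper_id, score) pairs, highest score first.
    """
    docs = [news_text] + [abstract for _, abstract in candidates]
    vectors = tfidf_vectors(docs)
    news_vec, paper_vecs = vectors[0], vectors[1:]
    scored = [(paper_id, cosine(news_vec, vec))
              for (paper_id, _), vec in zip(candidates, paper_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data (not taken from the paper).
news = "Astronomers detect gravitational waves from a neutron star merger"
candidates = [
    ("paper-A", "Observation of gravitational waves from a binary neutron star merger"),
    ("paper-B", "Deep learning for image classification with convolutional networks"),
    ("paper-C", "Dark matter halo profiles in galaxy clusters"),
]
ranked = rank_candidates(news, candidates)
print(ranked[0][0])  # identifier of the top-ranked candidate
```

In a fuller pipeline, the candidate list would come from querying the digital library once per DKE, and the top-ranked paper would then be examined as the likely source of the news article. The pure-Python vectors keep the sketch dependency-free; a library vectorizer could be substituted without changing the ranking logic.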


© 2019 Association for Computing Machinery. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 10th International Conference on Knowledge Capture,

"ACM treats links as citations (references to objects) rather than as incorporations (embedding of objects). Permission is not needed to create links to citations in The ACM Digital Library or Online Guide to Computing Literature. ACM encourages the widespread distribution of links to the definitive Version of Records of its copyrighted works in the ACM Digital Library and does not require that authors obtain prior permission to include such links in their new works."

Link included in accordance with publisher policy.

Original Publication Citation

Hoque, M. R. U., Bradley, D., Kwan, C., Chiatti, A., Li, J., & Wu, J. (2019). Searching for evidence of scientific news in scholarly big data. In Proceedings of the 10th International Conference on Knowledge Capture (pp. 251-254). Association for Computing Machinery.


0000-0003-4055-2582 (Hoque), 0000-0003-0091-6986 (Li), 0000-0003-0173-4463 (Wu)