Computer Science Faculty Publications

Sec-Lib: Protecting Scholarly Digital Libraries From Infected Papers Using Active Machine Learning Framework

Nir Nissim
Aviad Cohen
Jian Wu, Old Dominion University
Andrea Lanzi
Lior Rokach
Yuval Elovici
Lee Giles

Document Type

Article

Publication Date

2019

DOI

10.1109/access.2019.2933197

Publication Title

IEEE Access

Volume

7

Pages

110050-110073

Abstract

Researchers from academia and the corporate sector rely on scholarly digital libraries to access articles. Attackers take advantage of innocent users who consider the articles' files safe and thus open PDF files with little concern. In addition, researchers consider scholarly libraries a reliable, trusted, and untainted corpus of papers. For these reasons, scholarly digital libraries are an attractive target and inadvertently support the proliferation of cyber-attacks launched via malicious PDF files. In this study, we present related vulnerabilities and malware distribution approaches that exploit the vulnerabilities of scholarly digital libraries. We evaluated over two million scholarly papers in the CiteSeerX library and found the library to be contaminated with a surprisingly large number (0.3-2%) of malicious PDF documents (over 55% were crawled from the IPs of US universities). We developed Sec-Lib, a two-layered detection framework aimed at enhancing the detection of malicious PDF documents, which offers a security solution for large digital libraries. Sec-Lib includes a deterministic layer for detecting known malware and a machine-learning-based layer for detecting unknown malware. Our evaluation showed that scholarly digital libraries can detect 96.9% of malware with Sec-Lib while minimizing the number of PDF files requiring labeling, thus reducing the manual inspection efforts of security experts by 98%.
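
The two-layer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the Sec-Lib implementation: the abstract does not say how either layer works internally, so everything here is an assumption made for illustration. The deterministic layer is modeled as a lookup of SHA-256 digests in a set of known-bad hashes, the learned layer is modeled by a hypothetical scoring function (`toy_score`, which simply counts a few PDF keywords), and the active-learning step is approximated by routing low-confidence files to a human reviewer (uncertainty sampling). The names `Verdict`, `triage`, `toy_score`, and `review_margin` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Verdict:
    label: str    # "malicious", "benign", or "needs_review"
    layer: str    # "deterministic" or "learned"
    score: float  # estimated probability that the file is malicious


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def triage(
    pdf_bytes: bytes,
    known_bad_hashes: Set[str],
    score_fn: Callable[[bytes], float],
    review_margin: float = 0.2,
) -> Verdict:
    """Run a file through both layers and return a verdict.

    Layer 1 (assumed here to be a hash lookup) catches known malware.
    Layer 2 (any function returning a score in [0, 1]) handles the rest.
    Files scoring within `review_margin` of 0.5 are sent to an expert,
    which is where new labels for retraining would come from.
    """
    if sha256_hex(pdf_bytes) in known_bad_hashes:
        return Verdict("malicious", "deterministic", 1.0)

    score = score_fn(pdf_bytes)
    if abs(score - 0.5) < review_margin:
        return Verdict("needs_review", "learned", score)
    label = "malicious" if score >= 0.5 else "benign"
    return Verdict(label, "learned", score)


def toy_score(pdf_bytes: bytes) -> float:
    """Hypothetical stand-in for a trained classifier.

    Returns the fraction of a few PDF keywords present in the file.
    A real model would be trained on labeled samples instead.
    """
    markers = (b"/JavaScript", b"/OpenAction", b"/Launch")
    hits = sum(1 for marker in markers if marker in pdf_bytes)
    return hits / len(markers)


if __name__ == "__main__":
    known = {sha256_hex(b"%PDF known bad")}
    for sample in (b"%PDF known bad", b"%PDF plain text", b"%PDF /JavaScript"):
        print(triage(sample, known, toy_score))
```

In this sketch only the files labeled `needs_review` would reach a security expert, which is the general mechanism by which an active-learning setup can reduce manual labeling effort. The 98% reduction reported in the abstract is a result of the authors' evaluation and is not something this toy example reproduces.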

Comments

This work is licensed under a Creative Commons Attribution 4.0 License.

Original Publication Citation

Nissim, N., Cohen, A., Wu, J., Lanzi, A., Rokach, L., Elovici, Y., & Giles, L. (2019). Sec-Lib: Protecting scholarly digital libraries from infected papers using active machine learning framework. IEEE Access, 7, 110050-110073. doi:10.1109/access.2019.2933197

Repository Citation

ORCID

0000-0003-0173-4463 (Wu)

Included in

Computer Engineering Commons, Information Security Commons
