Document Type

Article

Publication Date

2016

DOI

10.1045/january2016-brunelle

Publication Title

D-Lib Magazine

Volume

22

Issue

1/2

Pages

1-11

Abstract

In this work, we present a case study in which we investigate using open-source, web-scale web archiving tools (i.e., Heritrix and the Wayback Machine installed on the MITRE Intranet) to automatically archive a corporate Intranet. We use this case study to outline the challenges of Intranet web archiving, identify situations in which the open-source tools are not well suited to the needs of corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. We performed a crawl of 143,268 URIs (125 GB over 25 hours) to demonstrate that the crawlers are easy to set up, efficiently crawl the Intranet, and improve archive management. However, challenges exist when the Intranet contains sensitive information, when areas with potential archival value require user credentials, or when archival targets make extensive use of internally developed and customized web services. We elaborate on these challenges and recommend approaches for overcoming them.

Original Publication Citation

Brunelle, J. F., et al. (2016). "Leveraging Heritrix and the Wayback Machine on a corporate intranet: A case study on improving corporate archives." D-Lib Magazine 22(1/2): 1-11. doi:10.1045/january2016-brunelle

ORCID

0000-0002-2787-7166 (Weigle), 0000-0003-3749-8116 (Nelson)
