Document Type
Article
Publication Date
2016
DOI
10.1045/january2016-brunelle
Publication Title
D-Lib Magazine
Volume
22
Issue
1/2
Pages
1-11
Abstract
In this work, we present a case study in which we investigate using open-source, web-scale web archiving tools (i.e., Heritrix and the Wayback Machine installed on the MITRE Intranet) to automatically archive a corporate Intranet. We use this case study to outline the challenges of Intranet web archiving, identify situations in which the open-source tools are not well suited for the needs of the corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. We performed a crawl of 143,268 URIs (125 GB and 25 hours) to demonstrate that the crawlers are easy to set up, efficiently crawl the Intranet, and improve archive management. However, challenges exist when the Intranet contains sensitive information, areas with potential archival value require user credentials, or archival targets make extensive use of internally developed and customized web services. We elaborate on and recommend approaches for overcoming these challenges.
Original Publication Citation
Brunelle, J. F., et al. (2016). "Leveraging Heritrix and the Wayback Machine on a corporate intranet: A case study on improving corporate archives." D-Lib Magazine 22(1/2): 1-11. 10.1045/january2016-brunelle
ORCID
0000-0002-2787-7166 (Weigle), 0000-0003-3749-8116 (Nelson)