Document Type

Conference Paper

Publication Date

2012

DOI

10.1145/2232817.2232930

Publication Title

JCDL '12: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries

Pages

437-438

Conference Name

12th ACM/IEEE-CS Joint Conference on Digital Libraries, Washington, DC, June 10-14, 2012

Abstract

The Internet Archive's Wayback Machine is the most common way that typical users interact with web archives. The Internet Archive uses the Heritrix web crawler to transform pages on the publicly available web into Web ARChive (WARC) files, which can then be accessed through the Wayback Machine. Because Heritrix can only access the publicly available web, many personal pages (e.g., password-protected pages and social media pages) cannot easily be archived in the standard WARC format. We have created WARCreate, a Google Chrome extension that allows a user to create a WARC file from any webpage. With this tool, any user can archive, in a standard format, content that might otherwise have been lost. WARCreate thus gives casual users an easy way to create archives of their personal online content. This is one of the first steps in resolving issues of "long term storage, maintenance, and access of personal digital assets that have emotional, intellectual, and historical value to individuals".
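To make "the standard WARC format" more concrete, the following is a minimal sketch of how a single captured HTTP response can be wrapped in a WARC/1.0 "response" record. It is illustrative only and is not WARCreate's implementation (WARCreate is a Chrome extension, not Python code); the function name make_warc_response_record and the sample response are hypothetical. A complete archive file would typically also contain other record types, such as a leading "warcinfo" record, which are omitted here.

```python
import uuid
from datetime import datetime, timezone


def make_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Hypothetical helper: serialize one WARC/1.0 'response' record
    that wraps a raw HTTP response (status line, headers, and body)."""
    # WARC-Date is an ISO 8601 UTC timestamp, e.g. 2012-06-10T12:00:00Z
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        # Content-Length counts only the content block, not the WARC headers
        f"Content-Length: {len(http_response)}",
    ]
    # A blank line (CRLF CRLF) separates the WARC headers from the content block
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after its content block
    return header_block + http_response + b"\r\n\r\n"


# Usage: wrap a small, made-up HTTP response for a hypothetical URL
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n\r\n"
    b"<p>Hello!</p>"
)
record = make_warc_response_record("http://example.com/", raw)
print(record.decode("utf-8"))
```

Records like this are simply concatenated into a .warc file, which replay tools such as the Wayback Machine can then index and serve.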

Comments

© by the author/owners.

Included with the kind permission of the author.

Original Publication Citation

Kelly, M., & Weigle, M. C. (2012). WARCreate: Create Wayback-consumable WARC files from any webpage. Paper presented at the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, Washington, DC.

ORCID

0000-0002-2787-7166 (Weigle)
