Document Type
Conference Paper
Publication Date
2017
Pages
1-43
Conference Name
2017 ACM/IEEE Joint Conference on Digital Libraries
Abstract
Quantifying the captures of a URI over time is useful for researchers to identify the extent to which a Web page has been archived. Memento TimeMaps provide a format to list mementos (URI-Ms) for captures along with brief metadata, like Memento-Datetime, for each URI-M. However, when some URI-Ms are dereferenced, they simply provide a redirect to a different URI-M (instead of a unique representation at the datetime), often also present in the TimeMap. This implies that confidently obtaining an accurate count of the non-forwarding captures for a URI-R is not possible using a TimeMap alone and that the magnitude of a TimeMap is not equivalent to the number of representations it identifies. In this work we discuss this particular phenomenon in depth. We also perform a breakdown of the dynamics of counting mementos for a particular URI-R (google.com) and quantify the prevalence of the various canonicalization patterns that complicate attempts at counting using only a TimeMap. For google.com we found that 84.9% of the URI-Ms result in an HTTP redirect when dereferenced. We expand on and apply this metric to TimeMaps for seven other URI-Rs of large Web sites and thirteen academic institutions. Using a ratio metric, DI, of the number of URI-Ms without redirects to the number requiring a redirect when dereferenced, five of the eight large Web sites' and two of the thirteen academic institutions' TimeMaps had a ratio of less than one, indicating that more than half of the URI-Ms in these TimeMaps result in redirects when dereferenced.
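The ratio described in the abstract can be illustrated with a minimal sketch. It assumes that DI is simply the count of non-redirecting URI-Ms divided by the count of redirecting URI-Ms, and that the HTTP status code of each dereferenced URI-M has already been collected. The function name `redirect_ratio`, the set `REDIRECT_CODES`, and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Iterable

# Assumption: these status codes are treated as redirects; the paper may
# classify responses differently.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_ratio(status_codes: Iterable[int]) -> float:
    """Return (non-redirecting URI-Ms) / (redirecting URI-Ms).

    Any status code outside REDIRECT_CODES is counted as a non-redirect,
    which is a simplifying assumption of this sketch.
    """
    codes = list(status_codes)
    redirects = sum(1 for code in codes if code in REDIRECT_CODES)
    non_redirects = len(codes) - redirects
    if redirects == 0:
        # No redirects at all: the ratio is unbounded.
        return float("inf")
    return non_redirects / redirects


# Synthetic TimeMap in which 84.9% of URI-Ms redirect, mirroring the
# google.com proportion reported in the abstract (the data are made up).
sample = [200] * 151 + [302] * 849
ratio = redirect_ratio(sample)
print(f"ratio = {ratio:.3f}")
```

A ratio below one, as in this synthetic sample, means that more than half of the listed URI-Ms redirect when dereferenced.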
Original Publication Citation
Kelly, M., Alkwai, L. M., Nelson, M. L., Weigle, M. C., & Van de Sompel, H. (2017). Impact of URI Canonicalization on Memento Count. Paper presented at the 2017 ACM/IEEE Joint Conference on Digital Libraries, Toronto, ON, Canada.
Repository Citation
Kelly, M., Alkwai, L. M., Nelson, M. L., Weigle, M. C., & Van de Sompel, H. (2017). Impact of URI Canonicalization on Memento Count. Paper presented at the 2017 ACM/IEEE Joint Conference on Digital Libraries, Toronto, ON, Canada.
ORCID
0000-0003-3749-8116 (Nelson), 0000-0002-2787-7166 (Weigle), 0000-0002-0715-6126 (Van de Sompel)
Comments
NOTE: This is the authors' pre-print version (arXiv) of a work that was published in the 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). The final version was published as:
Kelly, M., Alkwai, L. M., Nelson, M. L., Weigle, M. C., & Van de Sompel, H. (2017). Impact of URI Canonicalization on Memento Count. Paper presented at the 2017 ACM/IEEE Joint Conference on Digital Libraries, Toronto, ON, Canada.
Available at:
http://ieeexplore.ieee.org/document/7991601/