Document Type

Article

Publication Date

2024

DOI

10.1007/s00799-024-00397-2

Publication Title

International Journal on Digital Libraries

Volume

25

Issue

3

Pages

537-553

Abstract

The significance of the web and the crucial role of web archives in its preservation highlight the necessity of understanding how users, both human and robot, access web archive content, and how best to satisfy the disparate needs of both types of users. To identify robots and humans in web archives and analyze their respective access patterns, we used the Internet Archive’s (IA) Wayback Machine access logs from 2012, 2015, and 2019, as well as Arquivo.pt’s (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based on their browsing behavior. To better understand how users navigate through the web archives, we evaluated these sessions to discover user access patterns. Based on the two archives and between the three years of IA access logs (2012 vs. 2015 vs. 2019), we present a comparison of detected robots vs. humans and their user access patterns and temporal preferences. The total number of robots detected in IA 2012 (91% of requests) and IA 2015 (88% of requests) is greater than in IA 2019 (70% of requests). Robots account for 98% of requests in Arquivo.pt (2019). We found that the robots are almost entirely limited to “Dip” and “Skim” access patterns in IA 2012 and 2015, but exhibit all the patterns and their combinations in IA 2019. Both humans and robots show a preference for web pages archived in the near past.

Rights

© The Authors 2024

This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Original Publication Citation

Jayanetti, H. R., Garg, K., Alam, S., Nelson, M. L., & Weigle, M. C. (2024). Robots still outnumber humans in web archives in 2019, but less than in 2015 and 2012. International Journal on Digital Libraries, 25(3), 537-553. https://doi.org/10.1007/s00799-024-00397-2

ORCID

0000-0003-4748-9176 (Jayanetti), 0000-0001-6498-7391 (Garg), 0000-0002-8267-3326 (Alam), 0000-0003-3749-8116 (Nelson), 0000-0002-2787-7166 (Weigle)
