ORCID
0000-0003-4748-9176 (Jayanetti), 0000-0001-6498-7391 (Garg), 0000-0002-8267-3326 (Alam), 0000-0003-3749-8116 (Nelson), 0000-0002-2787-7166 (Weigle)
College
College of Sciences
Department
Computer Science
Graduate Level
Doctoral
Graduate Program/Concentration
Web Science and Digital Libraries
Publication Date
2023
DOI
10.25883/cr6a-sq47
Abstract
To identify robots and human users in web archives, we conducted a study using access logs from the Internet Archive’s (IA) Wayback Machine in 2012 (IA2012), 2015 (IA2015), and 2019 (IA2019), and from the Portuguese Web Archive (PT) in 2019 (PT2019). We identified user sessions in the access logs and classified each session as human or robot based on its browsing behavior. In 2013, AlNoamany et al. [1] studied user access patterns using IA access logs from 2012. They established four web archive user access patterns: single-page access (Dip), access to the same page at multiple archive times (Dive), access to distinct web archive pages at about the same archive time (Slide), and access to a list of archived pages (TimeMaps) for a certain URL (Skim). They also determined that in the 2012 IA access logs, robots outnumbered humans 10:1 in terms of sessions and 5:4 in terms of raw HTTP accesses. We extended their work by comparing detected robots and humans, their access patterns, and their temporal preferences across the two archives (IA vs. PT) and across three years of IA access logs (IA2012, IA2015, IA2019). Robots account for a larger share of requests in IA2012 (91%) and IA2015 (88%) than in IA2019 (70%), and for 98% of requests in PT2019. We found that robots are almost entirely limited to the Dip and Skim access patterns in IA2012 and IA2015, but exhibit all four patterns and their combinations in IA2019. We also investigated the temporal preferences of users and discovered that both humans and robots favor web pages that have been archived recently.
[1] AlNoamany, Y., Weigle, M.C., Nelson, M.L.: Access patterns for robots and humans in web archives. In: JCDL ’13: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. pp. 339–348 (2013), https://dl.acm.org/doi/10.1145/2467696.2467722
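The four access patterns defined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The session representation (a list of `(original_url, archive_time)` tuples, with `archive_time` set to `None` for a TimeMap request), the function name `access_patterns`, and the one-day `slide_window` used to approximate "about the same archive time" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from itertools import combinations


def access_patterns(session, slide_window=timedelta(days=1)):
    """Return the set of access-pattern names a session exhibits.

    session: list of (original_url, archive_time) tuples. archive_time is a
    datetime for an archived page, or None for a TimeMap request
    (hypothetical representation, not taken from the poster).
    slide_window: assumed tolerance for "about the same archive time".
    """
    patterns = set()

    # Skim: the session requests a list of archived pages (a TimeMap).
    if any(t is None for _, t in session):
        patterns.add("Skim")

    mementos = [(u, t) for u, t in session if t is not None]

    # Dip: the whole session is a single archived-page access.
    if len(session) == 1 and mementos:
        patterns.add("Dip")
        return patterns

    for (u1, t1), (u2, t2) in combinations(mementos, 2):
        if u1 == u2 and t1 != t2:
            # Dive: same page at multiple archive times.
            patterns.add("Dive")
        elif u1 != u2 and abs(t1 - t2) <= slide_window:
            # Slide: distinct pages at about the same archive time.
            patterns.add("Slide")

    return patterns


if __name__ == "__main__":
    t = datetime(2019, 1, 1)
    example = [
        ("example.com/", None),                  # TimeMap request
        ("example.com/", t),                     # archived page
        ("example.com/", datetime(2015, 1, 1)),  # same page, other time
        ("example.com/about", t),                # other page, same time
    ]
    print(sorted(access_patterns(example)))
```

Returning a set rather than a single label reflects the abstract's observation that sessions can exhibit combinations of patterns.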
Keywords
Web archiving, User access patterns, Web server logs, Web usage mining, Web robot detection
Disciplines
Other Computer Sciences | Science and Technology Studies
Recommended Citation
Jayanetti, Himarsha R.; Garg, Kritika; Alam, Sawood; Nelson, Michael L.; and Weigle, Michele C., "Robots Still Outnumber Humans in Web Archives in 2019, But Less Than in 2012" (2023). College of Sciences Posters. 10.
https://digitalcommons.odu.edu/gradposters2023_sciences/10