Date of Award
Summer 2024
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Program/Concentration
Computer Science
Committee Director
Michele C. Weigle
Committee Member
Michael L. Nelson
Committee Member
Sampath Jayarathna
Abstract
Webpages change over time, and web archives hold copies of historical versions of webpages. Users of web archives, such as journalists, want to find and view changes on webpages over time. However, the current search interfaces for web archives do not adequately support this task. In web archives that include a full-text search feature, multiple versions of the same webpage that match the search query are either shown individually without enumerating changes or grouped together in a way that hides changes. We present a change text search engine that allows users to find changes in webpages. We describe the implementation of the search engine backend and frontend, including a tool that allows users to view the changes between two webpage versions in context as an animation. We also propose changes to the Internet Archive’s Wayback Machine replay navigation banner to further support users viewing change over time. We evaluate the search engine with U.S. federal environmental webpages that changed between 2016 and 2020. The change text search results page can clearly show when terms and phrases were added to or removed from webpages. The inverted index can also be queried to identify salient and frequently deleted terms in a corpus. We align the dataset with a real-world click dataset, showing that users were searching for the same environmental terms that were ultimately deleted.
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DOI
10.25777/zaq9-sb74
ISBN
9798384444374
Recommended Citation
Frew, Lesley.
"Surfacing Text Changes in Archived Webpages"
(2024). Master of Science (MS), Thesis, Computer Science, Old Dominion University, DOI: 10.25777/zaq9-sb74
https://digitalcommons.odu.edu/computerscience_etds/181
ORCID
0000-0003-0929-049X