Date of Award

Summer 2014

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Committee Director

Michael L. Nelson

Committee Director

Hussein L. Abdel-Wahab

Committee Member

B. Danette Allen

Committee Member

Chester E. Grosch

Committee Member

Yaohang Li

Committee Member

Michele C. Weigle

Abstract

We propose and develop a framework based on emergent behavior principles for the long-term preservation of digital data using the web infrastructure. We present the development of the framework called unsupervised small-world (USW) which is at the nexus of emergent behavior, graph theory, and digital preservation. The USW algorithm creates graph based structures on the Web used for preservation of web objects (WOs). Emergent behavior activities, based on Craig Reynolds’ “boids” concept, are used to preserve WOs without the need for a central archiving authority. Graph theory is extended by developing an algorithm that incrementally creates small-world graphs. Graph theory provides a foundation to discuss the vulnerability of graphs to different types of failures and attack profiles. Investigation into the robustness and resilience of USW graphs lead to the development of a metric to quantify the effect of damage inflicted on a graph. The metric remains valid whether the graph is connected or not. Different USW preservation policies are explored within a simulation environment where preservation copies have to be spread across hosts. Spreading the copies across hosts helps to ensure that copies will remain available even when there is a concerted effort to remove all copies of a USW component. A moderately aggressive preservation policy is the most effective at making the best use of host and network resources.

Our efforts are directed at answering the following research questions:

1. Can web objects (WOs) be constructed to outlive the people and institutions that created them?

We have developed, analyzed, tested through simulations, and developed a reference implementation of the unsupervised small-world (USW) algorithm that we believe will create a connected network of WOs based on the web infrastructure (WI) that will outlive the people and institutions that created the WOs. The USW graph will outlive its creators by being robust and continuing to operate when some of its WOs are lost, and it is resilient and will recover when some of its WOs are lost.

2. Can we leverage aspects of naturally occurring networks and group behavior for preservation?

We used Reynolds’ tenets for “boids” to guide our analysis and development of the USW algorithm. The USW algorithm allows a WO to “explore” a portion of the USW graph before making connections to members of the graph and before making preservation copies across the “discovered” graph. Analysis and simulation show that the USW graph has an average path length (L(G)) and clustering coefficient (C(G)) values comparable to small-world graphs. A high C(G) is important because it reflects how likely it is that a WO will be able spread copies to other domains, thereby increasing its likelihood of long term survival. A short L(G) is important because it means that a WO will not have to look too far to identify new candidate preservation domains, if needed. Small-world graphs occur in nature and are thus believed to be robust and resilient. The USW algorithms use these small-world graph characteristics to spread preservation copies across as many hosts as needed and possible.

USW graph creation, damage, repair and preservation has been developed and tested in a simulation and reference implementation.

ISBN

9781321301984

Share

COinS