Date of Award

Summer 2012

Document Type


Degree Name

Master of Science (MS)


Computer Science

Committee Director

Michele C. Weigle

Committee Member

Michael L. Nelson

Committee Member

Ravi Mukkamala


Archive-It, a subscription service from the Internet Archive, allows users to create, maintain, and view digital collections of web resources. The current interface of Archive-It is largely text-based, supporting drill-down navigation using lists of URIs. While this interface provides good searching capabilities, it is not efficient for browsing. In the absence of keywords, a user has to spend a large amount of time trying to locate a web page of interest. To provide a better visual experience, we have studied the underlying characteristics of Archive-It collections and implemented six different visualizations (treemap, time cloud, bubble chart, image plot, timeline, and wordle), each highlighting one or more of these characteristics. Archive-It supports grouping of web pages into categories; however, it does not enforce the use of this feature. As a result, there are many collections with missing or improper grouping. For such collections, we present a method of grouping web pages based on a set of pre-defined rules.


Included in this record is the author's Thesis Defense Presentation in PDF. Also available in PowerPoint on SlideShare:



padia-thesis2-120722135633-phpapp01.pdf (5744 kB)
Thesis Defense Presentation, August 2012