Date of Award

Summer 2012

Document Type


Degree Name

Master of Science (MS)


Computer Science

Committee Director

Michele C. Weigle

Committee Member

Michael L. Nelson

Committee Member

Ravi Mukkamala


Archive-It, a subscription service from the Internet Archive, allows users to create, maintain, and view digital collections of web resources. The current interface of Archive-It is largely text-based, supporting drill-down navigation using lists of URIs. While this interface provides good searching capabilities, it is not efficient for browsing. In the absence of keywords, a user has to spend a large amount of time trying to locate a web page of interest. In order to provide a better visual experience to the user, we have studied the underlying characteristics of Archive-It collections and implemented six different visualizations (treemap, time cloud, bubble chart, image plot, timeline, and wordle), each highlighting one or more of the underlying characteristics of the collection. Archive-It supports grouping of web pages into categories; however, it does not enforce its usage. As a result, there are many collections with missing or improper grouping. For such collections, we present a method of grouping web pages based on a set of pre-defined rules.


Included in this record is the author's Thesis Defense Presentation in PDF. Also available in PowerPoint on Slideshare:


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).



padia-thesis2-120722135633-phpapp01.pdf (5744 kB)
Thesis Defense Presentation, August 2012