Date of Award

Spring 2015

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Director

Michael L. Nelson

Committee Member

Michele C. Weigle

Committee Member

Irwin Levinstein

Abstract

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing, they run the risk of encountering "spoilers": information that gives away key plot points before the show's writers intended. Enterprising readers might browse the wiki in a web archive so as to view pages as they existed prior to a specific episode date and thereby avoid spoilers. Unfortunately, because of how web archives choose the "best" page, it is still possible to see spoilers (especially in sparse archives).

In this paper we discuss how to use Memento to avoid spoilers. Memento uses TimeGates to determine which archived page is the best one to return to the user, currently using a minimum distance heuristic. We quantify how this heuristic is inadequate for avoiding spoilers, analyzing data collected from fan wikis and the Internet Archive. We create an algorithm for calculating the probability of encountering a spoiler in a given wiki article. We conduct an experiment with 16 wiki sites for popular television shows. We find that 38% of the pages on those sites are unavailable in the Internet Archive. We find that when accessing fan wiki pages in the Internet Archive, there is as much as a 66% chance of encountering a spoiler. Using sample access logs from the Internet Archive, we find that 19% of actual requests to the Wayback Machine for wikia.com pages ended in spoilers. We suggest the use of a different minimum distance heuristic for wikis, minpast, which uses the desired datetime as an upper bound.
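The difference between the two heuristics can be illustrated with a minimal sketch. It assumes the available mementos are given as a plain list of Python `datetime` objects; the function names `mindist` and `minpast` are illustrative only and are not the code of the thesis, the extension, or any Memento library. Returning `None` when no earlier memento exists is also an assumption made for this sketch.

```python
from datetime import datetime


def mindist(mementos, desired):
    """Minimum distance: pick the memento closest to `desired`,
    whether it lies before or after it (so it may contain spoilers)."""
    return min(mementos, key=lambda m: abs((m - desired).total_seconds()))


def minpast(mementos, desired):
    """Minimum past distance: treat `desired` as an upper bound and pick
    the latest memento at or before it. Returns None if none exists
    (an assumption of this sketch, not a documented behavior)."""
    past = [m for m in mementos if m <= desired]
    return max(past) if past else None


# Hypothetical sparse archive: only two captures of a wiki page.
mementos = [datetime(2014, 1, 1), datetime(2014, 3, 1)]
desired = datetime(2014, 2, 20)  # the reader wants the page as of this date

print(mindist(mementos, desired))  # 2014-03-01 00:00:00 (after desired: possible spoiler)
print(minpast(mementos, desired))  # 2014-01-01 00:00:00 (never after desired)
```

In a sparse archive the nearest capture can easily fall after the desired datetime, which is exactly the case where minimum distance can show a spoiler and the upper-bounded variant cannot.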

Finally, we highlight the use of an extension for MediaWiki that uses this new heuristic and can be used to avoid spoilers. Developing this extension also revealed something unexpected about Memento: an optimized Memento pattern of two request-response pairs for interacting with TimeGates does not perform well with MediaWiki, leading us to fall back to the original Memento pattern of three request-response pairs. We also conduct performance testing on the extension and show that it has a minimal impact on MediaWiki's performance.

Comments

This record includes a PDF of the author's Thesis Defense Presentation. The PowerPoint presentation is available on Slideshare: http://www.slideshare.net/shawnmjones/avoiding-spoilers-on-mediawiki-fan-sites-using-memento

jones-thesisdefense-2015-03.pdf (11797 kB)
Thesis Defense Presentation, March 20, 2015
