Web Redirection Patterns Unraveled: From Lost Links to Unexpected Detours

Author ORCiD

0000-0001-6498-7391

College

College of Sciences

Department

Computer Science

Graduate Level

Doctoral

Graduate Program/Concentration

Web Science and Digital Libraries

Presentation Type

No Preference

Abstract

URI redirections are essential for managing website structure, improving search engine rankings, and enhancing security. However, they also introduce challenges that affect user experience, web performance, and long-term content accessibility. This study analyzes 11 million unique redirecting URIs, following each redirection path for up to 10 hops to uncover common patterns and their implications. We found that half of all redirections successfully led to their intended destination, while the other half resulted in errors, including a small fraction (0.06%) that exceeded the 10-hop limit. Standard redirections, such as HTTP-to-HTTPS upgrades, generally followed best practices, but many redirections involved domain or path changes, reflecting website migrations and rebranding efforts and, in some cases, posing security risks. A surprising discovery was the presence of "sink" URIs (endpoints where multiple redirects converge), used for traffic consolidation by major websites or even for playful misdirections like "Rickrolling." Additionally, we identified 62,000 custom 404 error pages, many of which were "soft 404s," where missing content was incorrectly reported as valid, leading to wasted resources. These findings highlight the significant role of URI redirections in shaping the web and reveal critical challenges such as outdated links, server instability, and improper error handling. By examining large-scale redirection data, this research provides insights for web developers, digital archivists, and researchers, helping them improve website efficiency, optimize resources, and ensure the long-term accessibility of online content.
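The abstract does not describe the study's tooling, so the following is only a minimal sketch of how a redirection path might be followed up to a 10-hop limit, how a single hop might be labeled, and how "sink" URIs might be counted. Everything named here is an assumption made for illustration: the `fetch` callable (which returns a status code and a `Location` value), the outcome labels, the hop categories, and the separate "loop" outcome are hypothetical and are not taken from the study.

```python
from collections import Counter
from urllib.parse import urljoin, urlsplit

MAX_HOPS = 10  # hop limit stated in the abstract
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_chain(start_uri, fetch, max_hops=MAX_HOPS):
    """Follow redirects from start_uri and return (path, outcome).

    fetch(uri) is a hypothetical callable returning (status, location).
    outcome is "ok", "error", "loop", or "too_many_hops" (labels assumed).
    """
    path = [start_uri]
    seen = {start_uri}
    current = start_uri
    hops = 0
    while True:
        status, location = fetch(current)
        if status in REDIRECT_CODES and location:
            if hops == max_hops:
                return path, "too_many_hops"
            hops += 1
            current = urljoin(current, location)  # Location may be relative
            path.append(current)
            if current in seen:
                return path, "loop"
            seen.add(current)
        elif 200 <= status < 300:
            return path, "ok"
        else:
            # Includes 4xx/5xx and redirects that lack a Location value.
            return path, "error"


def classify_hop(src, dst):
    """Label one hop with a coarse, assumed category."""
    a, b = urlsplit(src), urlsplit(dst)
    if a.hostname != b.hostname:
        return "domain_change"
    same_target = (a.path or "/") == (b.path or "/") and a.query == b.query
    if a.scheme == "http" and b.scheme == "https" and same_target:
        return "https_upgrade"
    if (a.path or "/") != (b.path or "/"):
        return "path_change"
    return "other"


def find_sinks(paths, min_sources=2):
    """Return final URIs reached by at least min_sources redirecting paths."""
    finals = Counter(path[-1] for path in paths if len(path) > 1)
    return {uri: n for uri, n in finals.items() if n >= min_sources}


# In-memory stand-in for the web, so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": (301, "https://a.example/"),
    "https://a.example/": (200, None),
    "http://old.example/page": (302, "https://a.example/"),
    "http://gone.example/x": (301, "/missing"),
    "http://gone.example/missing": (404, None),
}


def fake_fetch(uri):
    return FAKE_WEB.get(uri, (404, None))


if __name__ == "__main__":
    starts = ["http://a.example/", "http://old.example/page", "http://gone.example/x"]
    results = [follow_chain(uri, fake_fetch) for uri in starts]
    for path, outcome in results:
        print(outcome, " -> ".join(path))
    print(find_sinks([path for path, _ in results]))
```

Passing the fetcher in as a parameter keeps the traversal logic testable offline; a real crawl would substitute an HTTP client that does not follow redirects automatically. Detecting soft 404s would additionally require inspecting response bodies, which this sketch does not attempt.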

Keywords

Web Science, Web, Redirection, URI
