How To Stop The Internet From Disappearing

Imagine a phenomenon that is so common almost every website has a specific visual conceit to deal with it, but is so unexamined that you’ve probably never heard of it. Spotify handles it with a pun on Kanye’s 808s and Heartbreak, Marie Kondo’s website with a characteristically serene but slightly sickly line about tidying and gratitude, and the National Trust uses it as an opportunity to show you some nice deer. All of these pages are the consequences of linkrot.

Linkrot is when a hyperlink on one page sends you somewhere that no longer exists. You hit a 404 page. It’s a phenomenon you’ve probably encountered, but one that only becomes properly visible when it's experienced as an interruption to your ability to find information.
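For anyone who maintains pages of their own, spotting linkrot can be partly automated. What follows is a minimal sketch in Python, not a tool mentioned in this article. It assumes that a 404 or 410 response means a link has rotted; the function names and the `fetch_status` parameter are hypothetical, introduced only for illustration.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no server answers."""
    # HEAD asks for headers only; some servers reject it, so a real
    # checker might retry with GET. That fallback is omitted here.
    request = Request(url, method="HEAD", headers={"User-Agent": "linkrot-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except OSError:
        # DNS failure, refused connection, or timeout.
        return None


def classify_link(status):
    """Map a status code to a rough health label (labels are assumptions)."""
    if status is None:
        return "unreachable"
    if status in (404, 410):
        return "rotted"
    if 200 <= status < 400:
        return "alive"
    return "error"


def audit_links(urls, fetch_status=http_status):
    """Return a dict mapping each URL to its health label."""
    return {url: classify_link(fetch_status(url)) for url in urls}
```

Calling `audit_links(["https://example.com/some-article"])` would make a live request. Passing a different `fetch_status` function lets the same logic run against saved results instead of the network.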

“Linkrot tends to be a problem that folks aren’t really aware of until they experience it at an inopportune time themselves,” Clare Stanton, a researcher at Harvard’s Library Innovation Lab, told me. Add to that how important the internet is to our daily lives, and you’ve got a serious problem. Instead of hitting a 404 when looking for a particular song or blogpost about bamboo storage boxes, what if you hit one when looking for essential information on a government website, or to treat a medical emergency? That’s more than an inconvenience; it’s an information hazard.

Content drift is a close relation of linkrot. However, rather than a hyperlink giving you a 404, it instead takes you to a live but unexpected page. Content drift is more difficult to spot than linkrot, but its impact is arguably more substantial: at least when you hit a 404 page you know what’s happened – content drift results in a kind of destabilising of information that can go largely unnoticed by many users. Its effects can even be seen in the evolution of academic style guides; the MLA, for example, instructs writers and researchers to include the date a web source was accessed in their notes. 
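Because a drifted page still loads, detecting drift means comparing what a page says now with what it said when it was cited. Here is a rough sketch in Python, assuming you kept a copy of the page text at the time of citation. The 0.8 threshold and the function names are arbitrary, hypothetical choices, and this is not the method used by any researcher quoted here.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def similarity(original_text, current_text):
    """Return a score from 0.0 (nothing shared) to 1.0 (same word sequence)."""
    matcher = SequenceMatcher(
        None, normalize(original_text), normalize(current_text), autojunk=False
    )
    return matcher.ratio()


def has_drifted(original_text, current_text, threshold=0.8):
    """Flag a page whose wording has changed beyond the chosen threshold."""
    return similarity(original_text, current_text) < threshold
```

A word-level comparison like this ignores changes in capitalization and punctuation, but it can't tell a harmless redesign from a substantive rewrite. That judgment still needs a human reader.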

Why do linkrot and content drift happen?

As the last few years of digital media have taught us, content is king - until it isn’t. Once the owner of a given outlet decides it’s no longer commercially viable, it can disappear. Any links that previously went to these pages will therefore simply die. 

Yahoo famously killed off GeoCities, and with it a large chunk of many people’s early memories of the internet. Online publication Gothamist and a number of its sister sites were pulled down in 2017 without warning because the sites were no longer considered affordable by their owners.

But the strained commercial imperatives of digital media aren’t the only causes of linkrot and content drift. Web pages are often built to be dynamic – always provisional and subject to change. Stanton offers a good example: the U.S. government websites that have what she calls “built-in turnover.”

“Whitehouse.gov is always going to be the URL for the current presidential administration’s website,” she explains. “If an author references something on President Trump’s whitehouse.gov, that link will only send a reader to President Trump’s homepage until Joe Biden’s team [takes] over whitehouse.gov.”

But there’s another important reason the internet can’t ever be perfectly maintained without the occasional crack: managing and preserving information requires a lot of labor.

Jonathan Zittrain, a professor of law at Harvard, has written that the design of internet-based systems “naturally create[s] gaps of responsibility for maintaining valuable content that others rely on.” We could include content moderation and open-source maintenance as evidence of this; they’re attempts to enlist users to help plug the gaps.

Given the imperatives behind all forms of content, some kind of stewardship - librarianship, even - would surely be valuable; it’s just not in anyone’s interest to assume responsibility.

Invoking the role of libraries in intellectual production throughout human history, Zittrain reminds us that “these buildings didn’t run themselves, and they weren’t mere warehouses. They were staffed with clergy and then librarians [...] who fostered a culture of preservation and its many elaborate practices, so precious documents would be both safeguarded and made accessible at scale.”

This type of labor - which has been overlooked and undermined by governments all around the world over the last few decades - is ultimately about preserving and organizing information. It not only makes citations possible, but also accurate and reliable. Without it, our information ecosystem will become increasingly unstable.

Research on linkrot and content drift has, so far, generally focused on specific groups of texts and resources. Back in 2014, Zittrain and fellow researchers Kendra Albert and Lawrence Lessig looked at links in a number of Harvard law journals and in published U.S. Supreme Court opinions.

Stanton (who joined Harvard in 2017) tells me that the research “found that around half of all URLs published in Supreme Court opinions were not pointing to the intended content” while nearly 70 percent of the links in the Harvard law journals had rotted away.

Earlier this year, a research team - which included Stanton and Zittrain - conducted a similar study. This time, they looked at hyperlinks in New York Times articles from 1996 (the year the newspaper’s website launched) to 2019.

Here are the findings, as the team explained in the Columbia Journalism Review in May this year:

“We found that of the 553,693 articles within the purview of our study––meaning they included URLs on nytimes.com––there were a total of 2,283,445 hyperlinks pointing to content outside of nytimes.com. Seventy-two percent of those were “deep links” with a path to a specific page, such as example.com/article, which is where we focused our analysis (as opposed to simply example.com, which composed the rest of the data set).

“Of these deep links, 25 percent of all links were completely inaccessible. Linkrot became more common over time: 6 percent of links from 2018 had rotted, as compared to 43 percent of links from 2008 and 72 percent of links from 1998. Fifty-three percent of all articles that contained deep links had at least one rotted link.”

The researchers also manually reviewed 4,500 links and found that “thirteen percent of intact links [from the sample] had drifted significantly since the Times published them.”

Clearly, the problem is extensive. The researchers also found, perhaps unsurprisingly, that linkrot and content drift worsen over time. If one of the world’s most prestigious publications is struggling to deal with the slipperiness of the modern web, it doesn’t bode well for everyone else.

Given ephemerality is now part and parcel of human experience, surely linkrot and content drift should be seen as features of our reality rather than bugs? Unfortunately, the impact of both is far from trivial and can pose serious problems. Misinformation is maybe the most obvious issue: as Zittrain, Albert, and Lessig noted back in 2014, “tracking down every original copy of an edition of a printed New York Times and changing a story on page A4 is the stuff of Orwell’s imagination, not real-world practicality. But to do the same thing with an online edition is trivial.”

We saw this come to fruition in the U.K. in one of the many minor scandals of 2020: Dominic Cummings edited a blog post he wrote in 2019 to make it appear that he had predicted the coronavirus pandemic – something that also demonstrates how the ephemerality of the web can be intentionally abused by those in positions of power.

Stanton agrees that linkrot and content drift play a part in misinformation. “Linkrot and content drift [make] it even harder for people to track information and to hold people and organizations accountable. There is no trace of web content that has been changed unless someone has archived it.”

But misinformation aside, dead links don’t only undermine trust, they also foreclose future opportunities for a citation, and reduce the scope of our collective memory. If, for example, we continue to lose local news websites, the horizon of citation radically shrinks. We lose the ability to enrich new stories with the perspectives and histories they contain.

On a more global level, meanwhile, the digital miscellany of significant historical events - like a pandemic - will also be lost. “We have a global event going on where nobody knows how it’s going to end,” librarian Gary Price said in an interview with the Internet Archive. “Most of it is going to play out on the internet. If we don’t archive it now, the record for the future is not going to be as complete as it could have been.”

Treating the rot

While the prognosis isn’t great, there are still a number of tools and organizations that are fighting the good fight when it comes to our fragmenting internet.

GeoCities was archived by the OoCities project, which describes its mission as follows: “Our aim is to save those pages which are worthy and unique scientific sources or are of great public interest as well as those, which are historically interesting or just representing the 90's website culture and style.”

A couple of developers responded quickly to the loss of Gothamist by creating a tool to help journalists retrieve pages from Google’s AMP database (these are the pages you are served when reading on Google on a mobile). Ironically, the link to this tool no longer works, making it a great example of linkrot. Journalist Ben Welsh developed a similar tool, savemy.news, in response to the shutdown; fortunately, it’s still available.

The Harvard research by Clare Stanton and others led to the creation of Perma.cc, a tool that allows users to create a permanent copy of a given web page. Initially, Stanton tells me, the tool “was built to serve the needs of folks writing for the long term in the context of the U.S. court system and legal education system. Now we have expanded that user base but the mission is the same: when publications need to stick around for the long term, the citations need to also stick around for the long term.” [Emphasis mine.]

It’s also impossible to talk about linkrot and digital archiving without talking about the work done by the Internet Archive, which even developed a “holiday” to spotlight the issue: 404 day, celebrated on April 4.

The Internet Archive is probably better known for its Wayback Machine, the go-to tool for anyone who wants to view a web page that has either changed or no longer exists. The Wayback Machine works a little bit like a search engine; it crawls the web and takes snapshots of web pages at specific points in time. Though its crawls don’t capture everything, users can also manually take a snapshot of a specific page.
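Looking up or requesting a snapshot can also be scripted. The sketch below assumes the Internet Archive’s public availability endpoint (archive.org/wayback/available) and its “Save Page Now” address (web.archive.org/save/) behave as I understand them from the Archive’s own documentation. Both may change, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints; check the Internet Archive's documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_query(url, timestamp=None):
    """Build the lookup address; timestamp is a string such as '20200101'."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Pull the snapshot address out of a decoded JSON reply, or return None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup_snapshot(url, timestamp=None, timeout=10.0):
    """Ask the Wayback Machine for the snapshot nearest to timestamp (live request)."""
    with urlopen(availability_query(url, timestamp), timeout=timeout) as response:
        return closest_snapshot(json.load(response))


def save_page_url(url):
    """Return the address that, when visited, asks the Archive to capture url."""
    return SAVE_ENDPOINT + url
```

Only `lookup_snapshot` touches the network. The other functions just build addresses and read replies, so they can be tried out safely offline.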

However, technology alone won’t solve the problem of linkrot and content drift. “The answer is not as simple as ‘archive everything and use a PermaLink in place of every URL,’” Stanton says.

“An interesting dynamic when we think about preserving digital versus physical content is that physical content could usually survive, even in less than ideal conditions, for a few decades,” she explains. However, “digital content often won't last a decade if somebody isn't purposefully making sure it does.” This means “the responsibility for preservation becomes more diffuse, and is shifted to the people who are interacting with that material closer to its creation. This could mean an author, a web administrator, an early reader, or an organization that cares about its history.”

In short, this means there needs to be greater collaboration with the people who produce and manage sources of information. “There is work to be done in collaboration between information specialists and journalists to build frameworks for archiving web citations,” she says. “There should be workflows in place that balance the needs of digital newsrooms and more traditional preservation practices.”

Preserving the past by thinking about the future

Linkrot presents a varied and complex set of challenges. Ultimately, mitigating the problems it causes is going to require a serious shift in mindset.

But what does that mean for all of us as everyday internet users? According to Stanton, there’s not a lot we can do. “Link rot and content drift are the result of the natural ebbs and flows of the internet. It was built to be a distributed network of users, creators, and managers.”

This isn’t to say we shouldn’t take responsibility. “The place that folks do have the ability to combat linkrot and content drift is as authors, lawyers, scholars, making sure that when they’re producing new content relying on citations that they find a way to preserve what’s important,” she says.

“Shifting your mindset a little more towards the future can make a huge difference. It’s like flossing your teeth - it doesn’t really have the same impact if you’re just flossing the day before you go to the dentist for a cleaning! Consistently working an archival mindset into your work makes for a much healthier historical record.”

Richard Gall is a writer interested in the intersection of technology, society, and politics. You can read more of his writing on his blog The Cookie and follow him on Twitter @richggall. He is also the co-host of What We Talk About When We Talk About Tech, a podcast about tech storytelling.
