How Long Do Links Survive on the Internet?
June 19, 2016 1:51 PM

I linked to an article on LinkedIn in January of 2016. I went back to it yesterday, and it was gone. As I work on other projects, I notice that many links that don't seem very old (less than a year) return a "404" error message.

This seems especially true with news sites where you would expect a story to remain in its archive. Is there any data online about how long, on average, a link remains active before it disappears and why? And tell me quick!
posted by CollectiveMind to Computers & Internet (10 answers total) 3 users marked this as a favorite
 
The common name for this seems to be link rot. The section 'prevalence' goes into some estimates, both scholarly and ad hoc:
Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year. In 2014, bookmarking site Pinboard's owner Maciej Cegłowski reported a “pretty steady rate” of 5% link rot per year.
posted by the antecedent of that pronoun at 2:00 PM on June 19, 2016 [4 favorites]


First, if you need the content on the link, you'll likely be able to find it using the Internet Archive Wayback Machine at http://archive.org/web/.

Second, links disappear for all sorts of reasons. Sometimes, the architecture of a site (including the categories in which things are placed) can change. I wrote an article about organizing for parents of children with Type 1 diabetes for a website of a division owned by a major corporate entity. In the four years since they published it, the name of the division and the way in which they publish content have changed so many times that I've had to change the link on my website to their website five separate times.

Sometimes, the goal of the site changes -- last week, I was going through my Safari reading list, clicked on a link to my colleague's site, and because her company's focus has narrowed, this article had been removed. (That's where the Wayback Machine comes in handy.) Sometimes, the company ceases to exist or the individual decides not to keep paying for the domain. Sometimes, an article's content is found to be flawed, or is too controversial and the owner removes it.

With news sites, articles that perform poorly are sometimes removed because the advertisers are not willing to pay to advertise on pages that yield a low number of hits vs. the reach and frequency they're seeking.

Perhaps someone else will be able to provide you with a statistical analysis of the average lifecycle of content. But I can tell you that your best way of saving content is to use something like Evernote, which saves both the original link and the content, and one of the best ways to find "lost" content is the Wayback Machine.
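
For anyone who wants to script the Wayback Machine lookup rather than paste links in by hand, here is a minimal sketch. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available and the JSON layout it is generally documented to return (an "archived_snapshots" object with a "closest" entry). The function names are made up for illustration, so treat the whole thing as a starting point rather than a definitive client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine's public availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Return the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(payload):
    """Pull the closest archived copy's URL out of a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None):
    """Ask the Wayback Machine for a snapshot (needs network access)."""
    with urlopen(build_query(url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Calling find_archived_copy with a dead link and a date near when you bookmarked it should return the address of an archived copy if one exists, and None otherwise.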
posted by The Wrong Kind of Cheese at 2:01 PM on June 19, 2016 [2 favorites]


I think that link rot might be getting worse. Pages these days use a lot of JavaScript, much of which doesn't function right across a large swath of browsers when written, and a year later might not run at all. Each page on commercial sites is now also typically linked to tons more JavaScript, to serve ads, do tracking, etc., and any of those can break, possibly breaking the page. As a result, it kinda seems like stuff gets archived sooner. I work in a group that does promotional pages that are heavily JavaScript-oriented, and we turn off anything older than 1 year (100% bit rot year over year), and frankly after 6 months lots of it no longer works anyway.
posted by RustyBrooks at 2:03 PM on June 19, 2016 [4 favorites]


Well, by coincidence I have a Pinboard account that just recently came up for its first annual renewal. Originally I had the regular account but on renewal decided to spring for the extra $14 to go to the premium level. So Pinboard tried to save all my previously bookmarked pages and the results are: 1873 bookmarks, 64 not found, 10 gone, 58 server errors, 2 unavailable. This means that 134 out of 1873 bookmarks were not able to be captured, or 7.15%.

Granted, this is just one data point from one random Internet stranger. FWIW.

Oh, and on edit I just realized, many of those links were imported from Mozilla Sync or other places I had bookmarks, and so were probably older or much older than a year.
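
The tally above can be rechecked in a few lines. This is only a sketch of the arithmetic using the counts quoted in this post; the variable names are arbitrary.

```python
# Counts as reported in the post above.
total_bookmarks = 1873
failures = {"not found": 64, "gone": 10, "server errors": 58, "unavailable": 2}

# Sum every failure category, then express it as a share of all bookmarks.
failed = sum(failures.values())
rate = failed / total_bookmarks
print(f"{failed} of {total_bookmarks} failed: {rate:.2%}")
```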
posted by forthright at 2:15 PM on June 19, 2016 [1 favorite]


According to Vanishing Act: The Erosion of Online Footnotes and the Implications for Scholarship in the Digital Age, by Hennepin, M. Wendy, the half-life of an Internet link is about 3 years.
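
To compare a half-life figure with the per-year percentages quoted elsewhere in this thread, one rough approach is to assume links die off at a constant rate every year (simple exponential decay). That assumption is mine, not the study's, and real link rot is probably lumpier. Under it, the conversion can be sketched as:

```python
import math


def annual_loss_from_half_life(half_life_years):
    """Fraction of links lost per year, assuming constant exponential decay."""
    return 1 - 0.5 ** (1 / half_life_years)


def half_life_from_annual_loss(annual_loss):
    """Years until half the links are gone, given a constant yearly loss rate."""
    return math.log(0.5) / math.log(1 - annual_loss)


print(f"3-year half-life  -> {annual_loss_from_half_life(3):.1%} lost per year")
print(f"5% lost per year  -> half-life of {half_life_from_annual_loss(0.05):.1f} years")
```

Under this model a 3-year half-life works out to roughly 20% of links lost per year, while a steady 5% per year corresponds to a half-life of about 13.5 years, so the estimates in this thread are likely measuring quite different populations of links.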
posted by 1970s Antihero at 3:00 PM on June 19, 2016 [1 favorite]


In general, this is a direct effect of the ephemeral nature of much of the Web.

In the old days, Web pages tended to be static content, designed by hand, and there wasn't a lot of motive to be disruptive, especially since search engines didn't exist, so links and linkability as a property were *valuable*.

As the Web transitioned to being managed by dedicated "professionals" who justified their existence in terms of "web site redesign," and amateurs who farmed the "tough stuff" out to companies that ran large web systems on "content management systems," it has become more common to see massively disruptive changes introduced, often just as part of "changing the platform." In other cases, it'll be because a company like MySpace closed up shop. And sometimes it'll just be because someone didn't pay their bill, and a domain expired, forcing them to get a different one.

More recently, there's been a lot of work to transition to https:, and some sites such as Wired have undertaken this with a goal of migrating large portions of their archives. See https://www.wired.com/2016/04/wired-launching-https-security-upgrade/ etc. Wired is among the most technically savvy of news sites, and they're having trouble keeping all of their content accessible. What hope is there for lesser sites?

Meanwhile, some of us are busy carefully maintaining and curating content for decades. No fancy JavaScript, no hideously complex interdependencies. There's nothing that prevents it from being possible and doable, but most people probably don't consider it practical, because they really don't care too much about the content they created a decade ago.
posted by jgreco at 6:26 PM on June 19, 2016


Surely this link will exist for a long time.
posted by tayknight at 7:34 PM on June 19, 2016 [2 favorites]


Hah, that's an interesting bet.

I've labored to keep every page I personally have put on the web accessible. Many of them no longer have any links that point to them (from my sites at least) but if they're still in Google or someone's bookmarks, they'll still come up. They've mostly got no JS, very little CSS, mostly text/html. That's just a non-starter for most people these days though.
posted by RustyBrooks at 7:55 PM on June 19, 2016


It's definitely a major long-term problem. According to one study by Kendra Albert, Larry Lessig, and Jonathan Zittrain:
...more than 70% of the URLs within the Harvard Law Review and other journals [from 1999 to 2012], and 50% of the URLs found within United States Supreme Court opinions, do not link to the originally cited information.
This NYT piece goes into more detail.
posted by Rhaomi at 7:58 PM on June 19, 2016


5-10% link-rot per year is what I saw when I was actively maintaining my links website. The lower rate was for personal sites, and the higher rate for news and corporate sites, especially those that used some sort of content management system (CMS). They would periodically switch CMSes, and everything would disappear. Or like the local newspaper, anything more than a couple weeks old would disappear behind a paywall, never to be seen again.
posted by DaveP at 3:48 AM on June 20, 2016

