Comments on: How many working hyperlinks does the Web have now?
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now/
Comments on Ask MetaFilter post How many working hyperlinks does the Web have now?
Sat, 30 May 2009 15:40:34 -0800
Question: How many working hyperlinks does the Web have now?
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now
How many working hyperlinks does the Web have now? <br /><br /> "Working hyperlink" defined as "will bring up something other than a 404 when clicked". It doesn't matter if that something is a URL-squatting advertiser or a never-to-be-followed "This is my first blog entry" blog entry from 1996.<br>
<br>
Back-of-napkin math welcome.post:ask.metafilter.com,2009:site.123451Sat, 30 May 2009 15:30:04 -0800Joe BeesehyperlinksworldwidewebBy: nitsuj
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763880
Since Google <a href="http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html">says</a> the number of pages out there is infinite, and therefore can't come up with a solid number, I'd say the same goes for links, too.<br>
<blockquote><em>So how many unique pages does the web really contain? We don't know; we don't have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite -- for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page. We're not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer</em>.</blockquote>comment:ask.metafilter.com,2009:site.123451-1763880Sat, 30 May 2009 15:40:34 -0800nitsujBy: Tomorrowful
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763888
What nitsuj quoted. Also, a nontrivial portion of the web at this point is dynamically generated, including hyperlinks, or consists of just one "page" with an effectively infinite amount of content.comment:ask.metafilter.com,2009:site.123451-1763888Sat, 30 May 2009 15:49:21 -0800TomorrowfulBy: zerokey
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763896
Look. I'm creating a hyperlink RIGHT NOW! My vote is for infinite as well.comment:ask.metafilter.com,2009:site.123451-1763896Sat, 30 May 2009 16:02:12 -0800zerokeyBy: Joe Beese
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763898
I was guessing the debate would be between trillions and quadrillions. But from what you're saying, at most it would be between "effectively infinite" and "mathematically infinite"?comment:ask.metafilter.com,2009:site.123451-1763898Sat, 30 May 2009 16:03:19 -0800Joe BeeseBy: Chocolate Pickle
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763903
The problem is that the number is changing constantly, and there's no practical way to take an instantaneous snapshot of the entire web. (Especially since a nontrivial portion of it is locked up behind passwords.)<br>
<br>
Which is another way of saying that no one knows and no one will ever be able to find out. And even if they did, their knowledge would soon be out of date.comment:ask.metafilter.com,2009:site.123451-1763903Sat, 30 May 2009 16:08:51 -0800Chocolate PickleBy: Mick
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763911
mathematically infinite on one of my websites alone, there's a calendar where you can always click to the next day.comment:ask.metafilter.com,2009:site.123451-1763911Sat, 30 May 2009 16:16:54 -0800MickBy: box
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763933
Yeah, I think the boring answer is that it depends on your definitions of 'link' and 'page.'<br>
<br>
If you want a rough estimate, just pick a number and then add a bunch of zeroes to it.comment:ask.metafilter.com,2009:site.123451-1763933Sat, 30 May 2009 16:38:24 -0800boxBy: santaliqueur
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763942
"mathematically infinite on one of my websites alone, there's a calendar where you can always click to the next day."<br>
<br>
So I could click up to the year 100,000,000,000 if I had the time?<br>
<br>
Whatever the calendar's limit is, it's a large number of clicks for sure, but it's not even close to infinite.comment:ask.metafilter.com,2009:site.123451-1763942Sat, 30 May 2009 16:45:06 -0800santaliqueurBy: Joe Beese
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763944
<a href="http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763942">santaliqueur</a>: "<i>Whatever the calendar's limit is, it's a large number of clicks for sure, but it's not even close to infinite.</i>"<br>
<br>
Perhaps I'm misunderstanding how dynamic links work. But isn't there an upper limit imposed by the amount of hosting space that exists in the world?<br>
<br>
That would seem to make it "effectively infinite" rather than "mathematically infinite".comment:ask.metafilter.com,2009:site.123451-1763944Sat, 30 May 2009 16:47:50 -0800Joe BeeseBy: box
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1763953
Imagine a page that, every time you click a link, the background changes to either black or white. Is that an infinity of pages, or two, or one? Man, this is a hard question to answer.comment:ask.metafilter.com,2009:site.123451-1763953Sat, 30 May 2009 17:02:15 -0800boxBy: axiom
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764025
I'm not sure what you exactly mean by 'mathematically infinite' (erm, uncountably? countably?) but I think maybe you need to rephrase the question. Do you really want to know how many <strong>links</strong>? Or how many <strong>pages</strong>? And how do you handle dynamically generated pages? Surely, I could build a calendar app that would show you a different page for any given day into perpetuity; such an app could be said to have (countably) infinite pages. Estimating the number of pages actually stored on disks (versus pages generated for some input like the calendar app) is a whole other kettle of fish.<br>
<br>
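A minimal sketch of the kind of calendar app described above, assuming a made-up URL scheme (`http://example.com/calendar?day=N` is hypothetical, as is the function name). Each page's "next day" link is just the current day number plus one, so nothing is stored ahead of time, and Python's integers never overflow:

```python
from itertools import islice


def calendar_pages(start_day=0, base="http://example.com/calendar"):
    """Yield one URL per day, forever. No page exists until it is asked for."""
    day = start_day
    while True:
        yield f"{base}?day={day}"
        day += 1  # Python ints are arbitrary precision, so this never wraps around


# The first few "pages", and one absurdly far in the future.
first_three = list(islice(calendar_pages(), 3))
far_future = next(calendar_pages(start_day=10**30))
print(first_three)
print(far_future)
```

The generator never ends on its own, which is the sense in which such an app has countably many pages: every page has a position in the list, but the list has no last entry.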
I believe google claims to index billions of pages (without putting too fine a point on the distinction between types of pages). If you really don't need a very accurate guess, well, there you go. At least a couple of billion pages.comment:ask.metafilter.com,2009:site.123451-1764025Sat, 30 May 2009 17:43:39 -0800axiomBy: Joe Beese
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764101
<a href="http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764025">axiom</a>: "<i>I'm not sure what you exactly mean by 'mathematically infinite</i>"<br>
<br>
The sequence of positive integers is what I would call - however clumsily - "mathematically infinite". By definition, for any integer you could specify, someone else could specify that integer plus one. And all those integers already have the same "existence".<br>
<br>
On the other hand, while a calendar app could create a succeeding-day web page every time you clicked the correct link, without limit, those web pages wouldn't have the same existence as this web page [as bits stored on a server somewhere] until the links were actually clicked. And even if every atom in the universe could store a bit of information, eventually you would run out of storage space. So in that sense, I would describe the number of hyperlinks as "effectively infinite".<br>
<br>
Unless I'm misunderstanding the nature of dynamic links. (Or the sequence of positive integers.)comment:ask.metafilter.com,2009:site.123451-1764101Sat, 30 May 2009 18:19:45 -0800Joe BeeseBy: HuronBob
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764110
An interesting question, and good answers. I have to point out, however, as long as we keep answering this, the number changes....comment:ask.metafilter.com,2009:site.123451-1764110Sat, 30 May 2009 18:38:28 -0800HuronBobBy: delmoi
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764150
On the other hand, a calendar page behind a 'next' link only exists after you click that link on the previous page. You could argue that while the page can be created at any time, it doesn't currently "exist".<br>
<br>
So while there are an infinite number of pages that can be created, there is not an infinite number in existence at any one point in time.<br>
<br>
The real question here is the amount of <i>information</i> on each page. A new calendar page has zero information, because all the data on the page is determined by the link. You just have one increasing day counter and that's it. You already know what the next 'page' will have on it, even though it has a different URL.<br>
<br>
The real question isn't "how many unique pages" but "how many pages with nonzero information values" (i.e. pages with actual 'stuff' on them).<br>
<br>
That number must be finite.comment:ask.metafilter.com,2009:site.123451-1764150Sat, 30 May 2009 19:14:47 -0800delmoiBy: Ookseer
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764187
If we have 100 million domain names, and each has a day calendar that uses a signed 32-bit integer to calculate Unix time, each calendar can display the days from Dec 13, 1901 to January 19, 2038, or 49,711 days. So we get 4 trillion, 971 billion, 100 million links.<br>
<br>
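For anyone who wants to check the napkin, here is a small sketch of that arithmetic. It assumes a signed 32-bit counter of seconds (2^32 distinct values) and takes the 100 million domains figure as given; the variable names are mine:

```python
from datetime import date

SECONDS_PER_DAY = 86_400

# A signed 32-bit integer has 2**32 distinct values, one per second.
span_days = 2**32 / SECONDS_PER_DAY  # a bit over 49,710 days

# The same span measured between the two calendar dates in the comment.
calendar_days = (date(2038, 1, 19) - date(1901, 12, 13)).days

domains = 100_000_000  # assumed figure from the comment
links = domains * calendar_days
print(round(span_days, 2), calendar_days, f"{links:,}")
```

Both routes land on the same 49,711 days, so the 4,971,100,000,000 total follows from the stated assumptions.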
Which is, in essence a completely random number.<br>
<br>
How about we look at how many unique links are possible. That way we get an upper bound, at least.<br>
<br>
In Internet Explorer, the maximum length of a URL is 2,083 characters. Since at least 7 of them are taken up by "http://", that leaves 2,076. Valid URLs can be made from the characters A-Z, a-z, 0-9, and .:;@&?=%+, or 71 characters. Which gives us something like 3.3 x 10^235 combinations for an upper bound. (Though in theory it's a little smaller.)<br>
<br>
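A quick sanity check on that count, offered as a sketch rather than a correction of record. It assumes every one of the 2,076 free positions can hold any of the 71 characters and ignores URL syntax rules entirely. Under that assumption the number of full-length strings is 71^2076, while the 3.3 x 10^235 figure matches 2076^71, which looks like the base and exponent swapped:

```python
import math

ALPHABET = 71                       # characters counted in the comment
FREE_CHARS = 2083 - len("http://")  # 2076 positions left after the scheme

full_length_strings = ALPHABET ** FREE_CHARS  # 71^2076
swapped = FREE_CHARS ** ALPHABET              # 2076^71

# Count decimal digits via log10 instead of printing a multi-thousand-digit number.
digits = math.floor(math.log10(full_length_strings)) + 1
print(digits)
print(f"{swapped:.1e}")
```

Either way the number is far too large to say anything about how many links actually exist, which is the point the comment goes on to make.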
Or another way to calculate it would be to sample a random assortment of pages of content, figure out what percentage of that content is links (say 0.8% of an average Internet document), and multiply that by the estimated amount of data on the internet, say 500 terabytes. That gives 4 TB of links. Figure out how long the average link is, divide through and you'll get an answer.comment:ask.metafilter.com,2009:site.123451-1764187Sat, 30 May 2009 19:34:50 -0800OokseerBy: Joe Beese
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764207
<a href="http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764187">Ookseer</a>: "<i>How about we look at how many unique links are possible. ... something like 3.3 x 10^235 combinations for an upper bound.</i>"<br>
<br>
If there is an upper limit of 3.3 x 10^235 possible URLs, would that make an upper limit of (3.3 x 10^235)! possible unique hyperlinks?comment:ask.metafilter.com,2009:site.123451-1764207Sat, 30 May 2009 19:43:26 -0800Joe BeeseBy: argybarg
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764234
Alright, how about this: How many publicly accessible documents ending in ".htm" or ".html" are stored on a server somewhere?<br>
<br>
I realize that cuts away all the dynamically generated pages, but that's my point.comment:ask.metafilter.com,2009:site.123451-1764234Sat, 30 May 2009 20:11:05 -0800argybargBy: delmoi
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764267
<i>I realize that cuts away all the dynamically generated pages, but that's my point.</i><br>
<br>
It would also cut out all of metafilter, almost all blogs, etc.comment:ask.metafilter.com,2009:site.123451-1764267Sat, 30 May 2009 21:15:45 -0800delmoiBy: delmoi
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764271
I think what we would actually want to count would be 1) Html pages, and 2) records in databases that are used to fill in text for HTML pages plus 3) Other types of text storage (like huge ass XML files, non-relational databases, text files, emails, etc) The third group wouldn't be very big compared to the second one.comment:ask.metafilter.com,2009:site.123451-1764271Sat, 30 May 2009 21:17:44 -0800delmoiBy: Ookseer
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764383
<i>If there is an upper limit of 3.3 x 10^235 possible URLs, would that make an upper limit of (3.3 x 10^235)! possible unique hyperlinks?</i><br>
<br>
More or less. Actually less. And more. And after thinking about it for a few hours, less.<br>
<br>
The less: Not all arrangements of characters are potentially valid URLs; they need to start with a domain name or IP address, etc. But my math and regular expression ability is too weak to figure it out. Also, some links will be identical. For example, <a href="209.85.171.100">209.85.171.100</a> is identical to <a href="http://google.com">google.com</a>.<br>
<br>
The more: The 2,083 limit is just for Internet Explorer. Other browsers can accept more. However, Apache, a common web server, accepts URLs of up to 8,192 characters. So if we go with that, we get 7 x 10^277 links. Only 42 orders of magnitude more.<br>
<br>
And the less: It would take 8.5 x 10^261 Terabytes to store just those links. (Without compression, of course.) If there's an estimated 500 Exabytes of information on the net, and if it was all links of, say, 500 characters, then there would be 10^18 (a million trillion) links as an upper bound. But few pages are all links; images, sounds and videos aren't links at all; and most links are shorter than 500 characters. So play around with those numbers until you find one you like.<br>
<br>
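The storage-based bound is easy to play with in code. This sketch assumes decimal exabytes (10^18 bytes) and one byte per character, neither of which the comment spells out:

```python
EXABYTE = 10**18             # decimal exabyte; an assumption
total_bytes = 500 * EXABYTE  # estimated size of the net, per the comment
bytes_per_link = 500         # 500 characters at one byte each; also an assumption

# If every stored byte belonged to a 500-character link:
max_links = total_bytes // bytes_per_link
print(f"{max_links:.0e}")
```

Shrinking `bytes_per_link` raises the bound and shrinking the share of data that is links lowers it, which is the "play around with those numbers" part.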
(Note I dropped a few orders of magnitude on my earlier post. Not 500Tb, 500<a href="http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion">Eb</a>.)comment:ask.metafilter.com,2009:site.123451-1764383Sun, 31 May 2009 01:17:52 -0800OokseerBy: Ookseer
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764384
(Looks like Metafilter eats IP addresses as URLs. But trust me, 209.85.171.100 is one of the many IP addresses that bring you Google search.)comment:ask.metafilter.com,2009:site.123451-1764384Sun, 31 May 2009 01:22:06 -0800OokseerBy: TheRaven
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764431
Though not links, <a href="http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html">the Google blog</a> came up with a figure just under a year ago for number of URLs, while <a href="http://news.netcraft.com/archives/web_server_survey.html">Netcraft</a> does a monthly survey for number of sites, which seems too low.comment:ask.metafilter.com,2009:site.123451-1764431Sun, 31 May 2009 04:27:40 -0800TheRavenBy: insectosaurus
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764625
<em>The sequence of positive integers is what I would call - however clumsily - "mathematically infinite". By definition, for any integer you could specify, someone else could specify that integer plus one. And all those integers already have the same "existence".</em><br>
<br>
Joe Beese, that's not how mathematicians talk about infinity. Basically - and someone please correct me if needed - infinite sets are either countable or uncountable. A countably infinite set can, basically, be listed (and on to infinity) - so, you can list positive integers, or all integers (0, 1, -1, 2, -2. . .), or all fractions (0, 1, 2, 1/2, 3, 1/3, 2/3 . . .) etc. All countably infinite sets are considered to be the same "size". Uncountable sets are strictly bigger. Sets like all real numbers, or all real positive numbers, are uncountably infinite, because they cannot be listed. Those two examples are the same "size" as each other, though uncountable sets in general come in more than one size (the set of all subsets of the real numbers is bigger than the real numbers themselves).<br>
<br>
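To make "can be listed" concrete, here is a small sketch of two such listings as Python generators. The function names are mine, and the fractions are listed diagonal by diagonal, which is one common ordering and not the only one (it differs from the order written above):

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def integers():
    """List every integer: 0, 1, -1, 2, -2, ..."""
    yield 0
    for n in count(1):
        yield n
        yield -n


def positive_fractions():
    """List every positive fraction exactly once.

    Walk the diagonals p + q = 2, 3, 4, ... and skip unreduced
    duplicates such as 2/2, so each value appears only in lowest terms.
    """
    for total in count(2):
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)


first_integers = list(islice(integers(), 5))
first_fractions = list(islice(positive_fractions(), 5))
print(first_integers)
print([str(f) for f in first_fractions])
```

Every integer and every positive fraction shows up at some finite position in its list. No generator like this can exist for the real numbers, which is what makes them uncountable.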
So, there is a real difference between the idea that the internet has countably infinite pages, or uncountably infinite pages. Though both are infinite, they're very different sizes of infinity.comment:ask.metafilter.com,2009:site.123451-1764625Sun, 31 May 2009 08:50:07 -0800insectosaurusBy: Nelson
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764656
Napkin math. That <a href="http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html">Google blog</a> says they know about 1 trillion URLs. But there's a lot of dynamic garbage. Three or four years ago people were excited about search engines crawling 1 billion pages. I'm gonna take a stab and say there's 100 billion interesting web pages today. That may be 10x too many. My other stab is that there are roughly 10 interesting links on the average web page. (Why? I made it up). So 10 links * 100 billion pages = 1 trillion "interesting" hyperlinks on "interesting" pages.comment:ask.metafilter.com,2009:site.123451-1764656Sun, 31 May 2009 09:24:03 -0800NelsonBy: Joe Beese
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764663
<a href="http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764625">insectosaurus</a>: "<i>Joe Beese, that's not how mathematicians talk about infinity.</i>"<br>
<br>
This does not surprise me. :-) Thanks for the clarification.comment:ask.metafilter.com,2009:site.123451-1764663Sun, 31 May 2009 09:33:04 -0800Joe BeeseBy: axiom
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764838
<blockquote><a href="http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764101">Joe Beese</a>: The sequence of positive integers is what I would call - however clumsily - "mathematically infinite". By definition, for any integer you could specify, someone else could specify that integer plus one. And all those integers already have the same "existence".</blockquote><br>
That's what is meant by countably infinite, as <strong>insectosaurus</strong> ably points out. Essentially, if you can take a set and define a 1-1 mapping between its members and the natural numbers (1,2,3,...) then it's countably infinite. Some sets are uncountably infinite, like the real numbers, and are in some sense "bigger infinities."<br>
<br>
<blockquote><a href="http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764187">Ookseer</a>: If we have 100 million domain names, and each has a day calendar that uses a signed 32-bit integer to calculate Unix time, each calendar can display the days from Dec 13, 1901 to January 19, 2038, or 49,711 days. So we get 4 trillion, 971 billion, 100 million links.<br>
<br>
Which is, in essence a completely random number.</blockquote><br>
Why use signed 32 bit integers? With a little extra effort we can allow arbitrary precision integers as input and get a much larger output range. I think that at least theoretically there are no limits on the size of the data posted to a website so we could (again, <em>theoretically</em>) get to countable infinity with a calendar app alone.comment:ask.metafilter.com,2009:site.123451-1764838Sun, 31 May 2009 12:31:51 -0800axiomBy: history is a weapon
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764884
A lot.comment:ask.metafilter.com,2009:site.123451-1764884Sun, 31 May 2009 13:08:51 -0800history is a weaponBy: qxntpqbbbqxl
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1764923
Effectively infinite for all practical purposes, but not actually infinite. The web sits on top of a finite number of computers with finite capacity, which means there is a practical (but very large) limit to how many working hyperlinks can exist.comment:ask.metafilter.com,2009:site.123451-1764923Sun, 31 May 2009 13:52:19 -0800qxntpqbbbqxlBy: Ookseer
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1765265
<i>Why use signed 32 bit integers? With a little extra effort we can allow arbitrary precision integers as input and get a much larger output range.</i><br>
<br>
Because unix time is traditionally represented as a <a href="http://en.wikipedia.org/wiki/Unix_time#Representing_the_number">signed 32 bit integer</a>. And since this was obviously a dead-end for determining the number of links it didn't seem worth mentioning stuff like servers using 64 bit integers or using custom date calculations that don't rely on unix time, which would make an already arbitrary number even worse.<br>
<br>
I still think the best bet for estimating is:<br>
<code>Number of bytes of data on the 'net * fraction of that data that is links / average length of a link in bytes = number of links on the 'net.</code><br>
The tricky part is figuring out the second number since 20% of a web page might be links, but an image or video is 0%, and I haven't found a 'representative sample' of internet content to test.<br>
<br>
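Here is that estimate as a small sketch. It reads the percentage as a multiplier, in line with the earlier description where 0.8% of the data is link text. The 500 EB and 0.8% figures come from earlier in the thread, while the 50-byte average link length is a placeholder I made up, as is the function name:

```python
def estimate_link_count(total_bytes, link_fraction, avg_link_bytes):
    """Bytes of data * share of it that is link text / bytes per link."""
    return total_bytes * link_fraction / avg_link_bytes


estimate = estimate_link_count(
    total_bytes=500 * 10**18,  # 500 EB, the thread's estimate for the whole net
    link_fraction=0.008,       # 0.8%, the guess used earlier in the thread
    avg_link_bytes=50,         # hypothetical average link length
)
print(f"{estimate:.1e}")
```

The result is only as good as `link_fraction`, which is exactly the number nobody here has a representative sample for.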
Or, if you just want to cheat, Google returns 5,670,000,000 results when searching for <a href="http://www.google.com/search?hl=en&q=http%3A%2F%2F&aq=f&oq=&aqi=g10">"http://"</a>comment:ask.metafilter.com,2009:site.123451-1765265Sun, 31 May 2009 19:48:17 -0800OokseerBy: ixohoxi
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1765791
There is, unfortunately, no meaningful answer to this question, for a host of reasons both technical (URLs aren't the easily quantifiable things you're imagining them to be) and practical (even if they <em>were</em>, how would you find the answer?).comment:ask.metafilter.com,2009:site.123451-1765791Mon, 01 Jun 2009 09:15:04 -0800ixohoxiBy: Night_owl
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1767932
...it's really depressing what comes up when you google <a href="http://www.google.com/#hl=en&q=%22www.%22&aq=f&oq=&aqi=g10&=Google+Search&=I%27m+Feeling+Lucky&fp=2Inaafc1UxE">"www."</a>. That search comes up with 48,840,000,000 (48 billion) results.comment:ask.metafilter.com,2009:site.123451-1767932Tue, 02 Jun 2009 16:16:16 -0800Night_owlBy: Nelson
http://ask.metafilter.com/123451/How-many-working-hyperlinks-does-the-Web-have-now#1768120
You can't rely on Google estimates to be particularly meaningful, especially when they are large numbers.comment:ask.metafilter.com,2009:site.123451-1768120Tue, 02 Jun 2009 19:31:13 -0800Nelson