<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

      <title>Comments on: "He's making it up as he goes along!"</title>
      <link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along/</link>
      <description>Comments on Ask MetaFilter post "He's making it up as he goes along!"</description>
	  	  <pubDate>Fri, 02 Sep 2005 20:33:53 -0800</pubDate>
      <lastBuildDate>Fri, 02 Sep 2005 20:33:53 -0800</lastBuildDate>
      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>

<item>
  	<title>Question: &quot;He&apos;s making it up as he goes along!&quot;</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along</link>	
  	<description>Why is Google spidering specific but non-existent pages on my blog? Over the last couple of days I&apos;ve seen Google&apos;s bot scanning my website, trying to access specific URLs:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;www.benzo8.org crawl-66-249-66-3.googlebot.com - - [03/Sep/2005:03:03:32 +0100] &quot;GET /summit/contact.html HTTP/1.1&quot; 200 38112 &quot;-&quot; &quot;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&quot;&lt;br&gt;
www.benzo8.org crawl-66-249-66-3.googlebot.com - - [03/Sep/2005:03:03:46 +0100] &quot;GET /pages/devonshire.html HTTP/1.1&quot; 200 38114 &quot;-&quot; &quot;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&quot;&lt;br&gt;
www.benzo8.org crawl-66-249-66-3.googlebot.com - - [03/Sep/2005:03:03:57 +0100] &quot;GET /pages/hampton.html HTTP/1.1&quot; 200 38111 &quot;-&quot; &quot;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&quot;&lt;br&gt;
www.benzo8.org crawl-66-249-66-3.googlebot.com - - [03/Sep/2005:03:04:09 +0100] &quot;GET /chatsford/contact.html HTTP/1.1&quot; 200 38115 &quot;-&quot; &quot;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&quot;&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
Unfortunately, those URLs don&apos;t exist and they&apos;ve never existed. So why is Google requesting them? Is it just guessing, or what? Also, my site is set up so that requests which don&apos;t go to an existing page will go to the index page. So if Google requests these non-existent pages and gets the same content each time (i.e. my index page), will it think (incorrectly) that I&apos;ve set up some SEO link farm and lower my PageRank as a punishment?</description>
  	<guid isPermaLink="false">post:ask.metafilter.com,2008:site.23542</guid>
  	<pubDate>Fri, 02 Sep 2005 20:05:52 -0800</pubDate>
  	<dc:creator>benzo8</dc:creator>
	
	<category>google</category>
	
	<category>spidering</category>
	
	<category>blog</category>
	
	<category>non-existent</category>
	
</item>
<item>
  	<title>By: RustyBrooks</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374777</link>	
  	<description>I can only imagine that someone, somewhere, has those links on their site.  It may be a mistake, or it may be something that I don&apos;t quite understand.  It&apos;s about the only thing I can think of.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374777</guid>
  	<pubDate>Fri, 02 Sep 2005 20:33:53 -0800</pubDate>
  	<dc:creator>RustyBrooks</dc:creator>
</item>
<item>
  	<title>By: mosch</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374791</link>	
  	<description>&lt;a href=&quot;http://www.google.com/search?q=link%3Awww.benzo8.com%2Fpages%2Fdevonshire.html&amp;sourceid=mozilla-search&amp;start=0&amp;start=0&amp;ie=utf-8&amp;oe=utf-8&amp;client=firefox-a&amp;rls=org.mozilla:en-US:official&quot;&gt;weird&lt;/a&gt;.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374791</guid>
  	<pubDate>Fri, 02 Sep 2005 21:28:04 -0800</pubDate>
  	<dc:creator>mosch</dc:creator>
</item>
<item>
  	<title>By: clord</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374843</link>	
  	<description>they might be using it as a test to see if you are in fact a link farm? &lt;br&gt;
&lt;br&gt;
sorta like: &amp;quot;If I throw these random links at you, and you provide stuff to me, I suspect you of trying to pollute your legitimate links too&amp;quot;</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374843</guid>
  	<pubDate>Sat, 03 Sep 2005 00:27:41 -0800</pubDate>
  	<dc:creator>clord</dc:creator>
</item>
<item>
  	<title>By: cillit bang</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374861</link>	
  	<description>I notice you&apos;re returning HTTP 200 codes. You should return 404 errors or 301 redirect codes if you want Google to realize those pages don&apos;t exist.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374861</guid>
  	<pubDate>Sat, 03 Sep 2005 03:32:04 -0800</pubDate>
  	<dc:creator>cillit bang</dc:creator>
</item>
<item>
  	<title>By: benzo8</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374869</link>	
  	<description>Well, basically every URL that arrives at the site goes through a rewriting process, and those that don&apos;t land on a certain point in the database (ie: an extant page) will return the frontpage content, so a 200 is correct in terms of outcome. But in this particular situation, and given what I feared and &lt;a href=&quot;http://ask.metafilter.com/mefi/23542#374843&quot;&gt;clord&lt;/a&gt; suggested too, I could be damaging my Google PageRank by doing that, if they&apos;re doing what we think they might be doing.&lt;br&gt;
&lt;br&gt;
Don&apos;t we imagine that Google would be intelligent enough to check whether the content returned for a &amp;quot;random&amp;quot; check was actually relevant? I.e. if they did trip over a link farm, it would normally build the search terms/URLs into the content of the dynamic page to increase its googleness. My site will just return the same index/content page time and time again, but it will (unless entirely coincidentally) bear no relevance to the URL Google has decided to test me with...&lt;br&gt;
&lt;br&gt;
So, in short - is the consensus that Google are checking for me being a link farm, and I&apos;m currently doing nothing to dissuade them of that notion?</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374869</guid>
  	<pubDate>Sat, 03 Sep 2005 04:43:23 -0800</pubDate>
  	<dc:creator>benzo8</dc:creator>
</item>
<item>
  	<title>By: cillit bang</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374870</link>	
  	<description>Every page should have one single canonical URI and everything else should be a redirect to it. Not just to help out Google, it&apos;s also good design.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374870</guid>
  	<pubDate>Sat, 03 Sep 2005 04:46:41 -0800</pubDate>
  	<dc:creator>cillit bang</dc:creator>
</item>
<item>
  	<title>By: malevolent</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374896</link>	
  	<description>&lt;i&gt;&amp;quot;those that don&apos;t land on a certain point in the database (ie: an extant page) will return the frontpage content&amp;quot;&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
That&apos;s a really bad idea, as you&apos;re finding out. Get a helpful error page in there and use the right status code.&lt;br&gt;
&lt;br&gt;
Is your site set up to respond only to the correct host header, or will it treat any request to the correct IP as being for that site? If it&apos;s the latter then the requests could be either for someone who previously had that IP, or for a domain name that&apos;s been incorrectly pointed your way.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374896</guid>
  	<pubDate>Sat, 03 Sep 2005 07:09:29 -0800</pubDate>
  	<dc:creator>malevolent</dc:creator>
</item>
<item>
  	<title>By: benzo8</title>
  	<link>http://ask.metafilter.com/23542/Hes-making-it-up-as-he-goes-along#374899</link>	
  	<description>The site is set up for virtual hosts, so it will only respond to www.benzo8.org requests. In terms of the URL handling - I&apos;m running Mambo, and that pretty much does its own thing. I&apos;ve got the SEO-friendly URL modules switched on and I guess that&apos;s what handles the rewriting, but I think even without it, it defaults to the index page, so I&apos;m gonna have to get my hands dirty with the PHP I guess and find out how to get it to fail gracefully rather than lazily... Thanks to all.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.23542-374899</guid>
  	<pubDate>Sat, 03 Sep 2005 07:13:44 -0800</pubDate>
  	<dc:creator>benzo8</dc:creator>
</item>

    </channel>
</rss>
