<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter questions tagged with robots.txt</title>
      <link>http://ask.metafilter.com/tags/robots.txt</link>
      <description>Questions tagged with 'robots.txt' at Ask MetaFilter.</description>
	  <pubDate>Mon, 27 Aug 2007 11:53:47 -0800</pubDate>
	  <lastBuildDate>Mon, 27 Aug 2007 11:53:47 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>How to make Google and Microsoft like me again?</title>
	<link>http://ask.metafilter.com/70221/How%2Dto%2Dmake%2DGoogle%2Dand%2DMicrosoft%2Dlike%2Dme%2Dagain</link>	
	<description>Why have Google and Microsoft shunned my web site? And why does Google&apos;s sitemap control panel think my robots.txt file is unreachable? A few days ago, I discovered that my website has mostly disappeared from Google, and I am utterly confused as to why. In the past, if you searched for JD Harper, you&apos;d get my personal blog at &lt;a href=&quot;http://www.jdharper.com/wordpress&quot;&gt;http://www.jdharper.com/wordpress&lt;/a&gt; as the first result. It seems that page, as well as most of my best posts, has disappeared from the Google index. (I&apos;ve tried &lt;a href=&quot;http://www.google.com/search?q=site:jdharper.com&amp;hl=en&amp;start=0&amp;sa=N&quot;&gt;a site search&lt;/a&gt; to see what&apos;s missing.)&lt;br&gt;
&lt;br&gt;
Microsoft is also excluding my web site from the search results at live.com, but Yahoo! still has my site at the top. (For now.)&lt;br&gt;
&lt;br&gt;
When I logged in to the Google Webmaster Tools, it said that my page was included in the index, but that my sitemap contained errors. The error message that it gives me is as follows:&lt;br&gt;
&lt;br&gt;
&lt;blockquote&gt;Network unreachable: robots.txt unreachable&lt;br&gt;
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.&lt;/blockquote&gt;&lt;br&gt;
&lt;br&gt;
Under Diagnostics, under &quot;Crawl Errors,&quot; it lists 133 &quot;Unreachable URLs,&quot; all of which say &quot;robots.txt unreachable,&quot; with a link to the previous error message.&lt;br&gt;
&lt;br&gt;
My robots.txt file is located at &lt;a href=&quot;http://www.jdharper.com/robots.txt&quot;&gt;http://www.jdharper.com/robots.txt&lt;/a&gt;, and it consists entirely of a pointer to my &lt;a href=&quot;http://www.jdharper.com/sitemap.xml.gz&quot;&gt;sitemap&lt;/a&gt;. I thought that perhaps the crawler had just tried to access robots.txt at a time when the server was down, but it&apos;s been several days, and Webmaster Tools says that it encountered the same error earlier today.&lt;br&gt;
&lt;br&gt;
So it looks to me like Google is excluding those files from the index since it can&apos;t get to my robots.txt file. But I can&apos;t figure out why Google can&apos;t find the file.&lt;br&gt;
&lt;br&gt;
The only thing I&apos;ve tried today is changing the file permissions on robots.txt to 755, but I doubt that that will fix the problem. I&apos;m still waiting for Google to download and check that file.&lt;br&gt;
&lt;br&gt;
Any ideas here? What can I do to fix this?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.70221</guid>
	<pubDate>Mon, 27 Aug 2007 11:53:47 -0800</pubDate>
	<category>google</category>
	<category>microsoft</category>
	<category>robots.txt</category>
	<category>search</category>
	<category>seo</category>
	<category>sitemap</category>
	<category>website</category>
	<dc:creator>JDHarper</dc:creator>
	</item>
	<item>
	<title>Porn sites are spamming my web stats. Make it stop.</title>
	<link>http://ask.metafilter.com/62778/Porn%2Dsites%2Dare%2Dspamming%2Dmy%2Dweb%2Dstats%2DMake%2Dit%2Dstop</link>	
	<description>Porn sites are spamming my web stats. Make it stop. For a few months now, my referral log has been flooded by mildly amusing, but mostly annoying, porn site URLs. These sites are throwing off my data entirely, filling the log with anywhere from 1 to 200 hits each. Almost all of them are over-the-top porn URLs. I want it to stop.&lt;br&gt;
&lt;br&gt;
I know what the phenomenon is, and I&#8217;ve read &lt;a href=&quot;http://ask.metafilter.com/44704/Help-me-stop-referral-log-spam&quot;&gt;all&lt;/a&gt; &lt;a href=&quot;http://ask.metafilter.com/19815/What-are-these-strange-requests-in-my-HTTP-server-log&quot;&gt;the&lt;/a&gt; &lt;a href=&quot;http://ask.metafilter.com/60918/Webstat-Pollution-is-Driving-Me-Insane&quot;&gt;other&lt;/a&gt; AskMe threads about it. I&#8217;m not interested in diagnostics; I want to know the cure. I run WordPress and have been running the Bad Behavior plug-in for two months, but the problem has persisted. Yes, I run Google Analytics as an alternative, but I&#8217;d like to solve the problem within this stats program (Webalizer Version 2.01), too.&lt;br&gt;
&lt;br&gt;
Clear, step-by-step instructions would be most helpful, but any suggestions would be much appreciated. Oh, and you can &lt;a href=&quot;http://kev.elbowroomdesign.com/stats/ref_200705.html&quot;&gt;click here to view my referral log from May&lt;/a&gt;.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.62778</guid>
	<pubDate>Wed, 16 May 2007 10:53:16 -0800</pubDate>
	<category>badbehavior</category>
	<category>badbehaviour</category>
	<category>googleanalytics</category>
	<category>porn</category>
	<category>referrallog</category>
	<category>referrals</category>
	<category>referrers</category>
	<category>robots.txt</category>
	<category>spam</category>
	<category>statistics</category>
	<category>webalizer</category>
	<category>webstats</category>
	<category>wordpress</category>
	<dc:creator>Milkman Dan</dc:creator>
	</item>
	<item>
	<title>how do i make my blog invisible?</title>
	<link>http://ask.metafilter.com/28728/how%2Ddo%2Di%2Dmake%2Dmy%2Dblog%2Dinvisible</link>	
	<description>how do I make my blog be invisible to search engines? hi,&lt;br&gt;
&lt;br&gt;
i want my blog to not be something people pull up with google and the like. i put up a robots.txt file but it didn&apos;t seem to help. here it is:&lt;br&gt;
&lt;br&gt;
# Robots.txt file from http://www.searchengineworld.com&lt;br&gt;
#&lt;br&gt;
# Bans all robots will spider the domain&lt;br&gt;
&lt;br&gt;
User-agent: *&lt;br&gt;
Disallow: /&lt;br&gt;
&lt;br&gt;
any ideas? thanks.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2005:site.28728</guid>
	<pubDate>Thu, 08 Dec 2005 23:43:14 -0800</pubDate>
	<category>blogs</category>
	<category>engines</category>
	<category>robots.txt</category>
	<category>search</category>
	<dc:creator>aussicht</dc:creator>
	</item>
	
	</channel>
</rss>

