Why have Google and Microsoft shunned my web site? And why does Google's sitemap control panel think my robots.txt file is unreachable?
A few days ago, I discovered that my website has mostly disappeared from Google, and I am utterly confused as to why. In the past, if you searched for JD Harper, you'd get my personal blog at http://www.jdharper.com/wordpress as the first result. That page, along with most of my best posts, seems to have disappeared from the Google index. (I've tried a site search to see what's missing.)
Microsoft is also excluding my web site from the search results at live.com, but Yahoo! still has my site at the top. (For now.)
When I logged in to the Google Webmaster Tools, it said that my page was included in the index, but that my sitemap contained errors. The error message that it gives me is as follows:
Network unreachable: robots.txt unreachable
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
Under Diagnostics, under "Crawl Errors," it lists 133 "Unreachable URLs," all of which say "robots.txt unreachable," with a link to the previous error message.
My robots.txt file is located at http://www.jdharper.com/robots.txt, and it consists entirely of a pointer to my sitemap. I thought that perhaps the crawler had just tried to access robots.txt at a time when the server was down, but it's been several days, and Webmaster Tools says that it encountered the same error earlier today.
So it looks to me like Google is excluding those files from the index since it can't get to my robots.txt file. But I can't figure out why Google can't find the file.
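One way to narrow this down is to fetch the file the way a crawler would and see whether an HTTP response comes back at all. The following is only a rough sketch: the User-Agent string and the `check_robots` helper are made up for illustration, and a request from my own machine may not be treated the same way as one from Google's crawler.

```python
import urllib.error
import urllib.request

# Hypothetical crawler-style User-Agent; real crawlers send their own strings.
USER_AGENT = "Mozilla/5.0 (compatible; robots-check/0.1)"


def check_robots(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch url and return (status, detail).

    status is the HTTP status code, or None when no HTTP response
    arrived at all (DNS failure, refused connection, timeout).
    """
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403, 404 or 500.
        return err.code, str(err.reason)
    except (urllib.error.URLError, OSError) as err:
        # No HTTP response at all: the closest match to "network unreachable".
        return None, str(err)
```

Calling `check_robots("http://www.jdharper.com/robots.txt")` and getting back a status of 200 with the file's contents would suggest the file is reachable from wherever the script runs; a status of `None` would point at DNS, firewall, or hosting trouble rather than at the file itself.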
The only thing I've tried today is changing the file permissions on robots.txt to 755, but I doubt that that will fix the problem. I'm still waiting for Google to download and check that file.
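For reference, on a typical Unix host a static file only needs its read bits set for the web server to serve it, so 644 would normally be enough and 755 merely adds the execute bits. A small sketch for checking this on the server follows; the helper names are made up, and it assumes shell or script access to the machine hosting the file.

```python
import os
import stat


def is_world_readable(path):
    """Return True if the 'others' read bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def describe_mode(path):
    """Return the permission bits as an octal string such as '644'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")
```

If `is_world_readable("robots.txt")` is already true, permissions on the file itself are unlikely to be what is blocking the crawler, although the directories above it would also need to be traversable.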
Any ideas here? What can I do to fix this?
I'm suspicious of the sitemap. Suppose you throw out your robots.txt entirely (so it returns HTTP 404): then what happens? Suppose you have a robots.txt, yet it's empty: then what? Suppose it's not empty, but doesn't refer to the sitemap: then what?
posted by cmiller at 12:02 PM on August 27, 2007
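To keep track of which of these cases the site is in at any given moment, something like the sketch below could label a fetch result. The function names and labels are invented for illustration, and the parsing is deliberately loose (it only looks for `Sitemap:` lines), so it should not be read as how any search engine actually interprets the file.

```python
def sitemap_urls(robots_body):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_body.splitlines():
        # Drop trailing comments, then split the field name from its value.
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def classify(status, robots_body=""):
    """Name which experiment case an HTTP status and body correspond to."""
    if status == 404:
        return "no robots.txt"
    if status != 200:
        return "unexpected status"
    if not robots_body.strip():
        return "empty robots.txt"
    if sitemap_urls(robots_body):
        return "robots.txt with sitemap"
    return "robots.txt without sitemap"
```

Running each variant for a few days and noting whether the "robots.txt unreachable" errors persist would show whether the sitemap pointer has anything to do with the problem.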