How to make Google and Microsoft like me again?
August 27, 2007 11:53 AM
Why have Google and Microsoft shunned my web site? And why does Google's sitemap control panel think my robots.txt file is unreachable?
A few days ago, I discovered that my website has mostly disappeared from Google, and I am utterly confused as to why. In the past, if you searched for JD Harper, you'd get my personal blog at http://www.jdharper.com/wordpress as the first result. It seems that that page, as well as most of my best posts, have disappeared from the Google index. (I've tried a site search to see what's missing).
Microsoft is also excluding my web site from the search results at live.com, but Yahoo! still has my site at the top. (For now.)
When I logged in to the Google Webmaster Tools, it said that my page was included in the index, but that my sitemap contained errors. The error message that it gives me is as follows:
Network unreachable: robots.txt unreachable
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
Under Diagnostics, under "Crawl Errors," it lists 133 "Unreachable URLs," all of which say "robots.txt unreachable," with a link to the previous error message.
My robots.txt file is located at http://www.jdharper.com/robots.txt, and it consists entirely of a pointer to my sitemap. I thought that perhaps the crawler had just tried to access robots.txt at a time when the server was down, but it's been several days and it says that it encountered the same error earlier today.
So it looks to me like Google is excluding those files from the index since it can't get to my robots.txt file. But I can't figure out why Google can't find the file.
The only thing I've tried today is changing the file permissions on robots.txt to 755, but I doubt that that will fix the problem. I'm still waiting for Google to download and check that file.
Any ideas here? What can I do to fix this?
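On the permissions question: for a plain text file like robots.txt, the web server only needs to be able to read it. The sketch below assumes a typical shared-host setup where Apache runs as a different user than the file's owner, so the "other" read bit is what matters. Under that assumption 644 already works, and the extra execute bits in 755 shouldn't make any difference for a static text file. The helper names are made up for illustration.

```python
import stat

def others_can_read(mode: int) -> bool:
    """True if the 'other' read bit is set. On a host where the web server
    runs as a different user than the file's owner (an assumption), this is
    the bit that lets it serve robots.txt."""
    return bool(mode & stat.S_IROTH)

def describe(mode: int) -> str:
    """Render the mode the way `ls -l` does and add a readability verdict."""
    listing = stat.filemode(stat.S_IFREG | mode)
    verdict = "web server can read it" if others_can_read(mode) else "web server may be locked out"
    return f"{listing}  {verdict}"

# On a real file you would pass os.stat("robots.txt").st_mode instead.
for mode in (0o600, 0o644, 0o755):
    print(oct(mode), describe(mode))
```

If the file was already 644 (or anything with the "other" read bit set), permissions probably weren't the cause.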
Response by poster: Actually, before this started, I didn't have a robots.txt file. I added one after I saw the error messages asking for my robots.txt file.
posted by JDHarper at 12:07 PM on August 27, 2007
Response by poster: Apparently Googlebot has hit my page 11 times in the past month, compared with 2,187 times in April, 2,007 times in May, and 3 times in June.
What on earth happened in June?...
posted by JDHarper at 12:16 PM on August 27, 2007
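If the host provides raw access logs, the same month-by-month drop can be checked independently of Webmaster Tools. This is a minimal sketch assuming Apache's "combined" log format; the sample lines are made up (using documentation-only IP addresses), and matching "Googlebot" in the User-Agent string doesn't prove a request really came from Google, since any client can send that string.

```python
import re
from collections import Counter

# Pulls the month and year out of the [dd/Mon/yyyy:...] timestamp and grabs
# the final quoted field, which is the User-Agent in the "combined" format.
COMBINED = re.compile(
    r'\[\d{2}/(?P<mon>\w{3})/(?P<year>\d{4}):[^\]]*\].*"(?P<agent>[^"]*)"$'
)

def googlebot_hits_by_month(lines):
    """Count requests whose User-Agent mentions Googlebot, keyed by 'YYYY-Mon'."""
    counts = Counter()
    for line in lines:
        m = COMBINED.search(line.rstrip("\n"))
        if m and "Googlebot" in m.group("agent"):
            counts[f"{m.group('year')}-{m.group('mon')}"] += 1
    return counts

# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [06/Jun/2007:10:15:32 -0700] "GET / HTTP/1.1" 302 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [06/Jun/2007:10:16:01 -0700] "GET /wordpress/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_hits_by_month(sample))
```

Feeding it the real log (for example, `open("access.log")`) would give one count per month to compare against the Webmaster Tools numbers.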
Response by poster: OK, on June 6, I was installing a test blog for Wordpress theme development that I wanted to be at http://testblog.jdharper.com. I discovered that I had been using an htaccess file to redirect users from
http://www.jdharper.com/
to
http://www.jdharper.com/wordpress/
but that this method was also redirecting testblog.jdharper.com to http://www.jdharper.com/testblog/.
I was using the following htaccess rule.
RedirectMatch temp ^/$ http://www.jdharper.com/wordpress/
So I deleted the htaccess file and replaced it with a PHP file that redirected the user to my Wordpress blog.
A couple of days ago, when I discovered that I was having this issue, I removed the redirect from the PHP file. I am right now renaming it from .php to .html to see if that matters.
posted by JDHarper at 12:25 PM on August 27, 2007
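Part of why a rule like this can leak onto a subdomain: mod_alias's RedirectMatch tests its regular expression against the URL path only, never the hostname, and "temp" makes it a 302. The sketch below models just that matching decision. It assumes testblog.jdharper.com's requests passed through the same .htaccess (common when a host maps a subdomain to a subfolder of the main site, but the thread doesn't say), and it doesn't try to predict where the subdomain finally landed, since that depends on the host's setup. `rule_fires` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit

# The rule from the thread:
#   RedirectMatch temp ^/$ http://www.jdharper.com/wordpress/
# mod_alias matches the regex against the URL path only; "temp" means a 302.
PATTERN = re.compile(r"^/$")
STATUS = 302
TARGET = "http://www.jdharper.com/wordpress/"

def rule_fires(url: str) -> bool:
    """Would the RedirectMatch rule above fire for this URL?

    The hostname is parsed but never consulted, just as the rule never
    consults it."""
    path = urlsplit(url).path or "/"  # a bare hostname is requested as "/"
    return PATTERN.search(path) is not None

for url in ("http://www.jdharper.com/",
            "http://testblog.jdharper.com/",
            "http://www.jdharper.com/wordpress/"):
    print(url, "->", f"{STATUS} {TARGET}" if rule_fires(url) else "no redirect")
```

Any request for "/" on any hostname that reaches this rule gets redirected, which is the behavior the poster ran into.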
Hey, you might want to put
User-agent: *
Disallow:
into robots.txt, so that 'bots know everything is allowed.
posted by Pronoiac at 5:42 PM on August 27, 2007
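To see what a standards-following parser makes of the two robots.txt variants in this thread (the poster's sitemap-only file and Pronoiac's explicit allow-all record), here is a sketch using Python's standard `urllib.robotparser` (`site_maps()` needs Python 3.8 or later). The sitemap URL is hypothetical, since the thread never says where the sitemap lives, and Python's parser is not Googlebot, so treat this as a sanity check rather than a statement of Google's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sitemap location; the thread never gives the real one.
SITEMAP = "http://www.jdharper.com/sitemap.xml"

# The poster's file: nothing but a pointer to the sitemap.
SITEMAP_ONLY = [f"Sitemap: {SITEMAP}"]

# Pronoiac's suggestion, plus the same sitemap pointer.
ALLOW_ALL = ["User-agent: *", "Disallow:", f"Sitemap: {SITEMAP}"]

def summarize(lines, url="http://www.jdharper.com/wordpress/"):
    """Parse robots.txt lines and report whether Googlebot may fetch `url`
    and which sitemaps the file declares."""
    parser = RobotFileParser()
    parser.parse(lines)  # parse() also records the file as read
    return parser.can_fetch("Googlebot", url), parser.site_maps()

for name, lines in [("sitemap only", SITEMAP_ONLY), ("allow all", ALLOW_ALL)]:
    allowed, sitemaps = summarize(lines)
    print(f"{name}: allowed={allowed}, sitemaps={sitemaps}")
```

Both variants allow everything and expose the sitemap, which suggests the file's contents weren't what Google was complaining about; the error was about fetching it.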
Response by poster: Well, I'm back in Live.com, and I'm crawling back up the Google rankings. I think it was the redirect on my homepage that was causing the problem. Definitely won't do that again.
posted by JDHarper at 6:35 AM on September 1, 2007
I'm suspicious of the sitemap. Suppose you throw out your robots.txt (HTTP 404), then what happens? Suppose you have robots.txt, yet it's empty, then what? Suppose it's not empty, but doesn't refer to the sitemap, then what?
posted by cmiller at 12:02 PM on August 27, 2007
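cmiller's questions come down to how a crawler reacts to each possible outcome of fetching robots.txt. The sketch below is modeled loosely on Google's current published robots.txt handling (which follows RFC 9309) and may not match how Googlebot behaved in 2007; the function name and return strings are made up for illustration. It does seem consistent with what the poster saw: an unreachable robots.txt is the one case where the crawler backs off instead of crawling.

```python
from typing import Optional

def crawl_policy(status: Optional[int], body: str = "") -> str:
    """Roughly how a crawler following today's published robots.txt rules
    might react to one fetch of robots.txt.

    `status` is the HTTP status code, or None when the server couldn't be
    reached at all (DNS failure, timeout, refused connection)."""
    if status is None or status == 429 or 500 <= status <= 599:
        # Can't tell what's allowed, so back off and retry later.
        return "hold off: treat the site as off-limits for now"
    if 400 <= status <= 499:
        # e.g. 404: there is no robots.txt, so nothing is restricted.
        return "no robots.txt: crawl everything"
    if 300 <= status <= 399:
        return "follow the redirect and re-check"
    if 200 <= status <= 299:
        if not body.strip():
            return "empty robots.txt: crawl everything"
        # A file without a Sitemap line is still valid; the sitemap can be
        # submitted separately (e.g. through Webmaster Tools).
        return "parse the rules"
    return "unexpected status: treat as an error"

scenarios = {
    "robots.txt deleted (404)": (404, ""),
    "robots.txt present but empty": (200, ""),
    "rules but no Sitemap line": (200, "User-agent: *\nDisallow:\n"),
    "server unreachable": (None, ""),
}
for name, (status, body) in scenarios.items():
    print(f"{name}: {crawl_policy(status, body)}")
```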