Persistent 403 errors on a WordPress site running the Thesis theme
February 4, 2012 3:28 PM

Google's bots are getting 403 errors on all the pages of my WordPress-based site running the Thesis theme. I'm out of troubleshooting ideas. Halp?

I've been having really terrible search results for my company webpage for a while, and two days ago I decided to install the Thesis theme for WordPress. It's supposed to be very good for SEO, and I've been using it on my other site with good results for ages.

(The same day I installed Thesis on this site, I had to reinstall WP -- long story, but possibly relevant. All the files and folders seem to be correctly permission'd for WP, though, so I don't think that's the problem.)

Today I tested it from Google's Webmaster Tools pages, and when Googlebot tries to fetch pages, the server returns this error:
HTTP/1.1 403 Forbidden
Date: Sat, 04 Feb 2012 17:51:20 GMT
Server: Apache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 251
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

403 Forbidden

Forbidden
You don't have permission to access /about/ on this server.

Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request.
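
(A first diagnostic from any shell: spoof Googlebot's user agent and see whether the block is keyed to the UA string -- a minimal sketch, using the site's /about/ URL:)

# Request the page while claiming to be Googlebot; -A sets the user agent:
curl -sI -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
    http://www.cooperativepress.com/about/ | head -n 1
# "HTTP/1.1 200 OK" here would mean the block is not keyed to the UA
# string -- it could be keyed to Googlebot's IP range instead.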

* After reading other threads on Thesis' help site, I made sure "allow search engines to crawl my site" was set to yes in WP (it already was).

* I double-checked the Google Analytics tracking code that had been pasted into Thesis. There were some garbage characters, so I pasted it in again, taking the intermediate step of running it through a text editor to make sure it was plain text. It now comes up fine inside the Thesis settings panel.

* .htaccess doesn't have anything unusual (the first code block is something to allow me to access my Dreamhost stats without WordPress interfering):

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} ^/(stats|failed_auth\.html).*$ [NC]
RewriteRule . - [L]


# BEGIN WordPress

RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]


# END WordPress

* robots.txt contains only the following:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
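
(A quick sanity check from the shell that robots.txt itself is being served -- a sketch, using the site's URL:)

# robots.txt should come back 200 and contain exactly the three lines above:
curl -s -o /dev/null -w "%{http_code}\n" http://www.cooperativepress.com/robots.txt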
Googlebot can't crawl the sitemap either; that also returns a 403.

I'm at a loss for what to do/try next. Any ideas? (I also have a call in to Dreamhost tech support in case it's a server-related problem). Thank you!
posted by bitter-girl.com to Computers & Internet (17 answers total)
 
Very odd. Maybe Dreamhost has some issues upstream?
This Google query is equally perplexing. How long has the site been active? It's unusual that Google hasn't indexed anything at all.
posted by Foci for Analysis at 3:40 PM on February 4, 2012


Generic troubleshooting ideas:

1) is it WP / thesis theme?
- backup
- put a different theme (try again)
- reinstall wordpress (try again)

2) is it google?
- look at the access logs and see if other spiders were luckier (a grep sketch below)

Come to think of it, this seems to stand out:
Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the request.


...so even the error documents are access denied? This seems to point at an issue beyond WordPress.
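
(For that log check, something like this against the Apache access log shows what each crawler has been getting -- the log path is an assumption; on Dreamhost it usually lives under ~/logs/:)

# Status codes served to Googlebot; in Apache's combined log format,
# $9 is the status field (log path is illustrative):
grep Googlebot ~/logs/example.com/http/access.log | awk '{print $9}' | sort | uniq -c
# Repeat with "bingbot", "Yandex", etc. to see if other spiders were luckier.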
posted by yoHighness at 3:43 PM on February 4, 2012


Response by poster: The site's been live for at least 3 years, Foci for Analysis. (And some of the content on it since 2006, because I imported entries from the business' old site via WP).
posted by bitter-girl.com at 3:45 PM on February 4, 2012


Negative on all the usual malware detectors (Google, Sucuri, Norton Safe Web, McAfee SiteAdvisor, WOT).

Bing seems to index the domain, so this seems related to Googlebot. Re-check all of your site settings in Webmaster Tools -- maybe you've blocked access to the site somehow?
posted by Foci for Analysis at 3:49 PM on February 4, 2012


Could it be that your site has been affected by malware and is serving Googlebot crappy data? Has WP acted weird in any other way?
posted by Foci for Analysis at 3:53 PM on February 4, 2012


Response by poster: I'm in over my head after checking through the above (.htaccess, robots.txt, etc.), Foci for Analysis: I'm not seeing anything else I can check or do from inside Google's Webmaster Tools, unfortunately. It'll tell me the error, but it won't tell me anything to do about it or what's causing it.

For example, if you go into Webmaster Tools where it says it can't reach the sitemap, it says:
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
General HTTP error: HTTP 403 error (Forbidden)
HTTP Error: 403
but that's it. Permissions on sitemap.xml are set to 644, for what that's worth...
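
(One thing worth verifying alongside the 644 on the file: Apache also needs the execute bit on every directory in the path -- a quick check, with illustrative paths:)

# 644 on the file is fine, but each parent directory also needs o+x
# so the web server can traverse it (paths are illustrative):
ls -l sitemap.xml        # expect -rw-r--r--
ls -ld ~ ~/example.com   # expect drwxr-xr-x, or at least o+x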
posted by bitter-girl.com at 3:54 PM on February 4, 2012


How old is that error message?
posted by Foci for Analysis at 3:57 PM on February 4, 2012


By the way, maybe you can try re-fetching one of your pages as Googlebot?
posted by Foci for Analysis at 3:59 PM on February 4, 2012


Response by poster: I re-fetched a few hours ago, and I re-fetched again just now (on the main page) -- the same error comes up on the Webmaster Tools fetch page.
posted by bitter-girl.com at 4:02 PM on February 4, 2012


Strange. It's not blocking on Google's user agent string -- I just masqueraded as them and got through:
whale:~ artlung$ wget --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.cooperativepress.com/about/
--2012-02-04 15:59:44--  http://www.cooperativepress.com/about/
Resolving www.cooperativepress.com... 173.236.136.21
Connecting to www.cooperativepress.com|173.236.136.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.1'
Got your file back clean.

I'm wondering if you have some kind of antispam plugin (maybe Bad Behavior?) or add-on on your hosting that has flagged Google's IP addresses in some way and is now denying them access.
posted by artlung at 4:04 PM on February 4, 2012


Response by poster: Interesting, artlung...

The only plugins currently running on the site are:

Akismet
Disable WordPress Core Update
Disable WordPress Plugin Updates
FeedBurner FeedSmith
Google XML Sitemaps
TweetMeme Retweet Button

No Bad Behavior, etc. I'm on WordPress 3.3.1.
posted by bitter-girl.com at 4:07 PM on February 4, 2012


I would try disabling the plugins one by one and switching over to the default theme, just to be sure that none of them is causing trouble (unlikely, but you never know). Then try fetching a page as Googlebot.
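
(If you have shell access, here's a faster way to rule the plugins out wholesale -- a sketch; WordPress deactivates any plugin whose files go missing:)

# Move the plugins aside so WordPress force-deactivates them, then re-test:
cd wp-content && mv plugins plugins.off && mkdir plugins
# ...fetch as Googlebot, then restore (note: WordPress will have marked the
# plugins inactive, so re-activate them from the admin panel afterwards):
rmdir plugins && mv plugins.off plugins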

I don't know how related this is, but it turns out that Dreamhost will block Googlebot in some specific instances. Maybe...? It could be that they're blocking Googlebot using a more global Apache configuration option than .htaccess.
posted by Foci for Analysis at 4:29 PM on February 4, 2012


Response by poster: Oh THAT'S not good, Foci for Analysis... I'll bring that up when someone from support finally responds. I hope they haven't screwed this up on a global level...thanks for pointing that out.
posted by bitter-girl.com at 4:38 PM on February 4, 2012


Response by poster: (Ok, I just switched to the default WP theme, reloaded, tried to fetch as Googlebot again -- same error)
posted by bitter-girl.com at 4:45 PM on February 4, 2012


If you place a static file on the site, say test.html at the root, see if Google can fetch that. If it generates the same error, then I'd no longer suspect WordPress; that would mean the web server, and you can punt this to your hosting company.
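
(A sketch of that test, creating the file from the shell and then fetching it through Webmaster Tools; the path is illustrative:)

# Create a static test file in the web root; the .htaccess rules above skip
# rewriting for files that exist (!-f), so WordPress never runs for this URL:
echo "hello" > ~/example.com/test.html
# Then "Fetch as Googlebot" on /test.html in Webmaster Tools; a 403 on a
# plain static file points at the server, not WordPress.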
posted by artlung at 5:53 PM on February 4, 2012


The problem is a conflict between the rewrite rules in WordPress's .htaccess and error document generation. See for example this document. Try removing the rewrite rules from your .htaccess and see if that works.
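
(A reversible way to try that from the shell -- comment the rules out rather than deleting them; a sketch:)

# Comment out every Rewrite* line in the site's .htaccess, keeping a backup:
cp .htaccess .htaccess.bak
sed -i 's/^Rewrite/#Rewrite/' .htaccess
# ...re-run the Googlebot fetch, then restore:
mv .htaccess.bak .htaccess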
posted by TheRaven at 9:18 PM on February 4, 2012


Response by poster: Oh, you guys are going to love this one. Someone (and it wasn't me, friends) put an .htaccess in the root of my user folder (not the website's subfolder, whose .htaccess I had been concerned with) that had "deny from 66.249" in it.

I've got another call over to host tech support to find out if this is something they did (because, again, it sure as hell wasn't me).

Took out "deny from 66.249", resubmitted the fetch as Googlebot, and... success!
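
(For future searchers: addresses beginning 66.249 belong to Google's crawlers, and a one-liner like this would have surfaced the stray file -- a sketch:)

# Find every .htaccess under the home directory and flag deny rules
# (-i because Apache directives are case-insensitive):
find ~ -name .htaccess -exec grep -iHn "deny from" {} \;
# This would have caught the rogue ~/.htaccess with its "deny from 66.249".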

Thank you, everyone!
posted by bitter-girl.com at 6:11 AM on February 6, 2012

