Website help: I'm asking for some input on (1) straightening out a problem with Google and its "description" for each page of my blog; (2) constructing a robots.txt exclusion for certain types of archive pages; and (3) the reason why something is looking for page URLs that have "function.fsockopen" at the end of them.
(1)
Google seems to be using the first bit of text from my blog (its subtitle and the phrase "Skip to content," which I don't see anywhere) instead of what is in the META description tag. (Results in which I see this.) I'd prefer it to use the latter, since the META description is an excerpt from the page and thus is better for search engines.
For some pages, it correctly shows the META description. But for many more (probably the majority), it still shows the blog's subtitle.
Is this merely a case of the pages that show the subtitle not having been recrawled by Googlebot recently? If so, is there anything I can do to get Google to recrawl the whole site? I'm registered with Google Webmaster Tools, but I don't have the option of setting a faster crawl rate.
Or is it something wrong with the page's tagging or code? If so, what's wrong with it?
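One way to check the tagging half of this question is to look at the HTML each page actually serves. This is a minimal sketch, not a diagnosis: the names `MetaCollector` and `description_report` are hypothetical, and it only uses Python's standard `html.parser` to report whether a page has exactly one non-empty `<meta name="description">` tag. Passing the check does not guarantee Google will use the tag, since Google may still build the snippet from on-page text.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Record the content of every <meta name="..."> tag, keyed by lower-cased name."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; self-closing tags
        # like <meta ... /> reach this method via the default handle_startendtag.
        if tag != "meta":
            return
        values = dict(attrs)
        name = (values.get("name") or "").lower()
        if name:
            self.meta.setdefault(name, []).append(values.get("content") or "")


def description_report(html: str) -> list[str]:
    """Return a list of problems with the page's meta description (empty if none)."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    found = parser.meta.get("description", [])
    if not found:
        return ['no <meta name="description"> tag found']
    if len(found) > 1:
        return [f"{len(found)} description tags found; expected one"]
    if not found[0].strip():
        return ["description tag is empty"]
    return []


if __name__ == "__main__":
    sample = '<head><meta name="description" content="An excerpt." /></head>'
    print(description_report(sample))  # prints []
```

Running a saved copy of a post page (from View Source) through `description_report` shows whether the theme or a plugin is emitting the tag at all, or emitting it more than once.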
(2)
I have been trying to exclude archive pages from search engines, since the post's entire content is reproduced there and Google doesn't like duplicate content. I do have <meta name="googlebot" content="noindex,noarchive,follow,noodp" /> in the archive headers, but I had also tried to exclude them via robots.txt.
Unfortunately, my attempt ended up excluding a good handful of pages it shouldn't have. My attempt was:
Allow: /200*/*/*/*/
Disallow: /200
The idea was to allow URLs in this format, http://www.[sitename].com/2008/09/11/blog-post, but to disallow all other URLs that begin with /200, which would cover all the archive pages.
Can I just invert the two (put the Disallow before the Allow) to fix that? Or is there another way to do it? This is my site's robots.txt file.
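A minimal sketch of how conflicting Allow and Disallow lines are resolved, assuming the precedence Google currently documents: the most specific (longest) matching path wins, and Allow wins a tie. Under that assumption the order of the two lines should not matter to Googlebot, though other crawlers may behave differently. Python's own `urllib.robotparser` applies rules in file order and does not expand `*`, so the hypothetical helpers `rule_to_regex` and `is_allowed` do the matching by hand.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters (including '/'); a trailing '$' anchors
    the pattern to the end of the path. Everything else is matched literally,
    starting from the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply longest-match precedence to (directive, pattern) pairs.

    The matching rule with the longest pattern wins; on a tie, Allow wins.
    If nothing matches, the path is allowed. The order of `rules` is ignored.
    """
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    rules = [("Allow", "/200*/*/*/*/"), ("Disallow", "/200")]
    for path in ["/2008/09/11/blog-post/", "/2008/09/11/blog-post", "/2008/09/"]:
        # Second column: rules as written; third column: order reversed.
        print(path, is_allowed(path, rules), is_allowed(path, rules[::-1]))
```

With the two rules from the question, the results are identical in either order: a post URL ending in a slash is allowed and a monthly archive such as /2008/09/ is blocked. A post URL without the trailing slash is also blocked, because the Allow pattern needs four slashes after /200.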
(3)
I'm told that Googlebot could not find about 18 pages that were mentioned "either in your Sitemap or by following links from other pages during a discovery crawl." 7 of them are quirks or links I had to fix, but 11 of them were in this format:
http://www.[sitename].com/2004/10/25/blogger-1025-0648-pm/function.fsockopen
I have absolutely no idea what's causing this. Is it something on my end? These "fsockopen" listings are not in my sitemap (I double-checked). I'm really not even sure where to begin researching this one. I do have this in my .htaccess file, in case it's a possible cause:
AddType application/x-httpd-php5 .php
AddHandler application/x-httpd-php5 .php
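One plausible source, offered as an assumption to check rather than a diagnosis: with PHP's html_errors setting on and docref_root empty, a PHP warning printed into a page may include a relative manual link such as <a href='function.fsockopen'>. A crawler resolves that link against the post's URL, which yields an address of the same shape as the ones in the report. The hypothetical helper `leaked_doc_links` finds such links in a page's HTML and resolves them with `urllib.parse.urljoin`; example.com stands in for the real domain.

```python
import re
from urllib.parse import urljoin

# Matches relative PHP manual links of the form href='function.NAME' (single or
# double quotes), which PHP may emit inside HTML-formatted warnings.
PHP_DOC_LINK = re.compile(r"""href=['"](function\.[A-Za-z0-9_-]+)['"]""")


def leaked_doc_links(page_url: str, html: str) -> list[str]:
    """Return the absolute URLs a crawler would derive from PHP doc links in html."""
    return [urljoin(page_url, rel) for rel in PHP_DOC_LINK.findall(html)]


if __name__ == "__main__":
    # Hypothetical page URL and warning text, for illustration only.
    page = "http://www.example.com/2004/10/25/blogger-1025-0648-pm/"
    warning = ("Warning: fsockopen() [<a href='function.fsockopen'>"
               "function.fsockopen</a>]: unable to connect")
    print(leaked_doc_links(page, warning))
```

If View Source on one of the affected posts shows a warning like this, the fix would likely be on the PHP side (the failing call itself, or the display_errors and html_errors settings) rather than in the sitemap. The trailing slash on the page URL matters: without it, urljoin resolves the link one directory higher.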
posted by pocams at 12:47 PM on May 30, 2008