Technical requirements:
* In order for the Google crawler to correctly gather articles, each page that displays an article's full text needs to have a unique URL that does not change. Google cannot include sites in Google News that display multiple articles at the same URL.
* The URL for each article must contain a unique number consisting of at least three digits.
* Keep in mind that Google cannot include sites for which the URL of the main page includes a date. URLs with dates in them often change on a daily or weekly basis. This prevents Google from crawling the site for new content, as Google is unable to detect the most current URL to be crawled.
* Google's automated crawler is currently best able to crawl regular HTML links. Google is unable to crawl image links or links embedded in JavaScript.
Thank you for your reply. As we mentioned in our previous email, if the only digits in your article URL resemble a year (e.g. "1999" or "2006") our system may not be able to crawl your content.
For example, our news crawler wouldn't crawl articles with the following URLs:
http://www.potsmokinghippieoverlords.org/news/display.html?ID=2006
http://www.potsmokinghippieoverlords.org/news/display.html?ID=yr2006
It would crawl these pages:
http://www.potsmokinghippieoverlords.org/news/display.html?ID=2006/15/04
http://www.potsmokinghippieoverlords.org/news/display.html?ID=yr2006/15/04
Additionally, in order to have your articles crawled by Google News, your article URLs must contain a number consisting of at least three digits. This only applies for the inclusion of your content in Google News.
If you're able to restructure each of your URLs, Google News should begin crawling your content automatically.
Regards,
The Google Team
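The rules in that email can be turned into a rough pre-check for your own URLs. This is only a sketch reconstructed from the examples quoted above, not Google's actual crawler logic. The function name `looks_crawlable` and the definition of "resembles a year" (a four-digit number starting with 19 or 20) are assumptions made for illustration.

```python
import re

# Assumption: "resembles a year" means a four-digit number starting with 19 or 20.
YEAR_LIKE = re.compile(r"(?:19|20)\d{2}")


def looks_crawlable(url: str) -> bool:
    """Rough guess at the rules described in the email (hypothetical helper).

    A URL passes when it contains a number of at least three digits
    and its digit groups are not all year-like.
    """
    runs = re.findall(r"\d+", url)  # every group of consecutive digits
    has_long_number = any(len(run) >= 3 for run in runs)
    only_year_like = bool(runs) and all(YEAR_LIKE.fullmatch(run) for run in runs)
    return has_long_number and not only_year_like


if __name__ == "__main__":
    base = "http://www.potsmokinghippieoverlords.org/news/display.html?ID="
    for suffix in ("2006", "yr2006", "2006/15/04", "yr2006/15/04"):
        print(base + suffix, looks_crawlable(base + suffix))
```

Run against the four URLs from the email, the check rejects the first two and accepts the last two, which matches what Google said it would and wouldn't crawl.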
It appears that our system should be able to crawl URLs with the same formatting as the example you provided. Also, while we do not require a mm/dd stamp for article URLs, please be aware that we can't guarantee that we will crawl all of the content on a news site.
You can build a sitemap and submit it through Google Sitemaps to make sure Google is indexing your site correctly. That might be worth investigating if something is keeping the bots from seeing the new news releases.
posted by luriete at 1:40 PM on July 24, 2006
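For the sitemap suggestion, here is a minimal sketch of generating one with Python's standard library. It assumes the sitemaps.org 0.9 XML format. The `build_sitemap` helper and the example.org URLs are made up for illustration, and a real sitemap file would normally also carry an XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, > for us
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder article URLs, each with a unique number of three or more digits.
    articles = [
        "http://www.example.org/news/display.html?ID=12345",
        "http://www.example.org/news/display.html?ID=12346",
    ]
    print(build_sitemap(articles))
```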