Indexing Dynamically Generated Pages
November 19, 2006 5:40 PM

How to make Google Site Search index my dynamically generated website?

I have a website whose content is dynamically generated from a database. To access the content (images with comments and other info), one goes to www.mydomain.com/show_image.php/# where # is the index of the image (currently about 1000 of them). Google has indexed my main index page and a few other static HTML pages, but it has not indexed any of the dynamically generated pages. I want it to index those pages so I can use Google Site Search to search my site. I currently have a self-made search engine, but it doesn't work as well as Google's could.

Any MeFites know how to get Google to index all those dynamically generated pages?
posted by LoopyG to Computers & Internet (11 answers total)
 
Sitemaps work for hinting to all search engines.
posted by kcm at 5:51 PM on November 19, 2006


You can use Google Site Maps to inform Google of the existance of the pages that should be indexed.
posted by winston at 5:52 PM on November 19, 2006


Or even of their existence
posted by winston at 5:52 PM on November 19, 2006


Being dynamically generated shouldn't matter, as long as the pages are linked from the front page (or from another page that is).
posted by cillit bang at 6:03 PM on November 19, 2006


Response by poster: The dynamically generated pages are not linked from any static pages. I had difficulty understanding Sitemaps.org; I am familiar with programming, but the world of the web is new to me. Google Site Maps looks interesting, and I'll look into it. If anyone has a good tutorial or a basic sitemap template they could share, it would be greatly appreciated!
posted by LoopyG at 6:23 PM on November 19, 2006


All you need to do is provide some links and wait a few days for Google to re-index your site.
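
For instance, a single dynamically generated index page that's reachable from the front page would do it. A rough PHP sketch, assuming a hypothetical images table with an id column (adjust the names and credentials for the real schema):

    <?php
    // Hypothetical "browse everything" page: emits one plain link per image,
    // so a crawler that reaches this page can discover every show_image.php URL.
    $db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
    foreach ($db->query('SELECT id FROM images') as $row) {
        $id = (int) $row['id'];
        echo "<a href=\"/show_image.php/$id\">Image $id</a><br>\n";
    }

Link that page from the front page and the spider can walk to every image from there.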
posted by cillit bang at 6:47 PM on November 19, 2006


cillit bang: "All you need to do is provide some links and wait a few days for Google to re-index your site."

Incorrect. Sitemaps are the proper way to do this. With sitemaps, you provide metadata like priority and last-updated time that helps your site be indexed more efficiently and more completely. There are a few examples on the sitemaps.org site, and since it's now a standard, you can use Google Sitemaps or Yahoo Sitemaps or whatever you'd like.
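
For reference, a minimal sitemap is just an XML file listing one <url> entry per page. A two-entry example (the URLs follow the poster's pattern; the dates and priorities are invented):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.mydomain.com/show_image.php/1</loc>
        <lastmod>2006-11-19</lastmod>
        <priority>0.5</priority>
      </url>
      <url>
        <loc>http://www.mydomain.com/show_image.php/2</loc>
        <lastmod>2006-11-01</lastmod>
        <priority>0.5</priority>
      </url>
    </urlset>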

I'd suggest using a search engine to find examples like this. :)
posted by kcm at 6:55 PM on November 19, 2006


Response by poster: cillit bang, I read some of those resources and used the generator in the 2nd link (the one for the word "like"). But how do I manage the sitemap when people add new content to the database (and thereby create new pages)? Do I have to update the sitemap to reflect those additions, or can I use some kind of command to tell Google to check some range of values for #?
posted by LoopyG at 7:10 PM on November 19, 2006


As cillit bang said, dynamic vs. static does not matter at all to Google. All Google knows is that it requests a URL from your server and gets a result. But it can only know which URLs to request by finding them in pages that it has already indexed, or being told explicitly by a sitemap. A URL that's not linked from any page on the net (or mentioned in a sitemap) is effectively invisible to search engines. It's the old "if a tree falls in the forest but nobody hears it, does it make a sound" idea, except with URLs.

This assumes that you aren't inadvertently telling Google to go away, with either a robots.txt file or a robots meta tag set to noindex somewhere.
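
For example, either of these would keep Google out (hypothetical snippets, not taken from the poster's site):

    # robots.txt at the site root: blocks all crawlers from everything
    User-agent: *
    Disallow: /

    <!-- robots meta tag in a page's <head>: tells crawlers not to index that page -->
    <meta name="robots" content="noindex">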

The flip side, of course, is that once a URL is linked a single time from somewhere, it will be in the index, and that somewhere can be deceiving. This is how people find that "secret" areas of their website show up on Google, leading naïve people to conclude moronic things like "Google must have hacked into our server to find this page." One common example: some websites have their web stats posted online (e.g. the output of webalizer), which means their referrers are listed. So if you were viewing http://a.example.com/secret-page.html and clicked a link to http://b.example.com/public-page.html, and the b.example.com admin has public web stats, then there now exists a link to http://a.example.com/secret-page.html on the net, even though nobody put it on a page on purpose. Now secret-page.html is in Google's index, which can cause confusion if the admins of secret-page.html didn't expect it.
posted by Rhomboid at 7:11 PM on November 19, 2006


Yes, you probably want to write a script that dumps your database to a sitemap or set of sitemaps. Those sites were examples of sitemaps, not something you're going to use for a dynamically-generated site.

By my estimate, 20-30 lines of Perl would do it: dump a sitemap file header, query the DB, dump each link with its lastmod and priority fields, then dump a sitemap file footer.
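
A rough equivalent in PHP, since that's what the site already runs; the images table and its id/updated_at columns are assumptions about the poster's schema:

    <?php
    // Hypothetical sketch: dump the whole images table as a sitemap.
    // Adjust the table/column names and DB credentials for the real schema.
    $db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

    header('Content-Type: application/xml');
    echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($db->query('SELECT id, updated_at FROM images') as $row) {
        echo "  <url>\n";
        echo '    <loc>http://www.mydomain.com/show_image.php/' . (int) $row['id'] . "</loc>\n";
        echo '    <lastmod>' . date('Y-m-d', strtotime($row['updated_at'])) . "</lastmod>\n";
        echo "    <priority>0.5</priority>\n";
        echo "  </url>\n";
    }

    echo "</urlset>\n";

Serve that as, say, sitemap.php and it stays current as rows are added. The sitemaps.org protocol also defines an HTTP ping of the form <searchengine_URL>/ping?sitemap=<sitemap_url> for telling a search engine to re-fetch an updated sitemap; check Google's documentation for its exact endpoint.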
posted by kcm at 7:19 PM on November 19, 2006


Response by poster: I know how to update an XML file, as I do that in the RSS feed updater PHP script I wrote. I was hoping not to have to do the same thing for the site search, but it looks like I will. It's a hassle, but doable.
posted by LoopyG at 7:41 PM on November 19, 2006


This thread is closed to new comments.